MB-51367

[BP-MB-51336] : Eventing rebalance failed due to timeout

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version: 6.6.0
    • Fix Version: 6.6.6
    • Component: eventing
    • Environment: Enterprise Edition 7.1.0 build 2440

    Description

      TEST

      -test tests/eventing/neo/test_eventing_rebalance_rbac.yml -scope tests/eventing/neo/scope_eventing_rebalance.yml
      

      Day - 1
      Cycle - 4
      Scale - 3

      TEST STEP
      Rebalance out eventing node.

      [2022-03-07T16:00:24-08:00, sequoiatools/couchbase-cli:7.1:eb4e62] rebalance -c 172.23.104.16:8091 --server-remove 172.23.104.23 -u Administrator -p password
       
      Error occurred on container - sequoiatools/couchbase-cli:7.1:[rebalance -c 172.23.104.16:8091 --server-remove 172.23.104.23 -u Administrator -p password]
       
      docker logs eb4e62
      docker start eb4e62
       
      Unable to display progress bar on this os
      ERROR: Rebalance failed. See logs for detailed reason. You can try again.
      

      FAILURE
      Rebalance exited due to timeout.

      2022-03-07T16:23:23.141-08:00, ns_orchestrator:0:critical:message(ns_1@172.23.104.16) - Rebalance exited with reason {service_rebalance_failed,eventing,
                                    {worker_died,
                                     {'EXIT',<0.2016.378>,
                                      {rebalance_failed,
                                       {service_error,
                                        <<"eventing rebalance hasn't made progress for past 1200 secs">>}}}}}.
      Rebalance Operation Id = d0468c56a828245da829b89b1c4946eb
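
The failure reason above points to a no-progress watchdog: the rebalance is aborted once the reported progress has not changed for 1200 seconds. A minimal sketch of that kind of check follows. The `StallDetector` class and its explicit `now` argument are hypothetical names chosen for illustration; this is not the ns_server or eventing implementation.

```python
class StallDetector:
    """Flags a task whose reported progress has not changed for `timeout_secs`.

    Hypothetical helper for illustration only.
    """

    def __init__(self, timeout_secs=1200.0):
        self.timeout_secs = timeout_secs
        self._last_progress = None
        self._last_change_at = None

    def observe(self, progress, now):
        """Record a progress sample taken at time `now` (in seconds).

        Returns True when progress has been unchanged for at least the timeout.
        """
        if self._last_progress is None or progress != self._last_progress:
            # First sample, or progress moved: restart the stall timer.
            self._last_progress = progress
            self._last_change_at = now
            return False
        return (now - self._last_change_at) >= self.timeout_secs


# Progress stays at 0.0 for the whole window, as in this failure.
detector = StallDetector(timeout_secs=1200.0)
samples = [(0, 0.0), (600, 0.0), (1199, 0.0), (1200, 0.0)]  # (time, progress)
results = [detector.observe(progress, now) for now, progress in samples]
```

In this sketch only the last sample, taken 1200 seconds after the last change, is reported as stalled.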
      

      OBSERVATION
      In ns_server.eventing.log on 172.23.104.21, isRebalanceOngoing is reported as false for approximately 20 minutes before the rebalance failure.

      2022-03-07T16:02:23.010-08:00 [Info] Consumer::RebalanceTaskProgress [worker_n1ql2_0_0:/tmp/127.0.0.1:8091_0_2471638815.sock:113831] isBootstrapping: false isRebalanceOngoing: false vbsRemainingToCloseStream len: 0 dump: [] vbsRemainingToStreamReq len: 0 dump: []
      2022-03-07T16:02:23.010-08:00 [Info] Consumer::RebalanceTaskProgress [worker_n1ql2_0_0:/tmp/127.0.0.1:8091_0_2471638815.sock:113831] uuid: 778736fd979cb089694b9266cc4028f0 eject node UUIDs: [5a957ac3d16b08bab4324fce5a848948]
      2022-03-07T16:02:23.010-08:00 [Info] ServiceMgr::getRebalanceProgress Function: n1ql2_0 rebalance progress from node with rest port: 8091 progress: &{0 0 0 0 <nil>} err: <nil>
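
To check how long a consumer kept reporting this state, the flags and counters can be extracted from each `Consumer::RebalanceTaskProgress` line. A small sketch, assuming the line layout shown in the excerpt above; the helper name `parse_progress_flags` is hypothetical.

```python
import re

# Boolean flags and list lengths as they appear in the log excerpt.
FLAG_RE = re.compile(r"(isBootstrapping|isRebalanceOngoing): (true|false)")
LEN_RE = re.compile(
    r"(vbsRemainingToCloseStream|vbsRemainingToStreamReq) len: (\d+)"
)


def parse_progress_flags(line):
    """Return the flags and remaining-vb counts found in one log line."""
    fields = {name: value == "true" for name, value in FLAG_RE.findall(line)}
    fields.update({name: int(count) for name, count in LEN_RE.findall(line)})
    return fields


sample = (
    "2022-03-07T16:02:23.010-08:00 [Info] Consumer::RebalanceTaskProgress "
    "[worker_n1ql2_0_0:/tmp/127.0.0.1:8091_0_2471638815.sock:113831] "
    "isBootstrapping: false isRebalanceOngoing: false "
    "vbsRemainingToCloseStream len: 0 dump: [] "
    "vbsRemainingToStreamReq len: 0 dump: []"
)
parsed = parse_progress_flags(sample)
```

Applied to every matching line in the log, the parsed timestamps and flags would show the window during which the consumer reported no rebalance work.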
      

      List of eventing nodes

      • 172.23.104.21
      • 172.23.104.23
      • 172.23.96.31

          Activity

            jeelan.poola Jeelan Poola added a comment - This is a side effect of the fix for MB-51078. That fix handles stream open / close much earlier in the pipeline, which exposed a race condition. The fix for this issue must therefore be included in 6.6.6.

            build-team Couchbase Build Team added a comment - Build couchbase-server-6.6.6-10557 contains eventing commit 8ff750d with commit message: MB-51367 : vbsRemainingtoOwn should account for enqueued streamends
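
The commit message suggests that the count of vBuckets still to be owned did not include vBuckets whose stream-end events were queued but not yet processed. The toy sketch below illustrates that accounting idea under this assumed interpretation. The function `vbs_remaining_to_own` and its parameters are hypothetical and do not reflect the actual eventing code.

```python
def vbs_remaining_to_own(should_own, currently_owned, enqueued_stream_ends):
    """Return the vBuckets a worker still has to (re)acquire.

    Assumed interpretation of the commit message: a vBucket with a queued,
    unprocessed stream end is still pending work even if it is currently
    listed as owned. All names here are hypothetical.
    """
    not_yet_owned = set(should_own) - set(currently_owned)
    # vBuckets that look owned but have a stream end waiting in the queue.
    pending_stream_end = set(should_own) & set(enqueued_stream_ends)
    return sorted(not_yet_owned | pending_stream_end)


# Ignoring the queue reports no remaining work; including it keeps vb 2 pending.
without_queue = vbs_remaining_to_own([0, 1, 2, 3], [0, 1, 2, 3], [])
with_queue = vbs_remaining_to_own([0, 1, 2, 3], [0, 1, 2, 3], [2])
```

Under this reading, a worker that ignores the queue would report an empty remaining list, consistent with the empty `vbsRemainingToStreamReq` dump in the observation, while work is in fact still outstanding.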

            People

              sujay.gad Sujay Gad
              abhishek.jindal Abhishek Jindal