Couchbase Server / MB-51640

[FP of MB-51607] [System Test] Eventing rebalance stuck at 99%

Details

    Description

      QE TEST

      -test tests/eventing/neo/test_eventing_rebalance_rbac.yml -scope tests/eventing/neo/scope_eventing_rebalance.yml
      

      Day - 2
      Cycle - 6
      Scale - 3

      TEST STEP
      Rebalance in a single data node and a pair of eventing nodes.

      [2022-03-26T21:02:29-07:00, sequoiatools/couchbase-cli:7.1:165c98] server-add -c 172.23.104.16:8091 --server-add https://172.23.104.17 -u Administrator -p password --server-add-username Administrator --server-add-password password --services data
      [2022-03-26T21:02:51-07:00, sequoiatools/couchbase-cli:7.1:8f2bc9] server-add -c 172.23.104.16:8091 --server-add https://172.23.121.165 -u Administrator -p password --server-add-username Administrator --server-add-password password --services eventing
      [2022-03-26T21:03:11-07:00, sequoiatools/couchbase-cli:7.1:2f81cd] server-add -c 172.23.104.16:8091 --server-add https://172.23.96.30 -u Administrator -p password --server-add-username Administrator --server-add-password password --services eventing
      [2022-03-26T21:03:21-07:00, sequoiatools/couchbase-cli:7.1:da760d] rebalance -c 172.23.104.16:8091 -u Administrator -p password
      →  Error while waiting for container:%!(EXTRA *docker.NoSuchContainer=No such container: 0e089b4addca3573e35771cce2494321a2ad0f33ce481277ba4bb24c02b7ffcf)
       
      Error occurred on container - sequoiatools/couchbase-cli:7.1:[rebalance -c 172.23.104.16:8091 -u Administrator -p password]
       
      docker logs da760d
      docker start da760d
       
      *Unable to display progress bar on this os
      ERROR: Rebalance failed. See logs for detailed reason. You can try again.
      

      REBALANCE FAILURE

      2022-03-26T22:04:59.392-07:00, ns_orchestrator:0:critical:message(ns_1@172.23.104.16) - Rebalance exited with reason {service_rebalance_failed,eventing,
                                    {worker_died,
                                     {'EXIT',<0.13370.518>,
                                      {rebalance_failed,
                                       {service_error,
                                        <<"eventing rebalance hasn't made progress for past 1200 secs">>}}}}}.
      Rebalance Operation Id = 1b868fc3f80f1551e28440237600af2d
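
The failure reason above is a no-progress timeout: the rebalance is aborted because the eventing service reported no progress for 1200 seconds. As a rough illustration of that kind of check, here is a minimal Python sketch. It is not the Eventing service's actual code; the class name `StallWatchdog`, its parameters, and the choice to track the remaining vbucket count are all assumptions made for this example.

```python
import time
from typing import Callable, Optional


class StallWatchdog:
    """Hypothetical helper: flags a stall when the remaining-work count
    stops changing for longer than a timeout."""

    def __init__(self, timeout_secs: float = 1200.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_secs = timeout_secs
        self._clock = clock
        self._last_remaining: Optional[int] = None
        self._last_change = clock()

    def observe(self, remaining: int) -> bool:
        """Record one progress sample.

        Returns True once the sample has stayed unchanged for at least
        timeout_secs, otherwise False.
        """
        now = self._clock()
        if remaining != self._last_remaining:
            # Progress was made (or this is the first sample): reset the timer.
            self._last_remaining = remaining
            self._last_change = now
            return False
        return now - self._last_change >= self.timeout_secs
```

Under these assumptions, feeding the watchdog a constant `remaining` value of 1 (as in this run, where one vbucket never finished shuffling) would trip it once 1200 seconds pass without a change.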
      

      OBSERVATION
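      The vbucket shuffle is stuck on vbucket 9 of the source_op bucket. In the logs below, 172.23.104.21 is the only node still reporting VbsRemainingToShuffle: 1, and vbucket 9 remains in vbsRemainingToOwn.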

      2022-03-26T21:46:38.082-07:00 [Info] Consumer::RebalanceTaskProgress [worker_sbm2_0_0:/tmp/127.0.0.1:8091_0_744565610.sock:5228] isBootstrapping: false isRebalanceOngoing: true vbsRemainingToCloseStream len: 0 dump: [] vbsRemainingToStreamReq len: 0 dump: [] vbsRemainingToOwn len: 1 dump: [9 vbsRemainingToGiveUp len: 0 dump: []
      2022-03-26T21:46:38.083-07:00 [Info] Consumer::RebalanceTaskProgress [worker_sbm2_0_0:/tmp/127.0.0.1:8091_0_744565610.sock:5228] uuid: e3d1078cc5449e5c0b763676822ad4f8 eject node UUIDs: []
      2022-03-26T21:46:38.083-07:00 [Info] ServiceMgr::getRebalanceProgress Function: sbm2_0 rebalance progress from node with rest port: 8091 progress: &{0 0 1 205 <nil>} err: <nil>
      2022-03-26T21:46:38.083-07:00 [Info] util::GetProgress endpointURL: http://172.23.104.21:8096/getRebalanceProgress VbsRemainingToShuffle: 1 VbsOwnedPerPlan: 205
      2022-03-26T21:46:38.183-07:00 [Info] util::GetProgress endpointURL: http://172.23.104.23:8096/getRebalanceProgress VbsRemainingToShuffle: 0 VbsOwnedPerPlan: 0
      2022-03-26T21:46:38.248-07:00 [Info] util::GetProgress endpointURL: http://172.23.121.165:8096/getRebalanceProgress VbsRemainingToShuffle: 0 VbsOwnedPerPlan: 0
      2022-03-26T21:46:38.302-07:00 [Info] util::GetProgress endpointURL: http://172.23.96.30:8096/getRebalanceProgress VbsRemainingToShuffle: 0 VbsOwnedPerPlan: 0
      2022-03-26T21:46:38.366-07:00 [Info] util::GetProgress endpointURL: http://172.23.96.31:8096/getRebalanceProgress VbsRemainingToShuffle: 0 VbsOwnedPerPlan: 0
      2022-03-26T21:46:38.367-07:00 [Info] util::GetProgress endpointURL: http://127.0.0.1:8096/getAggRebalanceProgress VbsRemainingToShuffle: 1 VbsOwnedPerPlan: 205
      2022-03-26T21:46:38.371-07:00 [Info] rebalancer::gatherProgress total vbs to shuffle: 18411 remaining to shuffle: 1 progress: 99.99456846450492 counter: 33 cmp: true
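
To triage logs like the ones above, it can help to pull out which nodes still report pending vbuckets and to recompute the progress percentage. The following is a minimal Python sketch under stated assumptions: the regular expression is written against the `util::GetProgress` line format shown in this ticket only, and the function names `nodes_with_pending_vbs` and `progress_percent` are hypothetical helpers, not part of any Couchbase tool.

```python
import re
from typing import Dict, Iterable, Tuple

# Matches per-node lines only; the aggregate getAggRebalanceProgress line
# is intentionally not matched.
PROGRESS_RE = re.compile(
    r"util::GetProgress endpointURL: http://(?P<host>[^:/]+):\d+/getRebalanceProgress "
    r"VbsRemainingToShuffle: (?P<remaining>\d+) VbsOwnedPerPlan: (?P<owned>\d+)"
)


def nodes_with_pending_vbs(log_lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Return {host: (remaining, owned_per_plan)} for nodes with remaining > 0."""
    pending: Dict[str, Tuple[int, int]] = {}
    for line in log_lines:
        match = PROGRESS_RE.search(line)
        if match and int(match.group("remaining")) > 0:
            pending[match.group("host")] = (
                int(match.group("remaining")),
                int(match.group("owned")),
            )
    return pending


def progress_percent(total_to_shuffle: int, remaining: int) -> float:
    """Percentage of vbuckets already shuffled."""
    if total_to_shuffle <= 0:
        raise ValueError("total_to_shuffle must be positive")
    return (total_to_shuffle - remaining) / total_to_shuffle * 100.0
```

With the values from the `rebalancer::gatherProgress` line (18411 total, 1 remaining), `progress_percent` gives roughly 99.9946, which is why the rebalance appears stuck at 99%.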
      

            People

              sujay.gad Sujay Gad
              ankit.prabhu Ankit Prabhu