MB-52718: [System Test] Eventing rebalance failed due to 20 min timeout


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 7.1.2
    • Fix Version/s: Morpheus
    • Component/s: eventing
    • Environment: Enterprise Edition 7.1.2 build 3321

    Description

      QE TEST

      -test tests/eventing/neo/test_eventing_rebalance.yml -scope tests/eventing/neo/scope_eventing_rebalance.yml
      

      Day - 1
      Cycle - 2
      Scale - 3

      TEST STEP
      Rebalance in a single eventing node (172.23.97.77).

      [2022-06-24T01:44:43-07:00, sequoiatools/couchbase-cli:7.1:79eb80] server-add -c 172.23.104.16:8091 --server-add https://172.23.97.77 -u Administrator -p password --server-add-username Administrator --server-add-password password --services eventing
      [2022-06-24T01:44:53-07:00, sequoiatools/couchbase-cli:7.1:478f5c] rebalance -c 172.23.104.16:8091 -u Administrator -p password
      

      REBALANCE FAILURE

      2022-06-24T02:10:01.562-07:00, ns_orchestrator:0:critical:message(ns_1@172.23.104.16) - Rebalance exited with reason {service_rebalance_failed,eventing,
                                    {worker_died,
                                     {'EXIT',<0.11561.107>,
                                      {rebalance_failed,
                                       {service_error,
                                        <<"eventing rebalance hasn't made progress for past 1200 secs">>}}}}}.
      

      OBSERVATION
      The vBucket shuffle did not complete on node 172.23.104.23: it still reports VbsRemainingToShuffle: 1, while every other node reports 0.

      2022-06-24T01:59:58.474-07:00 [Info] util::GetProgress endpointURL: http://172.23.104.21:8096/getRebalanceProgress VbsRemainingToShuffle: 0 VbsOwnedPerPlan: 0
      2022-06-24T01:59:58.517-07:00 [Info] util::GetProgress endpointURL: http://172.23.104.23:8096/getRebalanceProgress VbsRemainingToShuffle: 1 VbsOwnedPerPlan: 256
      2022-06-24T01:59:58.542-07:00 [Info] util::GetProgress endpointURL: http://172.23.96.31:8096/getRebalanceProgress VbsRemainingToShuffle: 0 VbsOwnedPerPlan: 0
      2022-06-24T01:59:58.565-07:00 [Info] util::GetProgress endpointURL: http://172.23.97.77:8096/getRebalanceProgress VbsRemainingToShuffle: 0 VbsOwnedPerPlan: 0
      2022-06-24T01:59:58.566-07:00 [Info] util::GetProgress endpointURL: http://127.0.0.1:8096/getAggRebalanceProgress VbsRemainingToShuffle: 1 VbsOwnedPerPlan: 256
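
The per-node check above can be automated when the log is long. The following is a minimal Python sketch, assuming the `util::GetProgress` line layout shown in this ticket; the regular expression and the `stuck_nodes` helper are illustrative names, not part of any Couchbase tooling.

```python
import re

# Matches the util::GetProgress lines quoted in this ticket (layout assumed from the excerpt).
PROGRESS_RE = re.compile(
    r"endpointURL: http://(?P<host>[\d.]+):\d+/(?P<endpoint>\w+) "
    r"VbsRemainingToShuffle: (?P<remaining>\d+) "
    r"VbsOwnedPerPlan: (?P<owned>\d+)"
)


def stuck_nodes(lines):
    """Return {host: remaining} for per-node endpoints that still have vBuckets to shuffle."""
    stuck = {}
    for line in lines:
        m = PROGRESS_RE.search(line)
        # Skip unrelated lines and the aggregate (getAggRebalanceProgress) endpoint.
        if not m or m.group("endpoint") != "getRebalanceProgress":
            continue
        remaining = int(m.group("remaining"))
        if remaining > 0:
            stuck[m.group("host")] = remaining
    return stuck


SAMPLE = [
    "2022-06-24T01:59:58.474-07:00 [Info] util::GetProgress endpointURL: http://172.23.104.21:8096/getRebalanceProgress VbsRemainingToShuffle: 0 VbsOwnedPerPlan: 0",
    "2022-06-24T01:59:58.517-07:00 [Info] util::GetProgress endpointURL: http://172.23.104.23:8096/getRebalanceProgress VbsRemainingToShuffle: 1 VbsOwnedPerPlan: 256",
    "2022-06-24T01:59:58.566-07:00 [Info] util::GetProgress endpointURL: http://127.0.0.1:8096/getAggRebalanceProgress VbsRemainingToShuffle: 1 VbsOwnedPerPlan: 256",
]

if __name__ == "__main__":
    print(stuck_nodes(SAMPLE))
```

On the excerpt above, only 172.23.104.23 should be reported; the aggregate line is skipped so the same vBucket is not counted twice.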
      

      The vBucket that fails to shuffle is vb 264.

      2022-06-24T01:45:11.475-07:00 [Info] Consumer::doVbTakeover [worker_n1ql_0_0:/tmp/127.0.0.1:8091_0_66756173.sock:53586] vb: 264 vbblob stream status: stopped, starting dcp stream
      2022-06-24T01:45:11.475-07:00 [Info] Consumer::updateVbOwnerAndStartDCPStream [worker_n1ql_0_0:/tmp/127.0.0.1:8091_0_66756173.sock:53586] vb: 264 Sending streamRequestInfo size: 2
      2022-06-24T01:45:11.541-07:00 [Info] Consumer::processReqStreamMessages [worker_n1ql_0_0:/tmp/127.0.0.1:8091_0_66756173.sock:53586] vb: 264 reqStreamCh size: 99 msg: &consumer.streamRequestInfo{manifestUID:"", startSeqNo:0x0, vb:0x108, vbBlob:(*consumer.vbucketKVBlob)(0xc00cc1c680)} Got request to stream
      2022-06-24T01:45:11.554-07:00 [Info] Consumer::dcpRequestStreamHandle [worker_n1ql_0_0:/tmp/127.0.0.1:8091_0_66756173.sock:53586] vb: 264 kvAddr: 172.23.104.18:11210 Started up new dcp feed. Spawned aggChan routine
      2022-06-24T01:45:12.667-07:00 [Info] Consumer::doVbTakeover [worker_n1ql_0_0:/tmp/127.0.0.1:8091_0_66756173.sock:53586] vb: 264 vbblob stream status: , starting dcp stream
      2022-06-24T01:45:12.667-07:00 [Info] Consumer::updateVbOwnerAndStartDCPStream [worker_n1ql_0_0:/tmp/127.0.0.1:8091_0_66756173.sock:53586] vb: 264 Sending streamRequestInfo size: 0
      

      From the second takeover attempt onward, the vbblob stream status for vb 264 is empty, and doVbTakeover keeps retrying until the rebalance times out.

      grep "vbblob stream status: ," ns_server.eventing.log
      2022-06-24T01:45:12.667-07:00 [Info] Consumer::doVbTakeover [worker_n1ql_0_0:/tmp/127.0.0.1:8091_0_66756173.sock:53586] vb: 264 vbblob stream status: , starting dcp stream
      2022-06-24T01:45:34.835-07:00 [Info] Consumer::doVbTakeover [worker_n1ql_0_0:/tmp/127.0.0.1:8091_0_66756173.sock:53586] vb: 264 vbblob stream status: , starting dcp stream
      2022-06-24T01:47:10.057-07:00 [Info] Consumer::doVbTakeover [worker_n1ql_0_0:/tmp/127.0.0.1:8091_0_66756173.sock:53586] vb: 264 vbblob stream status: , starting dcp stream
      2022-06-24T01:48:38.254-07:00 [Info] Consumer::doVbTakeover [worker_n1ql_0_0:/tmp/127.0.0.1:8091_0_66756173.sock:53586] vb: 264 vbblob stream status: , starting dcp stream
      2022-06-24T01:50:39.534-07:00 [Info] Consumer::doVbTakeover [worker_n1ql_0_0:/tmp/127.0.0.1:8091_0_66756173.sock:53586] vb: 264 vbblob stream status: , starting dcp stream
      2022-06-24T01:52:18.758-07:00 [Info] Consumer::doVbTakeover [worker_n1ql_0_0:/tmp/127.0.0.1:8091_0_66756173.sock:53586] vb: 264 vbblob stream status: , starting dcp stream
      2022-06-24T01:53:40.952-07:00 [Info] Consumer::doVbTakeover [worker_n1ql_0_0:/tmp/127.0.0.1:8091_0_66756173.sock:53586] vb: 264 vbblob stream status: , starting dcp stream
      2022-06-24T01:55:29.202-07:00 [Info] Consumer::doVbTakeover [worker_n1ql_0_0:/tmp/127.0.0.1:8091_0_66756173.sock:53586] vb: 264 vbblob stream status: , starting dcp stream
      2022-06-24T01:57:07.426-07:00 [Info] Consumer::doVbTakeover [worker_n1ql_0_0:/tmp/127.0.0.1:8091_0_66756173.sock:53586] vb: 264 vbblob stream status: , starting dcp stream
      2022-06-24T01:58:42.642-07:00 [Info] Consumer::doVbTakeover [worker_n1ql_0_0:/tmp/127.0.0.1:8091_0_66756173.sock:53586] vb: 264 vbblob stream status: , starting dcp stream
      2022-06-24T02:00:16.872-07:00 [Info] Consumer::doVbTakeover [worker_n1ql_0_0:/tmp/127.0.0.1:8091_0_66756173.sock:53586] vb: 264 vbblob stream status: , starting dcp stream
      2022-06-24T02:02:31.184-07:00 [Info] Consumer::doVbTakeover [worker_n1ql_0_0:/tmp/127.0.0.1:8091_0_66756173.sock:53586] vb: 264 vbblob stream status: , starting dcp stream
      2022-06-24T02:03:41.344-07:00 [Info] Consumer::doVbTakeover [worker_n1ql_0_0:/tmp/127.0.0.1:8091_0_66756173.sock:53586] vb: 264 vbblob stream status: , starting dcp stream
      2022-06-24T02:04:26.450-07:00 [Info] Consumer::doVbTakeover [worker_n1ql_0_0:/tmp/127.0.0.1:8091_0_66756173.sock:53586] vb: 264 vbblob stream status: , starting dcp stream
      2022-06-24T02:07:35.873-07:00 [Info] Consumer::doVbTakeover [worker_n1ql_0_0:/tmp/127.0.0.1:8091_0_66756173.sock:53586] vb: 264 vbblob stream status: , starting dcp stream
      2022-06-24T02:09:11.076-07:00 [Info] Consumer::doVbTakeover [worker_n1ql_0_0:/tmp/127.0.0.1:8091_0_66756173.sock:53586] vb: 264 vbblob stream status: , starting dcp stream
      2022-06-24T02:09:43.149-07:00 [Info] Consumer::doVbTakeover [worker_n1ql_0_0:/tmp/127.0.0.1:8091_0_66756173.sock:53586] vb: 264 vbblob stream status: , starting dcp stream
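
To quantify the retry loop shown by the grep output, the takeover lines with an empty stream status can be grouped per vBucket and timed. This is a rough Python sketch, assuming the `Consumer::doVbTakeover` line layout shown above; `TAKEOVER_RE` and `empty_status_retries` are hypothetical names introduced for illustration.

```python
import re
from datetime import datetime

# Layout assumed from the doVbTakeover lines quoted in this ticket.
TAKEOVER_RE = re.compile(
    r"^(?P<ts>\S+) \[Info\] Consumer::doVbTakeover .* vb: (?P<vb>\d+) "
    r"vbblob stream status: (?P<status>[^,]*), starting dcp stream"
)


def empty_status_retries(lines):
    """Map vBucket id -> timestamps of takeover attempts whose stream status is empty."""
    retries = {}
    for line in lines:
        m = TAKEOVER_RE.match(line.strip())
        if m and m.group("status").strip() == "":
            ts = datetime.fromisoformat(m.group("ts"))
            retries.setdefault(int(m.group("vb")), []).append(ts)
    return retries


SAMPLE = [
    "2022-06-24T01:45:11.475-07:00 [Info] Consumer::doVbTakeover [worker_n1ql_0_0:/tmp/127.0.0.1:8091_0_66756173.sock:53586] vb: 264 vbblob stream status: stopped, starting dcp stream",
    "2022-06-24T01:45:12.667-07:00 [Info] Consumer::doVbTakeover [worker_n1ql_0_0:/tmp/127.0.0.1:8091_0_66756173.sock:53586] vb: 264 vbblob stream status: , starting dcp stream",
    "2022-06-24T02:09:43.149-07:00 [Info] Consumer::doVbTakeover [worker_n1ql_0_0:/tmp/127.0.0.1:8091_0_66756173.sock:53586] vb: 264 vbblob stream status: , starting dcp stream",
]

if __name__ == "__main__":
    for vb, stamps in empty_status_retries(SAMPLE).items():
        span = stamps[-1] - stamps[0]
        print(f"vb {vb}: {len(stamps)} empty-status attempts over {span}")
```

The first sample line (status `stopped`) is ignored on purpose, so only attempts that carried an empty status are counted.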
      


          Activity

            Build couchbase-server-7.2.0-1488 contains eventing commit 6fdb13e with commit message:
            MB-52718 : Default manifest Id to 0 if not found in checkpoint blob or in-memory

            Build couchbase-server-8.0.0-1039 contains eventing commit 6fdb13e with commit message:
            MB-52718 : Default manifest Id to 0 if not found in checkpoint blob or in-memory
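
The commit message describes a fallback: if no manifest ID is found in the checkpoint blob or in memory, use 0. The stream request logged in the description carries `manifestUID:""`, which appears consistent with that gap. The actual change lives in the eventing service and is not shown in this ticket, so the following Python sketch only illustrates the fallback idea; the function name, the `manifest_uid` key, and the dictionary shapes are all assumptions.

```python
DEFAULT_MANIFEST_UID = "0"  # fallback value named in the commit message


def resolve_manifest_uid(checkpoint_blob, in_memory_uids, vb):
    """Pick a manifest UID for a vBucket stream request.

    Hypothetical lookup order: checkpoint blob first, then an in-memory map,
    and finally the default "0" when neither source has a non-empty value.
    """
    from_blob = (checkpoint_blob or {}).get("manifest_uid")
    uid = from_blob or in_memory_uids.get(vb)
    return uid if uid else DEFAULT_MANIFEST_UID


if __name__ == "__main__":
    # Neither source knows the manifest UID, so the default is used.
    print(resolve_manifest_uid({}, {}, 264))
```

With a fallback of this kind, a stream request would not be built with an empty manifest UID even when both lookups miss.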

            People

              sujay.gad Sujay Gad