Couchbase Server / MB-47827

[BP 7.0.2 MB-47775] - [System Test] multiple subsequent rebalance failures due to error "Protocol Conflict Error: Existing Rebalance Token Found"

Details

    Description

      Build : 7.0.1-5977
      Test : -test tests/2i/cheshirecat/test_idx_clusterops_cheshire_cat_recovery.yml -scope tests/2i/cheshirecat/scope_idx_cheshire_cat_dgm.yml
      Scale : 2
      Iteration : 8th (day 3)

      Twelve rebalance operations (including rebalance retries) failed between 2021-08-05T07:15:28 and 2021-08-05T08:09:44 due to the following error:

      [ns_server:error,2021-08-05T07:15:28.689-07:00,ns_1@172.23.97.215:service_rebalancer-index<0.13433.1863>:service_rebalancer:run_rebalance_worker:119]Worker terminated abnormally: {'EXIT',<0.22636.1863>,
                                     {{badmatch,
                                       {error,
                                        {unknown_error,
                                         <<"Protocol Conflict Error: Existing Rebalance Token Found">>}}},
                                      [{service_rebalancer,rebalance_worker,1,
                                        [{file,"src/service_rebalancer.erl"},
                                         {line,164}]},
                                       {proc_lib,init_p,3,
                                        [{file,"proc_lib.erl"},{line,234}]}]}}
      [user:error,2021-08-05T07:15:28.692-07:00,ns_1@172.23.97.215:<0.9048.0>:ns_orchestrator:log_rebalance_completion:1416]Rebalance exited with reason {service_rebalance_failed,index,
                                    {worker_died,
                                     {'EXIT',<0.22636.1863>,
                                      {{badmatch,
                                        {error,
                                         {unknown_error,
                                          <<"Protocol Conflict Error: Existing Rebalance Token Found">>}}},
                                       [{service_rebalancer,rebalance_worker,1,
                                         [{file,"src/service_rebalancer.erl"},
                                          {line,164}]},
                                        {proc_lib,init_p,3,
                                         [{file,"proc_lib.erl"},{line,234}]}]}}}}.
      Rebalance Operation Id = 4ffb583616e3db29eaada8060814357e
      

      The indexer nodes in the cluster are:
      172.23.107.2, 172.23.107.3, 172.23.107.4, 172.23.107.5, 172.23.97.216, 172.23.97.217

      On 172.23.97.217, the following can be seen in the indexer logs around the time of the above rebalance failure:

      2021-08-05T07:15:28.619-07:00 [Info] ServiceMgr::StartTopologyChange {a710619e06a78fafe47a00bf5001c163 [] topology-change-rebalance [{{74489e779980eda2f0e670ca180abc6d 5 <nil>} recovery-full} {{5fa598444337c8d73f779b6e8bef8a84 5 <nil>} recovery-full} {{f7e7ed8fefd9cd788d594b6dcc4ad22c 5 <nil>} recovery-full} {{732bbd597e5e5f841da2d912f49a0961 5 <nil>} recovery-full} {{db5cedf056b862d55cd091c0d82299d5 5 <nil>} recovery-full} {{4897f2a4f003b2716d736860c60f007b 5 <nil>} recovery-full}] []}
      2021-08-05T07:15:28.635-07:00 [Info] ServiceMgr::cleanupOrphanTokens Found Rebalance Token &{74489e779980eda2f0e670ca180abc6d a5:ae:d8:21:c3:7a:a2:11 MoveIndex move index failure - index build is in progress for indexes: [bucket5:idx2_ASLCO4Z36M_idxprefix]. }
      2021-08-05T07:15:28.637-07:00 [Info] updator: updating service map.  server group=Group 1, indexerVersion=5 nodeAddr 172.23.97.217:8091 clusterVersion 5 excludeNode  storageMode 2
      2021-08-05T07:15:28.663-07:00 [Error] ServiceMgr::startRebalance Found Existing Global RToken &{74489e779980eda2f0e670ca180abc6d a5:ae:d8:21:c3:7a:a2:11 MoveIndex move index failure - index build is in progress for indexes: [bucket5:idx2_ASLCO4Z36M_idxprefix]. }
      2021-08-05T07:15:28.663-07:00 [Info] ServiceMgr::runCleanupPhase path /indexing/rebalance/RebalanceToken isMaster true
      2021-08-05T07:15:28.684-07:00 [Info] ServiceMgr::cleanupLocalRToken Cleanup
      2021-08-05T07:15:28.684-07:00 [Info] ClustMgr:handleDelLocalValue Key RebalanceToken
      2021-08-05T07:15:28.685-07:00 [Info] ServiceMgr::cleanupRebalanceRunning Cleanup
      2021-08-05T07:15:28.685-07:00 [Info] ClustMgr:handleDelLocalValue Key RebalanceRunning
      2021-08-05T07:15:28.686-07:00 [Info] ServiceMgr::StartTopologyChange returns Error Protocol Conflict Error: Existing Rebalance Token Found. isBalanced false.
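
To locate the other occurrences in the same window, the indexer log can be scanned for the `startRebalance` error line. The following is a minimal sketch, not a Couchbase tool: the line format is inferred only from the excerpt above, and the names `parse_line` and `find_token_conflicts` are hypothetical. The sample lines are copied verbatim from the excerpt.

```python
import re
from typing import List, Optional, Tuple

# Assumed layout, inferred from the excerpt above:
#   <timestamp> [<Level>] <Component>::<function> <message>
# (ClustMgr lines use a single colon, so "::?" accepts both.)
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{2}:\d{2}) "
    r"\[(?P<level>\w+)\] "
    r"(?P<component>\w+)::?(?P<func>\w+) ?(?P<msg>.*)$"
)
# The first field inside "&{...}" appears to be the rebalance token id.
TOKEN_RE = re.compile(r"&\{(\w+)")


def parse_line(line: str) -> Optional[dict]:
    """Split one indexer log line into fields, or return None if it does not fit."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def find_token_conflicts(lines: List[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, token id) for each 'Found Existing Global RToken' error."""
    conflicts = []
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue  # e.g. the "updator:" line, which has no function name
        if (rec["level"] == "Error" and rec["func"] == "startRebalance"
                and "Existing Global RToken" in rec["msg"]):
            token = TOKEN_RE.search(rec["msg"])
            if token:
                conflicts.append((rec["ts"], token.group(1)))
    return conflicts


SAMPLE = [
    "2021-08-05T07:15:28.635-07:00 [Info] ServiceMgr::cleanupOrphanTokens Found Rebalance Token &{74489e779980eda2f0e670ca180abc6d a5:ae:d8:21:c3:7a:a2:11 MoveIndex move index failure - index build is in progress for indexes: [bucket5:idx2_ASLCO4Z36M_idxprefix]. }",
    "2021-08-05T07:15:28.637-07:00 [Info] updator: updating service map.  server group=Group 1, indexerVersion=5 nodeAddr 172.23.97.217:8091 clusterVersion 5 excludeNode  storageMode 2",
    "2021-08-05T07:15:28.663-07:00 [Error] ServiceMgr::startRebalance Found Existing Global RToken &{74489e779980eda2f0e670ca180abc6d a5:ae:d8:21:c3:7a:a2:11 MoveIndex move index failure - index build is in progress for indexes: [bucket5:idx2_ASLCO4Z36M_idxprefix]. }",
    "2021-08-05T07:15:28.686-07:00 [Info] ServiceMgr::StartTopologyChange returns Error Protocol Conflict Error: Existing Rebalance Token Found. isBalanced false.",
]

if __name__ == "__main__":
    for ts, token_id in find_token_conflicts(SAMPLE):
        print(ts, token_id)
```

Applied to the full indexer log on each node, this would list every conflicting token with its timestamp, which can then be matched against the failed rebalance operations reported by ns_server.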
      

      This issue is similar to MB-46489, which was fixed in 7.0.0.

            People

              mihir.kamdar Mihir Kamdar (Inactive)
              jeelan.poola Jeelan Poola