Couchbase Server / MB-59563

[System Test Upgrade] :- Online upgrade using swap rebalance for 2i fails with "service_rebalance_failed,index, {worker_died, {'EXIT',<0.18034.2>, {{badmatch, {error, {unknown_error, <<"Post \"http://172.23.104.176:9102/registerRebalanceToken\": EOF">>"

Details

    • Untriaged
    • Linux x86_64
    • 0
    • No

    Description

      Steps to Repro
      1. Run the following longevity test on 7.2.3 for 4-5 days.

      ./sequoia -client 172.23.104.254:2375 -provider file:centos_third_cluster.yml -test tests/integration/7.2/test_7.2.yml -scope tests/integration/7.2/scope_7.2_magma.yml -scale 3 -repeat 0 -log_level 0 -version 7.2.3-6705 -skip_setup=false -skip_test=false -skip_teardown=true -skip_cleanup=false -continue=false -collect_on_error=false -stop_on_error=false -duration=604800 -show_topology=true
      

      2. Upgraded all KV nodes using the graceful failover/recovery strategy.
      3. Created 5 nodes on 7.6 with the provisioned profile (172.23.108.144, 172.23.97.179, 172.23.104.176, 172.23.97.183, 172.23.121.118), added them to the cluster, removed the 4 index nodes that were part of the 7.2.3 cluster, and rebalanced. This simulates the upgrade path used on cloud, so that file-based rebalance is used even during the upgrade.

      172.23.108.144 7:32:56 AM 9 Nov, 2023

      Starting rebalance, KeepNodes = ['ns_1@172.23.104.176','ns_1@172.23.104.216',
      'ns_1@172.23.104.249','ns_1@172.23.105.134',
      'ns_1@172.23.105.210','ns_1@172.23.105.38',
      'ns_1@172.23.105.39','ns_1@172.23.105.91',
      'ns_1@172.23.106.37','ns_1@172.23.107.142',
      'ns_1@172.23.107.236','ns_1@172.23.107.25',
      'ns_1@172.23.108.129','ns_1@172.23.108.134',
      'ns_1@172.23.108.136','ns_1@172.23.108.138',
      'ns_1@172.23.108.139','ns_1@172.23.108.140',
      'ns_1@172.23.108.141','ns_1@172.23.108.143',
      'ns_1@172.23.108.144','ns_1@172.23.108.145',
      'ns_1@172.23.108.146','ns_1@172.23.108.148',
      'ns_1@172.23.121.118','ns_1@172.23.97.179',
      'ns_1@172.23.97.183'], EjectNodes = ['ns_1@172.23.108.61',
      'ns_1@172.23.108.34',
      'ns_1@172.23.108.132',
      'ns_1@172.23.106.54'], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 9c7f04f312337cd95f93ff81cd2f539b
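The KeepNodes and EjectNodes lists in this log appear to correspond to the `knownNodes` and `ejectedNodes` form parameters of ns_server's `POST /controller/rebalance` REST endpoint (port 8091, administrator credentials), where `knownNodes` lists every node in the cluster, including the ones being ejected. The Go sketch below, assuming that mapping, only builds the form body for a swap rebalance like the one in step 3; the helper name `rebalanceForm` is hypothetical, and the sketch neither adds nodes nor sends the request.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// rebalanceForm (hypothetical helper) builds the form body for
// POST /controller/rebalance. Assumption: knownNodes must contain every
// node (kept and ejected), while ejectedNodes contains only the nodes
// to remove. Node names use the otpNode form "ns_1@<host>".
func rebalanceForm(keep, eject []string) url.Values {
	// Copy keep before appending so the caller's slice is never modified.
	known := append(append([]string{}, keep...), eject...)
	form := url.Values{}
	form.Set("knownNodes", strings.Join(known, ","))
	form.Set("ejectedNodes", strings.Join(eject, ","))
	return form
}

func main() {
	// A shortened swap: two of the new 7.6 nodes kept, one 7.2.3 index node ejected.
	keep := []string{"ns_1@172.23.108.144", "ns_1@172.23.97.179"}
	eject := []string{"ns_1@172.23.108.61"}
	// Encode sorts keys and percent-escapes '@' and ',' for the request body.
	fmt.Println(rebalanceForm(keep, eject).Encode())
}
```

For the full swap, `keep` would hold the KeepNodes list and `eject` the four index nodes under EjectNodes.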
      

      172.23.108.144 7:33:11 AM 9 Nov, 2023

      Rebalance exited with reason {service_rebalance_failed,index,
      {worker_died,
      {'EXIT',<0.18034.2>,
      {{badmatch,
      {error,
      {unknown_error,
      <<"Post \"http://172.23.104.176:9102/registerRebalanceToken\": EOF">>}}},
      [{service_manager,rebalance_op,5,
      [{file,"src/service_manager.erl"},
      {line,341}]},
      {service_manager,do_run_op,1,
      [{file,"src/service_manager.erl"},
      {line,257}]},
      {proc_lib,init_p,3,
      [{file,"proc_lib.erl"},{line,225}]}]}}}}.
      Rebalance Operation Id = 9c7f04f312337cd95f93ff81cd2f539b
      

      Rebalance continues to fail on repeated retries. I am going to try a few more times; we would have to mark this as a blocker if it doesn't progress. cbcollect_info attached.

      Any workarounds would be greatly appreciated.

    People

      Balakumaran Gopal