Couchbase Server / MB-48661

Rebalance out a node failed. reason: setup_replications_failed



    • Bug
    • Resolution: Duplicate
    • Critical
    • None
    • 7.1.0
    • couchbase-bucket
    • 7.1.0-1363


      Steps To Reproduce:

      1. Create a 10 node KV cluster
      2. Create a magma bucket with 1 replica. Create 20 collections
      3. Load 10M (0-10M, 0-50k per collection) items and upsert them once
      4. Load another 1M (10M-20M, 10M-20M per collection) items and upsert them
      5. Start CRUD load per collection as below:

        Read Start: 0
        Read End: 500000
        Update Start: 1000000
        Update End: 10000000
        Expiry Start: 0
        Expiry End: 0
        Delete Start: 500000
        Delete End: 1000000
        Create Start: 1000000
        Create End: 10000000
        Final Start: 1000000
        Final End: 10000000

      6. Rebalance in one node. Abort and resume the rebalance at 20%, 40%, 60%, 80%. Rebalance passed.
      7. Crash Magma/memc on all 10 nodes while docs are loading, with a random sleep of random.randint(60, 120) between kills. After every kill, wait for bucket warmup. Everything went fine at this step: no crashes found and no critical messages in memcached.log.
      8. Rebalance out one node. Rebalance Failed:

        {u'code': 0, u'module': u'ns_orchestrator', u'type': u'critical', u'node': u'ns_1@', u'tstamp': 1632909123361L, u'shortText': u'message', u'serverTime': u'2021-09-29T02:52:03.361Z', u'text': u'Rebalance exited with reason {mover_crashed,\n                              {unexpected_exit,\n                               {\'EXIT\',<0.5065.41>,\n                                {{{badmatch,\n                                   {error,\n                                    {setup_replications_failed,\n                                     [{\'ns_1@\',\n                                       {errors,[{10,64}]}}]}}},\n                                  [{janitor_agent,handle_apply_vbucket_state,\n                                    2,\n                                    [{file,"src/janitor_agent.erl"},\n                                     {line,1074}]},\n                                   {janitor_agent,\n                                    apply_vbucket_states_worker_loop,0,\n                                    [{file,"src/janitor_agent.erl"},\n                                     {line,1063}]},\n                                   {proc_lib,init_p,3,\n                                    [{file,"proc_lib.erl"},{line,234}]}]},\n                                 {gen_server,call,\n                                  [{\'janitor_agent-GleamBookUsers0\',\n                                    \'ns_1@\'},\n                                   {if_rebalance,<0.3860.41>,\n                                    {wait_dcp_data_move,\n                                     [\'ns_1@\',\n                                      \'ns_1@\'],\n                                     698}},\n                                   infinity]}}}}}.\nRebalance Operation Id = 694a80c21b7d0a2eb1c7118d1781ff67'}
        2021-09-29 02:52:12,555 | test  | ERROR   | pool-3-thread-4 | [rest_client:print_UI_logs:2786] {u'code': 0, u'module': u'ns_vbucket_mover', u'type': u'critical', u'node': u'ns_1@', u'tstamp': 1632909123311L, u'shortText': u'message', u'serverTime': u'2021-09-29T02:52:03.311Z', u'text': u'Worker <0.5714.41> (for action {move,{698,\n                                      [\'ns_1@\',\n                                       \'ns_1@\'],\n                                      [\'ns_1@\',\n                                       \'ns_1@\'],\n                                      []}}) exited with reason {unexpected_exit,\n                                                                {\'EXIT\',\n                                                                 <0.5065.41>,\n                                                                 {{{badmatch,\n                                                                    {error,\n                                                                     {setup_replications_failed,\n                                                                      [{\'ns_1@\',\n                                                                        {errors,\n                                                                         [{10,\n                                                                           64}]}}]}}},\n                                                                   [{janitor_agent,\n                                                                     handle_apply_vbucket_state,\n                                                                     2,\n                                                                     [{file,\n                                                                       "src/janitor_agent.erl"},\n                                                                      {line,\n                                                                       1074}]},\n                                     
                               {janitor_agent,\n                                                                     apply_vbucket_states_worker_loop,\n                                                                     0,\n                                                                     [{file,\n                                                                       "src/janitor_agent.erl"},\n                                                                      {line,\n                                                                       1063}]},\n                                                                    {proc_lib,\n                                                                     init_p,3,\n                                                                     [{file,\n                                                                       "proc_lib.erl"},\n                                                                      {line,\n                                                                       234}]}]},\n                                                                  {gen_server,\n                                                                   call,\n                                                                   [{\'janitor_agent-GleamBookUsers0\',\n                                                                     \'ns_1@\'},\n                                                                    {if_rebalance,\n                                                                     <0.3860.41>,\n                                                                     {wait_dcp_data_move,\n                                                                      [\'ns_1@\',\n                                                                       \'ns_1@\'],\n                                                                      698}},\n                                                                    infinity]}}}}'}
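For triage, the key fields of the failure above can be pulled out of the message text with a few regular expressions. This is only a rough sketch: `log_text` is an abbreviated, whitespace-collapsed copy of the `text` field from the ns_orchestrator entry, `summarize_rebalance_failure` is a hypothetical helper (not part of TAF or ns_server), and reading 698 as the vBucket being moved is an assumption based on the `move` action in the ns_vbucket_mover entry. The `{10,64}` pair is returned as-is, without interpretation.

```python
import re

# Abbreviated copy of the 'text' field from the ns_orchestrator entry above.
# Whitespace is collapsed and the stack frames are elided with "...".
# Node names appear truncated as 'ns_1@' in the report and are kept that way.
log_text = (
    "Rebalance exited with reason {mover_crashed, {unexpected_exit, "
    "{'EXIT',<0.5065.41>, {{{badmatch, {error, {setup_replications_failed, "
    "[{'ns_1@', {errors,[{10,64}]}}]}}}, ...}, "
    "{gen_server,call, [{'janitor_agent-GleamBookUsers0', 'ns_1@'}, "
    "{if_rebalance,<0.3860.41>, {wait_dcp_data_move, ['ns_1@', 'ns_1@'], 698}}, "
    "infinity]}}}}}.\nRebalance Operation Id = 694a80c21b7d0a2eb1c7118d1781ff67"
)


def summarize_rebalance_failure(text):
    """Hypothetical helper: extract a few fields from a rebalance failure message."""
    reason = re.search(r"\{error,\s*\{(\w+),", text)
    errors = re.search(r"\{errors,\s*\[\{(\d+),\s*(\d+)\}\]\}", text)
    # Assumption: the integer closing the wait_dcp_data_move tuple is the vBucket id.
    vbucket = re.search(r"\{wait_dcp_data_move,.*?\],\s*(\d+)\}", text, re.S)
    # The name following the 'janitor_agent-' prefix in the gen_server call target.
    agent = re.search(r"'janitor_agent-([^']+)'", text)
    op_id = re.search(r"Rebalance Operation Id = (\w+)", text)
    return {
        "reason": reason.group(1) if reason else None,
        "errors": (int(errors.group(1)), int(errors.group(2))) if errors else None,
        "vbucket": int(vbucket.group(1)) if vbucket else None,
        "janitor_agent": agent.group(1) if agent else None,
        "operation_id": op_id.group(1) if op_id else None,
    }


summary = summarize_rebalance_failure(log_text)
for key, value in summary.items():
    print(key, "=", value)
```

The raw entries in the report carry escaped `\n` sequences inside a Python `u'...'` literal, so the text would need to be decoded (or the patterns adjusted) before running this against an unedited log line.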

      Expected Result:
      Rebalance should progress and should not fail.

      QE Test

      git fetch "http://review.couchbase.org/TAF" refs/changes/97/162297/1 && git checkout FETCH_HEAD
      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/magma_temp_job4.ini -p bucket_storage=magma,bucket_eviction_policy=fullEviction,rerun=False,iterations=2,sdk_timeout=60,log_level=debug,infra_log_level=debug,skip_cleanup=True -t aGoodDoctor.Hospital.Murphy.SystemTestMagma,nodes_init=10,graceful=True,skip_cleanup=True,num_items=500000,num_buckets=1,bucket_names=GleamBook,doc_size=2048,key_size=18,assert_crashes_on_load=True,num_collections=20,maxttl=10,num_indexes=20,pc=10,index_nodes=0,query_nodes=0,cbas_nodes=0,fts_nodes=0,ops_rate=50000,doc_ops=create:update:delete:read,durability=Majority,crashes=10,max_commit_points=0 -m rest'
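To make the run configuration easier to read, the comma-separated `-t` argument of the command above can be split into a test name and a parameter dictionary. This is a minimal sketch for inspection only: `parse_test_spec` is a hypothetical helper and is not how TAF itself parses its arguments, and all values are kept as strings.

```python
# The -t argument copied from the command above.
t_arg = (
    "aGoodDoctor.Hospital.Murphy.SystemTestMagma,nodes_init=10,graceful=True,"
    "skip_cleanup=True,num_items=500000,num_buckets=1,bucket_names=GleamBook,"
    "doc_size=2048,key_size=18,assert_crashes_on_load=True,num_collections=20,"
    "maxttl=10,num_indexes=20,pc=10,index_nodes=0,query_nodes=0,cbas_nodes=0,"
    "fts_nodes=0,ops_rate=50000,doc_ops=create:update:delete:read,"
    "durability=Majority,crashes=10,max_commit_points=0"
)


def parse_test_spec(spec):
    """Hypothetical helper: split 'test.name,key=value,...' into (name, params)."""
    name, _, rest = spec.partition(",")
    params = {}
    for item in rest.split(","):
        if not item:
            continue
        key, sep, value = item.partition("=")
        # Keys without '=' are kept with a None value rather than dropped.
        params[key] = value if sep else None
    return name, params


test_name, params = parse_test_spec(t_arg)
doc_ops = params["doc_ops"].split(":")

print(test_name)
print("collections:", params["num_collections"], "| crashes:", params["crashes"])
print("doc ops:", doc_ops, "| durability:", params["durability"])
```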

      Daniel Owen, the plan wasn't to run this test at this stage, but I ended up running it because I had to verify another Magma bug, and then I encountered this one.

      Test Category: Unbounded Volume test that includes rebalance aborts and crashes: https://docs.google.com/spreadsheets/d/1AKutwtUlGX4UckfGPkJSKZu_7wfz_EwMMuoajCYUub8/edit#gid=1608573032&range=G7





              ritesh.agarwal Ritesh Agarwal
