Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-44665

[Durabiilty+Ephemeral] Replica update rebalance failed with reason 'bulk_set_vbucket_state_failed'

    XMLWordPrintable

Details

    • Untriaged
    • Centos 64-bit
    • 1
    • Yes
    • KV-Engine 2021-March

    Description

      Build: 7.0.0 4554

      Test case:

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/testexec.97440.ini bucket_type=ephemeral,num_items=50000,durability=MAJORITY,sdk_timeout=60,num_items=500000,GROUP=P0,get-cbcollect-info=True,upgrade_version=7.0.0-4554 -t bucket_param_tests.bucket_param_update.BucketParamTest.test_replica_update,bucket_type=ephemeral,upgrade_version=7.0.0-4554,GROUP=P0,doc_ops=update;create;delete,sdk_timeout=60,get-cbcollect-info=False,replicas=0,durability=MAJORITY,log_level=error,nodes_init=5,num_items=500000,infra_log_level=critical'

      Scenario: (all doc_ops with durability=MAJORITY)

      • 5 node cluster, 1 ephemeral bucket with replica=0
      • Load initial load of 500K into the bucket
      • Do incremental replica_update from 0 -> 1 -> 2 -> 3 with doc_ops in parallel

      Observation:

      Rebalance failed when updating replica from 2 -> 3 with following logs.

      Rebalance exited with reason
      {mover_crashed,
      {unexpected_exit, {'EXIT',<0.18221.4>,
      {{bulk_set_vbucket_state_failed,
      [{'ns_1@172.23.100.21', {'EXIT', {{{{badmatch,{error,timeout}},
      [{mc_client_binary,cmd_vocal_recv,5, [{file,"src/mc_client_binary.erl"}, {line,158}]},
      {mc_client_binary,select_bucket,2, [{file,"src/mc_client_binary.erl"}, {line,350}]},
      {ns_memcached_sockets_pool, take_socket,2, [{file, "src/ns_memcached_sockets_pool.erl"}, {line,46}]},
      {ns_memcached_sockets_pool, '-executing_on_socket/3-fun-0-',3, [{file, "src/ns_memcached_sockets_pool.erl"}, {line,88}]},
      {async,'-async_init/4-fun-1-',3, [{file,"src/async.erl"}, {line,197}]}]},
      {gen_server,call,
      ['ns_memcached-default',
      {get_dcp_docs_estimate,540, "replication:ns_1@172.23.100.21->ns_1@172.23.100.10:default"}, 180000]}},
      {gen_server,call,
      [{'janitor_agent-default', 'ns_1@172.23.100.21'},
      {if_rebalance,<0.32015.3>,
      {update_vbucket_state,974,replica, undefined,'ns_1@172.23.100.22'}}, infinity]}}}}]},
      [{janitor_agent,bulk_set_vbucket_state,4, [{file,"src/janitor_agent.erl"}, {line,403}]},
      {ns_single_vbucket_mover, update_replication_post_move,5, [{file,"src/ns_single_vbucket_mover.erl"}, {line,530}]},
      {ns_single_vbucket_mover,on_move_done_body, 6,
      [{file,"src/ns_single_vbucket_mover.erl"}, {line,556}]},
      {proc_lib,init_p,3, [{file,"proc_lib.erl"},{line,234}]}]}}}}.
       
      Rebalance Operation Id = 7c081452ddf5be8bff6e738fb44fe809
       
      Worker <0.17405.4> (for action {move,{974,
      ['ns_1@172.23.100.21', 'ns_1@172.23.100.20', 'ns_1@172.23.100.22'], ['ns_1@172.23.100.22', 'ns_1@172.23.100.20', 'ns_1@172.23.100.21', 'ns_1@172.23.100.10'],
      []}}) exited with reason {unexpected_exit,
      {'EXIT', <0.18221.4>,
      {{bulk_set_vbucket_state_failed,
      [{'ns_1@172.23.100.21', {'EXIT', {{{{badmatch, {error, timeout}},
      [{mc_client_binary, cmd_vocal_recv, 5, [{file, "src/mc_client_binary.erl"}, {line, 158}]},
      {mc_client_binary, select_bucket, 2, [{file, "src/mc_client_binary.erl"}, {line, 350}]},
      {ns_memcached_sockets_pool, take_socket, 2, [{file, "src/ns_memcached_sockets_pool.erl"}, {line, 46}]},
      {ns_memcached_sockets_pool, '-executing_on_socket/3-fun-0-', 3, [{file, "src/ns_memcached_sockets_pool.erl"}, {line, 88}]},
      {async, '-async_init/4-fun-1-', 3, [{file, "src/async.erl"}, {line, 197}]}]},
      {gen_server, call,
      ['ns_memcached-default',
      {get_dcp_docs_estimate, 540 "replication:ns_1@172.23.100.21->ns_1@172.23.100.10:default"}, 180000]}},
      {gen_server, call,
      [{'janitor_agent-default', 'ns_1@172.23.100.21'},
      {if_rebalance, <0.32015.3>,
      {update_vbucket_state, 974, replica, undefined, 'ns_1@172.23.100.22'}}, infinity]}}}}]},
      [{janitor_agent,
      bulk_set_vbucket_state, 4, [{file, "src/janitor_agent.erl"}, {line, 403}]},
      {ns_single_vbucket_mover, update_replication_post_move, 5, [{file, "src/ns_single_vbucket_mover.erl"}, {line, 530}]},
      {ns_single_vbucket_mover, on_move_done_body, 6, [{file, "src/ns_single_vbucket_mover.erl"}, {line, 556}]},
      {proc_lib, init_p,3, [{file, "proc_lib.erl"}, {line, 234}]}]}}}
      

      Jenkins log: http://qa.sc.couchbase.com/job/test_suite_executor-TAF/96702/console

      Attachments

        Issue Links

          Activity

            People

              ashwin.govindarajulu Ashwin Govindarajulu
              ashwin.govindarajulu Ashwin Govindarajulu
              Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                PagerDuty