Couchbase Server - MB-8358

[system test] Rebalance failed a couple of times due to bulk_set_vbucket_state_failed in a heavy DGM cluster


Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Blocker
    • Fix Version/s: 3.0
    • Affects Version/s: 2.1.0
    • Component/s: ns_server
    • Security Level: Public
    • Labels: None
    • Environment: build-807-rel, Centos 64-bit

    Description

      Cluster IP is 172.23.105.23.
      1. Create an 8-node cluster; each node has 12 GB RAM and HDD storage.
      2. Create 2 buckets, default and saslbucket, with memory quotas of 7 GB and 5 GB respectively.
      3. Run the KV-only use case for 1 week. After the 1-week run, 150M items are loaded into the default bucket (23% resident) and 60M items into the sasl bucket (44% resident ratio). The workload for both buckets is 15k ops/sec with 5% creates, 5% deletes, 5% expires, 5% updates, and 80% gets (sketched below).
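      The operation mix in step 3 works out to the weights below. This is only an illustrative Python sketch of the stated load shape (no client calls); it is not the load-generation harness actually used in the system test.

      import random

      # Stated load shape: 15k ops/sec against each bucket, split into
      # 5% creates, 5% deletes, 5% expires, 5% updates and 80% gets.
      OPS_PER_SEC = 15000
      OP_MIX = {"create": 0.05, "delete": 0.05, "expire": 0.05,
                "update": 0.05, "get": 0.80}

      def ops_for_one_second():
          """Pick the operations to issue during one second of load."""
          return random.choices(list(OP_MIX), weights=list(OP_MIX.values()),
                                k=OPS_PER_SEC)

      if __name__ == "__main__":
          second = ops_for_one_second()
          for op in OP_MIX:
              print(op, second.count(op))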

      Then, with the cluster in a heavy DGM state, try to rebalance in one node. The rebalance fails, and a second rebalance attempt produces the same error:

      Rebalance exited with reason {bulk_set_vbucket_state_failed,
        [{'ns_1@172.23.105.23',
          {'EXIT',
           {{{{unexpected_reason,killed},
               [{misc,executing_on_new_process,1},
                {tap_replication_manager,change_vbucket_filter,4},
                {tap_replication_manager,'-do_set_incoming_replication_map/3-lc$^2/1-2-',2},
                {tap_replication_manager,do_set_incoming_replication_map,3},
                {tap_replication_manager,handle_call,3},
                {gen_server,handle_msg,5},
                {proc_lib,init_p_do_apply,3}]},
              {gen_server,call,
               ['tap_replication_manager-saslbucket',
                {change_vbucket_replication,583,undefined},
                infinity]}},
             {gen_server,call,
              [{'janitor_agent-saslbucket','ns_1@172.23.105.23'},
               {if_rebalance,<0.9405.321>,
                {update_vbucket_state,583,replica,undefined,undefined}},
               infinity]}}}}]}
      ns_orchestrator002 ns_1@172.23.105.23 16:48:35 - Tue May 28, 2013
      <0.15529.323> exited with {bulk_set_vbucket_state_failed,
        [{'ns_1@172.23.105.23',
          {'EXIT',
           {{{{unexpected_reason,killed},
               [{misc,executing_on_new_process,1},
                {tap_replication_manager,change_vbucket_filter,4},
                {tap_replication_manager,'-do_set_incoming_replication_map/3-lc$^2/1-2-',2},
                {tap_replication_manager,do_set_incoming_replication_map,3},
                {tap_replication_manager,handle_call,3},
                {gen_server,handle_msg,5},
                {proc_lib,init_p_do_apply,3}]},
              {gen_server,call,
               ['tap_replication_manager-saslbucket',
                {change_vbucket_replication,583,undefined},
                infinity]}},
             {gen_server,call,
              [{'janitor_agent-saslbucket','ns_1@172.23.105.23'},
               {if_rebalance,<0.9405.321>,
                {update_vbucket_state,583,replica,undefined,undefined}},
               infinity]}}}}]}
      ns_vbucket_mover000 ns_1@172.23.105.23 16:48:05 - Tue May 28, 2013
      Failed to get tap stats after 5 attempts ebucketmigrator_srv000 ns_1@172.23.105.27 15:44:12 - Tue May 28, 2013
      Bucket "saslbucket" rebalance does not seem to be swap rebalance ns_vbucket_mover000 ns_1@172.23.105.23 15:03:11 - Tue May 28, 2013
      Started rebalancing bucket saslbucket ns_rebalancer000 ns_1@172.23.105.23 15:03:10 - Tue May 28, 2013
      Starting rebalance, KeepNodes = ['ns_1@172.23.105.23','ns_1@172.23.105.25',
      'ns_1@172.23.105.26','ns_1@172.23.105.27',
      'ns_1@172.23.105.28','ns_1@172.23.105.29',
      'ns_1@172.23.105.30','ns_1@172.23.105.31',
      'ns_1@172.23.105.32','ns_1@172.23.105.33'], EjectNodes = []
      ns_orchestrator004 ns_1@172.23.105.23 15:03:09 - Tue May 28, 2013

      This blocks the cluster from any topology change.
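      For anyone re-running the reproduction, the rebalance-in shown in the log above can be issued through ns_server's REST API. The sketch below assumes the standard /controller/rebalance and /pools/default/rebalanceProgress endpoints; the admin credentials are placeholders, and the node list is copied from the KeepNodes value in the log.

      import requests  # third-party HTTP client

      CLUSTER = "http://172.23.105.23:8091"
      AUTH = ("Administrator", "password")  # placeholder credentials

      # Node list copied from the KeepNodes value logged by ns_orchestrator.
      known_nodes = ",".join([
          "ns_1@172.23.105.23", "ns_1@172.23.105.25", "ns_1@172.23.105.26",
          "ns_1@172.23.105.27", "ns_1@172.23.105.28", "ns_1@172.23.105.29",
          "ns_1@172.23.105.30", "ns_1@172.23.105.31", "ns_1@172.23.105.32",
          "ns_1@172.23.105.33",
      ])

      # Kick off the rebalance; ejectedNodes is empty because this is a rebalance-in.
      resp = requests.post(f"{CLUSTER}/controller/rebalance", auth=AUTH,
                           data={"knownNodes": known_nodes, "ejectedNodes": ""})
      resp.raise_for_status()

      # Poll the progress endpoint; in this ticket the rebalance ends in
      # bulk_set_vbucket_state_failed instead of completing.
      print(requests.get(f"{CLUSTER}/pools/default/rebalanceProgress", auth=AUTH).json())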

      Diags are available at https://s3.amazonaws.com/bugdb/jira/MB-8358/172.23.105.23-807.zip


          People

            Assignee: Aleksey Kondratenko (alkondratenko, inactive)
            Reporter: Chisheng Hong (Chisheng, inactive)

