Couchbase Server / MB-7290

Rebalance-in operation failed twice with "bulk_set_vbucket_state" failing under heavy front-end load on an XDCR setup, with the system in DGM (~65% resident ratio)


Details

    Description

      At the time of the rebalance failure:

      Operation: rebalance-in of 5 nodes on each cluster
      Cluster setup: c1 and c2, 10 nodes each
      biXDCR_bucket: c1 <---> c2 (bidirectional XDCR)
      uniXDCR_src: c1 ---> c2 :uniXDCR_dest (unidirectional XDCR)
      Front-end load: on c1 and c2 for biXDCR_bucket, and on c1 for uniXDCR_src
      c1: http://ec2-177-71-230-72.sa-east-1.compute.amazonaws.com:8091/
      c2: http://ec2-175-41-186-167.ap-southeast-1.compute.amazonaws.com:8091/

      On c1, the rebalance operation failed with this reason in the UI logs:

      Rebalance exited with reason {{bulk_set_vbucket_state_failed,
      [{'ns_1@ec2-177-71-170-44.sa-east-1.compute.amazonaws.com',
      {'EXIT',
      {{timeout,
      {gen_server,call,
      ['ns_memcached-biXDCR_bucket',
      {set_vbucket,544,replica},
      180000]}},
      {gen_server,call,
      [{'janitor_agent-biXDCR_bucket', 'ns_1@ec2-177-71-170-44.sa-east-1.compute.amazonaws.com'},
      {if_rebalance,<0.10136.88>,
      {update_vbucket_state,544,replica,
      undefined,undefined}},
      infinity]}}}}]},
      [{janitor_agent,bulk_set_vbucket_state,4},
      {ns_vbucket_mover,update_replication_post_move,3},
      {ns_vbucket_mover,handle_info,2},
      {gen_server,handle_msg,5},
      {proc_lib,init_p_do_apply,3}]}

      The second time, rebalance failed with the following UI log message:

      Rebalance exited with reason {{timeout,
      {gen_server,call,
      ['ns_memcached-biXDCR_bucket',
      {set_vbucket,849,active},
      180000]}},
      {gen_server,call,
      [{'janitor_agent-biXDCR_bucket', 'ns_1@ec2-177-71-230-72.sa-east-1.compute.amazonaws.com'},
      {if_rebalance,<0.21090.114>,
      {update_vbucket_state,849,active,paused,
      undefined}},
      infinity]}}
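
      Both failure reasons above wrap the same inner call: a gen_server call to 'ns_memcached-biXDCR_bucket' for set_vbucket that timed out after 180000 ms (3 minutes). When triaging logs with many such entries, a small script can pull out the bucket, vBucket id, target state and timeout. The following is only a sketch: the regular expression and the function name summarize_timeouts are illustrative assumptions, not part of ns_server or any Couchbase tool, and the pattern is based solely on the two messages in this report.

```python
import re

# Matches the inner ns_memcached call as it appears in the messages above, e.g.
#   ['ns_memcached-biXDCR_bucket', {set_vbucket,544,replica}, 180000]
# Whitespace and line breaks between the elements are tolerated.
SET_VBUCKET_TIMEOUT = re.compile(
    r"\['ns_memcached-(?P<bucket>[^']+)'\s*,\s*"
    r"\{set_vbucket\s*,\s*(?P<vbucket>\d+)\s*,\s*(?P<state>\w+)\}\s*,\s*"
    r"(?P<timeout_ms>\d+)\s*\]"
)


def summarize_timeouts(log_text):
    """Return one dict per set_vbucket call found in log_text (hypothetical helper)."""
    return [
        {
            "bucket": m["bucket"],
            "vbucket": int(m["vbucket"]),
            "state": m["state"],
            "timeout_s": int(m["timeout_ms"]) / 1000,
        }
        for m in SET_VBUCKET_TIMEOUT.finditer(log_text)
    ]


# Excerpt of the second failure message, copied from this report.
sample = """
{gen_server,call,
['ns_memcached-biXDCR_bucket',

{set_vbucket,849,active}

,
180000]}
"""

for entry in summarize_timeouts(sample):
    print(entry)
```

      Run over the full UI log, this would show whether the timeouts cluster on particular nodes, vBuckets or states; it says nothing about why ns_memcached took longer than 3 minutes to respond.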

      After some time had passed, a third rebalance attempt completed successfully.

      I will attach the diags collected from one of the c1 nodes shortly.

            People

              mikew Mike Wiederhold [X] (Inactive)
              abhinav Abhi Dangeti