  Couchbase Server
  MB-9594

Able to add a node to cluster1 while it is still being rebalanced out of cluster2 after a failover


Details

    • Type: Bug
    • Resolution: Incomplete
    • Priority: Critical
    • Fix Version: 3.0
    • Affects Version: 2.5.0
    • Component: ns_server
    • Security Level: Public
    • Labels: None
    • Triage: Triaged
    • Environment: Windows 64-bit

    Description

      Steps:
      This was a system test of an offline upgrade from 2.0.0 to 2.5.0.

      We can skip the earlier steps and start at the important point, when all the servers were already running 2.5:

      1) cluster1 with a default bucket (6.5M items) and a sasl bucket (3.5M items):
      172.23.105.12
      172.23.105.13
      172.23.105.14

      cluster2 with a default bucket (6.5M items) and a sasl bucket (3.5M items):
      172.23.105.15
      172.23.105.72
      172.23.105.74
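      For reference, a bucket like sasl can be created over the REST API on port 8091; the RAM quota and Administrator credentials below are assumptions, not values recorded in this test:

      # hypothetical quota and credentials
      curl -u Administrator:password -X POST http://172.23.105.12:8091/pools/default/buckets \
        -d name=sasl -d bucketType=membase -d ramQuotaMB=1024 \
        -d authType=sasl -d saslPassword=sasl -d replicaNumber=1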

      Bidirectional XDCR is set up between the clusters for all buckets.
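      A bidirectional XDCR setup like this is typically built from a remote-cluster reference plus a continuous replication in each direction; the cluster name and credentials below are assumptions:

      # on cluster1, pointing at cluster2 (repeat the mirror image on cluster2)
      curl -u Administrator:password -X POST http://172.23.105.12:8091/pools/default/remoteClusters \
        -d name=cluster2 -d hostname=172.23.105.15:8091 \
        -d username=Administrator -d password=password
      curl -u Administrator:password -X POST http://172.23.105.12:8091/controller/createReplication \
        -d fromBucket=default -d toCluster=cluster2 -d toBucket=default -d replicationType=continuous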

      2) Reboot 172.23.105.14 so that it is auto-failed-over, then rebalance it out (cluster1); see the sketch below.
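      Auto-failover has to be enabled beforehand for the reboot to trigger a failover; a minimal sketch, with the timeout and credentials as assumptions:

      # enable auto-failover (30 s is the minimum timeout)
      curl -u Administrator:password -X POST http://172.23.105.12:8091/settings/autoFailover \
        -d enabled=true -d timeout=30
      # once the node has been failed over, rebalance it out of cluster1
      curl -u Administrator:password -X POST http://172.23.105.12:8091/controller/rebalance \
        -d knownNodes=ns_1@172.23.105.12,ns_1@172.23.105.13,ns_1@172.23.105.14 \
        -d ejectedNodes=ns_1@172.23.105.14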
      3) Add 172.23.105.14 to cluster2 and start a rebalance, roughly as sketched below.
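      Adding the node and rebalancing on cluster2 maps onto these REST calls (credentials assumed):

      curl -u Administrator:password -X POST http://172.23.105.15:8091/controller/addNode \
        -d hostname=172.23.105.14 -d user=Administrator -d password=password
      curl -u Administrator:password -X POST http://172.23.105.15:8091/controller/rebalance \
        -d knownNodes=ns_1@172.23.105.15,ns_1@172.23.105.72,ns_1@172.23.105.74,ns_1@172.23.105.14 \
        -d ejectedNodes=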
      4) Create views on both clusters:
      curl -v -X PUT -H 'Content-Type: application/json' 'http://sasl:sasl@172.23.105.12:8092/sasl/_design/ddoc' -d '{"views": { "view0":{"map":"function(doc, meta){emit(doc.city,[doc.name, doc.email]);}"}, "view1" : {"map":"function(doc, meta){emit([doc.category, doc.coins], doc.name);}"}}}'
      curl -v -X PUT -H 'Content-Type: application/json' 'http://172.23.105.12:8092/default/_design/ddoc' -d '{"views": { "view0":{"map":"function(doc, meta){emit(doc.city,[doc.name, doc.email]);}"}, "view1" : {"map":"function(doc, meta){emit([doc.category, doc.coins], doc.name);}"}}}'
      curl -v -X PUT -H 'Content-Type: application/json' 'http://172.23.105.15:8092/default/_design/ddoc' -d '{"views": { "view0":{"map":"function(doc, meta){emit(doc.city,[doc.name, doc.email]);}"}, "view1" : {"map":"function(doc, meta){emit([doc.category, doc.coins], doc.name);}"}}}'
      curl -v -X PUT -H 'Content-Type: application/json' 'http://sasl:sasl@172.23.105.15:8092/sasl/_design/ddoc' -d '{"views": { "view0":{"map":"function(doc, meta){emit(doc.city,[doc.name, doc.email]);}"}, "view1" : {"map":"function(doc, meta){emit([doc.category, doc.coins], doc.name);}"}}}'

      5) Then stop the rebalance on cluster2 manually, fail over 172.23.105.14 (it is now in cluster2), and start a rebalance to eject it; see the sketch below.
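      The stop/failover/eject sequence corresponds roughly to these REST calls (credentials assumed):

      curl -u Administrator:password -X POST http://172.23.105.15:8091/controller/stopRebalance
      curl -u Administrator:password -X POST http://172.23.105.15:8091/controller/failOver \
        -d otpNode=ns_1@172.23.105.14
      # rebalance the failed-over node out of cluster2
      curl -u Administrator:password -X POST http://172.23.105.15:8091/controller/rebalance \
        -d knownNodes=ns_1@172.23.105.15,ns_1@172.23.105.72,ns_1@172.23.105.74,ns_1@172.23.105.14 \
        -d ejectedNodes=ns_1@172.23.105.14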

      6) Add 172.23.105.14 back into cluster1 (the rebalance out on cluster2 is still in progress).

      Result: we are able to add the node to cluster1, but the rebalance fails on cluster2. Its status can be watched as sketched below.
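      Rebalance status on either cluster is visible from the tasks endpoint (credentials assumed):

      curl -s -u Administrator:password http://172.23.105.15:8091/pools/default/tasks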

      Logs from the clusters, with timestamps:

      1) cluster1:

      Started rebalancing bucket sasl ns_rebalancer000 ns_1@172.23.105.12 07:28:33 - Tue Nov 19, 2013
      Starting rebalance, KeepNodes = ['ns_1@172.23.105.12','ns_1@172.23.105.13',
      'ns_1@172.23.105.14'], EjectNodes = []
      ns_orchestrator004 ns_1@172.23.105.12 07:28:32 - Tue Nov 19, 2013
      Node 'ns_1@172.23.105.13' saw that node 'ns_1@172.23.105.14' came up. Tags: [] ns_node_disco004 ns_1@172.23.105.13 07:28:09 - Tue Nov 19, 2013
      Node 'ns_1@172.23.105.12' saw that node 'ns_1@172.23.105.14' came up. Tags: [] ns_node_disco004 ns_1@172.23.105.12 07:28:09 - Tue Nov 19, 2013
      Started node add transaction by adding node 'ns_1@172.23.105.14' to nodes_wanted (group: 0)
      ns_cluster000 ns_1@172.23.105.12 07:28:08 - Tue Nov 19, 2013

      (this rebalance is still in progress and is expected to complete successfully)

      2) cluster2:

      Control connection to memcached on 'ns_1@172.23.105.74' disconnected: {badmatch,{error,timeout}} ns_memcached004 ns_1@172.23.105.74 07:31:29 - Tue Nov 19, 2013
      Rebalance exited with reason {unexpected_exit,
        {'EXIT',<0.30701.306>,
         {wait_checkpoint_persisted_failed,"sasl",170,29,
          [{'ns_1@172.23.105.74',
            {'EXIT',
             {{{badmatch,{error,timeout}},
               {gen_server,call,
                ['ns_memcached-sasl',
                 {wait_for_checkpoint_persistence,670,30},
                 infinity]}},
              {gen_server,call,
               [{'janitor_agent-sasl','ns_1@172.23.105.74'},
                {if_rebalance,<0.29752.306>,
                 {wait_checkpoint_persisted,170,29}},
                infinity]}}}}]}}}
      ns_orchestrator002 ns_1@172.23.105.15 07:31:29 - Tue Nov 19, 2013
      <0.30304.306> exited with {unexpected_exit,
        {'EXIT',<0.30701.306>,
         {wait_checkpoint_persisted_failed,"sasl",170,29,
          [{'ns_1@172.23.105.74',
            {'EXIT',
             {{{badmatch,{error,timeout}},
               {gen_server,call,
                ['ns_memcached-sasl',
                 {wait_for_checkpoint_persistence,670,30},
                 infinity]}},
              {gen_server,call,
               [{'janitor_agent-sasl','ns_1@172.23.105.74'},
                {if_rebalance,<0.29752.306>,
                 {wait_checkpoint_persisted,170,29}},
                infinity]}}}}]}}}
      ns_vbucket_mover000 ns_1@172.23.105.15 07:31:29 - Tue Nov 19, 2013
      Bucket "sasl" rebalance does not seem to be swap rebalance ns_vbucket_mover000 ns_1@172.23.105.15 07:28:00 - Tue Nov 19, 2013
      Node 'ns_1@172.23.105.74' saw that node 'ns_1@172.23.105.14' went down. Details: [{nodedown_reason,connection_closed}] ns_node_disco005 ns_1@172.23.105.74 07:27:58 - Tue Nov 19, 2013
      Node 'ns_1@172.23.105.72' saw that node 'ns_1@172.23.105.14' went down. Details: [{nodedown_reason,connection_closed}] ns_node_disco005 ns_1@172.23.105.72 07:27:58 - Tue Nov 19, 2013
      Node 'ns_1@172.23.105.15' saw that node 'ns_1@172.23.105.14' went down. Details: [{nodedown_reason,connection_closed}] ns_node_disco005 ns_1@172.23.105.15 07:27:58 - Tue Nov 19, 2013
      Started rebalancing bucket sasl ns_rebalancer000 ns_1@172.23.105.15 07:27:58 - Tue Nov 19, 2013
      Starting rebalance, KeepNodes = ['ns_1@172.23.105.15','ns_1@172.23.105.72',
      'ns_1@172.23.105.74'], EjectNodes = []
      ns_orchestrator004 ns_1@172.23.105.15 07:27:58 - Tue Nov 19, 2013
      Failed over 'ns_1@172.23.105.14': ok ns_orchestrator006 ns_1@172.23.105.15 07:27:44 - Tue Nov 19, 2013
      Starting failing over 'ns_1@172.23.105.14' ns_orchestrator000 ns_1@172.23.105.15 07:27:43 - Tue Nov 19, 2013
      Rebalance stopped by user.
      ns_orchestrator007 ns_1@172.23.105.15 07:27:26 - Tue Nov 19, 2013
      Bucket "sasl" rebalance does not seem to be swap rebalance ns_vbucket_mover000 ns_1@172.23.105.15 07:23:18 - Tue Nov 19, 2013
      Started rebalancing bucket sasl ns_rebalancer000 ns_1@172.23.105.15 07:23:15 - Tue Nov 19, 2013
      Starting rebalance, KeepNodes = ['ns_1@172.23.105.15','ns_1@172.23.105.72',
      'ns_1@172.23.105.74','ns_1@172.23.105.14'], EjectNodes = []
      ns_orchestrator004 ns_1@172.23.105.15 07:23:08 - Tue Nov 19, 2013

      The main point of this bug is that a node can be added to (can join) a new cluster even while it is still being rebalanced out of the other cluster.
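      One way to observe the inconsistent state is to list each node's clusterMembership on cluster2 while the add is accepted on cluster1; the credentials and the python one-liner are illustrative assumptions:

      # hypothetical credentials; prints (otpNode, clusterMembership) pairs
      curl -s -u Administrator:password http://172.23.105.15:8091/pools/default | \
        python -c 'import sys, json; print([(n["otpNode"], n["clusterMembership"]) for n in json.load(sys.stdin)["nodes"]])'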

      Attachments


        Activity

          People

            Assignee: Aliaksey Artamonau (Inactive)
            Reporter: Andrei Baranouski (andreibaranouski)
            Votes: 0
            Watchers: 5

