Couchbase Server / MB-23265

[GSI][Rebalance] Rebalance hangs when kv and index nodes (services on separate nodes) are rebalanced out while queries are running

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: 5.0.0
    • Affects Version/s: 5.0.0
    • Component/s: secondary-index
    • Build: 5.0.0-2194
      Indexer Storage Mode: Memory_optimized

      Cluster Configuration:
      Node 1: n1ql, kv
      Node 2: kv
      Node 3: kv
      Node 4: index
      Node 5: index

    Description

      Steps:
      1. Set up and configure the cluster.
      2. Create and load the buckets.
      3. Create a few indexes on the buckets.
      4. Rebalance out the index node (Node 4) and a kv node (Node 3) while queries are running.
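
      Step 4 amounts to a single rebalance request that ejects two nodes. The sketch below only builds the form body for such a request and does not send it. It assumes the cluster manager's REST endpoint POST /controller/rebalance with the knownNodes and ejectedNodes form fields; the helper name rebalance_out_params is hypothetical, and the node names are the KeepNodes and EjectNodes values from the "Starting rebalance" log entry in this report.

```python
from urllib.parse import urlencode


def rebalance_out_params(all_nodes, eject_nodes):
    """Build form parameters for a rebalance that ejects `eject_nodes`.

    Hypothetical helper: knownNodes lists every node currently in the
    cluster (kept and ejected), ejectedNodes lists only those leaving.
    """
    missing = set(eject_nodes) - set(all_nodes)
    if missing:
        raise ValueError("nodes to eject are not in the cluster: %s" % sorted(missing))
    return {
        "knownNodes": ",".join(all_nodes),
        "ejectedNodes": ",".join(eject_nodes),
    }


# Node names as reported in the "Starting rebalance" log entry.
keep = ["ns_1@172.23.109.69", "ns_1@172.23.109.67", "ns_1@172.23.109.70"]
eject = ["ns_1@172.23.109.137", "ns_1@172.23.109.71"]

params = rebalance_out_params(keep + eject, eject)
body = urlencode(params)  # form-encoded body for the POST request
print(body)
```

      Ejecting one kv node and one index node in the same request is what makes this a combined kv + index rebalance-out rather than two separate operations.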

      Rebalance hangs at 73.20%
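
      A hang like this can be told apart from a slow rebalance by checking whether the reported progress stops changing. The sketch below is one possible check, not the test framework's own logic: is_stalled is a hypothetical helper, the window size is arbitrary, and every sample value other than 73.2 is made up for illustration. How the samples are collected (for example by polling the cluster's rebalance progress) is left out.

```python
def is_stalled(samples, window):
    """Return True if the last `window` progress samples are identical
    and the rebalance has not reached 100%."""
    if len(samples) < window:
        return False  # not enough history to decide
    tail = samples[-window:]
    return tail[0] < 100.0 and all(p == tail[0] for p in tail)


# Illustrative samples: progress climbs, then sits at 73.2 (the value
# reported in this issue); the earlier values are invented.
samples = [10.0, 41.5, 73.2, 73.2, 73.2, 73.2]
stalled = is_stalled(samples, window=4)
print(stalled)
```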

      The following messages are logged:

      {u'node': u'ns_1@172.23.109.137', u'code': 0, u'text': u'Shutting down bucket "default" on \'ns_1@172.23.109.137\' for deletion', u'shortText': u'message', u'serverTime': u'2017-03-10T03:51:07.268Z', u'module': u'ns_memcached', u'tstamp': 1489146667268, u'type': u'info'}
      2017-03-10 03:55:27 | ERROR | MainProcess | Cluster_Thread | [rest_client.print_UI_logs] {u'node': u'ns_1@172.23.109.67', u'code': 0, u'text': u'Bucket "default" rebalance does not seem to be swap rebalance', u'shortText': u'message', u'serverTime': u'2017-03-10T03:50:06.914Z', u'module': u'ns_vbucket_mover', u'tstamp': 1489146606914, u'type': u'info'}
      2017-03-10 03:55:27 | ERROR | MainProcess | Cluster_Thread | [rest_client.print_UI_logs] {u'node': u'ns_1@172.23.109.67', u'code': 0, u'text': u'Started rebalancing bucket default', u'shortText': u'message', u'serverTime': u'2017-03-10T03:50:06.690Z', u'module': u'ns_rebalancer', u'tstamp': 1489146606690, u'type': u'info'}
      2017-03-10 03:55:27 | ERROR | MainProcess | Cluster_Thread | [rest_client.print_UI_logs] {u'node': u'ns_1@172.23.109.67', u'code': 4, u'text': u"Starting rebalance, KeepNodes = ['ns_1@172.23.109.69','ns_1@172.23.109.67',\n                                 'ns_1@172.23.109.70'], EjectNodes = ['ns_1@172.23.109.137',\n                                                                      'ns_1@172.23.109.71'], Failed over and being ejected nodes = []; no delta recovery nodes\n", u'shortText': u'message', u'serverTime': u'2017-03-10T03:50:06.619Z', u'module': u'ns_orchestrator', u'tstamp': 1489146606619, u'type': u'info'}
      2017-03-10 03:55:27 | ERROR | MainProcess | Cluster_Thread | [rest_client.print_UI_logs] {u'node': u'ns_1@172.23.109.67', u'code': 102, u'text': u'Client-side error-report for user "Administrator" on node \'ns_1@172.23.109.67\':\nUser-Agent:Python-httplib2/$Rev: 259 $\n2017-03-10 03:49:30.476254 : test_rebalance_out started \n', u'shortText': u'client-side error report', u'serverTime': u'2017-03-10T03:49:30.490Z', u'module': u'menelaus_web', u'tstamp': 1489146570490, u'type': u'warning'}
      2017-03-10 03:55:27 | ERROR | MainProcess | Cluster_Thread | [rest_client.print_UI_logs] {u'node': u'ns_1@172.23.109.69', u'code': 0, u'text': u'Bucket "default" loaded on node \'ns_1@172.23.109.69\' in 0 seconds.', u'shortText': u'message', u'serverTime': u'2017-03-10T03:49:27.163Z', u'module': u'ns_memcached', u'tstamp': 1489146567163, u'type': u'info'}
      2017-03-10 03:55:27 | ERROR | MainProcess | Cluster_Thread | [rest_client.print_UI_logs] {u'node': u'ns_1@172.23.109.67', u'code': 0, u'text': u'Bucket "default" loaded on node \'ns_1@172.23.109.67\' in 0 seconds.', u'shortText': u'message', u'serverTime': u'2017-03-10T03:49:27.135Z', u'module': u'ns_memcached', u'tstamp': 1489146567135, u'type': u'info'}
      2017-03-10 03:55:27 | ERROR | MainProcess | Cluster_Thread | [rest_client.print_UI_logs] {u'node': u'ns_1@172.23.109.137', u'code': 0, u'text': u'Bucket "default" loaded on node \'ns_1@172.23.109.137\' in 0 seconds.', u'shortText': u'message', u'serverTime': u'2017-03-10T03:49:27.133Z', u'module': u'ns_memcached', u'tstamp': 1489146567133, u'type': u'info'}
      2017-03-10 03:55:27 | ERROR | MainProcess | Cluster_Thread | [rest_client.print_UI_logs] {u'node': u'ns_1@172.23.109.67', u'code': 12, u'text': u'Created bucket "default" of type: couchbase\n[{num_replicas,1},\n {replica_index,true},\n {ram_quota,2111832064},\n {auth_type,sasl},\n {flush_enabled,true},\n {num_threads,3},\n {eviction_policy,value_only},\n {conflict_resolution_type,seqno},\n {storage_mode,couchstore}]', u'shortText': u'message', u'serverTime': u'2017-03-10T03:49:26.991Z', u'module': u'menelaus_web', u'tstamp': 1489146566991, u'type': u'info'}
      2017-03-10 03:55:27 | ERROR | MainProcess | Cluster_Thread | [rest_client.print_UI_logs] {u'node': u'ns_1@172.23.109.67', u'code': 0, u'text': u'Reset auto-failover count', u'shortText': u'message', u'serverTime': u'2017-03-10T03:49:18.453Z', u'module': u'auto_failover', u'tstamp': 1489146558453, u'type': u'info'}
      [('/usr/lib64/python2.7/threading.py', 784, '__bootstrap', 'self.__bootstrap_inner()'), ('/usr/lib64/python2.7/threading.py', 811, '__bootstrap_inner', 'self.run()'), ('./testrunner.py', 291, 'run', '**self._Thread__kwargs)'), ('/usr/lib64/python2.7/unittest/runner.py', 151, 'run', 'test(result)'), ('/usr/lib64/python2.7/unittest/case.py', 433, '__call__', 'return self.run(*args, **kwds)'), ('/usr/lib64/python2.7/unittest/case.py', 369, 'run', 'testMethod()'), ('pytests/2i/recovery_2i.py', 79, 'test_rebalance_out', 'rebalance.result()'), ('lib/tasks/future.py', 160, 'result', 'return self.__get_result()'), ('lib/tasks/future.py', 111, '__get_result', 'print traceback.extract_stack()')]
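
      The entries above are printed newest first, and each one is a Python 2 dict repr appended to a test-harness prefix. Although the harness prints them at ERROR level, the entries carry their own type field (info or warning). The sketch below is one way to turn such lines into a chronological timeline; parse_ui_log_line and timeline are hypothetical helpers, and the two sample lines are copied from the log above.

```python
import ast


def parse_ui_log_line(line):
    """Extract the dict literal from a log line and evaluate it safely."""
    start = line.index("{")
    end = line.rindex("}") + 1
    # literal_eval accepts the u'...' string prefix and never executes code.
    return ast.literal_eval(line[start:end])


def timeline(lines):
    """Return parsed entries sorted oldest first by their tstamp field."""
    return sorted((parse_ui_log_line(l) for l in lines), key=lambda e: e["tstamp"])


PREFIX = "2017-03-10 03:55:27 | ERROR | MainProcess | Cluster_Thread | [rest_client.print_UI_logs] "
lines = [
    PREFIX + "{u'node': u'ns_1@172.23.109.67', u'code': 0, "
    "u'text': u'Started rebalancing bucket default', u'shortText': u'message', "
    "u'serverTime': u'2017-03-10T03:50:06.690Z', u'module': u'ns_rebalancer', "
    "u'tstamp': 1489146606690, u'type': u'info'}",
    PREFIX + "{u'node': u'ns_1@172.23.109.67', u'code': 0, "
    "u'text': u'Reset auto-failover count', u'shortText': u'message', "
    "u'serverTime': u'2017-03-10T03:49:18.453Z', u'module': u'auto_failover', "
    "u'tstamp': 1489146558453, u'type': u'info'}",
]

ordered = timeline(lines)
for entry in ordered:
    print(entry["serverTime"], entry["type"], entry["module"], entry["text"])
```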
      
      

          People

            deepkaran.salooja Deepkaran Salooja
            prasanna.gholap Prasanna Gholap [X] (Inactive)