Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-7635

Rebalance exited with reason {pre_rebalance_config_synchronization_failed NODE, even node was ejected more than an hour before

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Major
    • 2.0.1
    • 2.0.1
    • ns_server
    • Security Level: Public
    • None
    • centos 64

    Description

      build version=2.0.1-145-rel
      http://qa.hq.northscale.net/view/2.0.1/job/ubuntu-64-2.0-new-rebalance-tests-P0/335/consoleFull

      ./testrunner -i /tmp/rebalance-tests.ini get-logs=True,wait_timeout=180,GROUP=P0,get-cbcollect-info=True -t rebalance.rebalanceout.RebalanceOutTests.rebalance_out_with_ops,nodes_out=3,GROUP=OUT;P0

      Test Input params:

      {'GROUP': 'P0', 'case_number': 10, 'cluster_name': 'rebalance-tests', 'conf_file': 'conf/py-newrebalance.conf', 'get-cbcollect-info': 'True', 'get-logs': 'True', 'ini': '/tmp/rebalance-tests.ini', 'nodes_out': '3', 'num_nodes': 6, 'spec': 'py-newrebalance', rebalance_out_with_ops (rebalance.rebalanceout.RebalanceOutTests) ... 'wait_timeout': '180'}

      [2013-01-29 04:48:06,977] - [basetestcase:55] INFO - ============== basetestcase setup was started for test #10 rebalance_out_with_ops==============
      [2013-01-29 04:48:07,995] - [rest_client:595] INFO - settings/web params : username=Administrator&password=password&port=8091
      [2013-01-29 04:48:08,019] - [rest_client:606] INFO - pools/default params : memoryQuota=2106&username=Administrator&password=password
      [2013-01-29 04:48:08,032] - [rest_client:595] INFO - settings/web params : username=Administrator&password=password&port=8091
      [2013-01-29 04:48:08,056] - [rest_client:606] INFO - pools/default params : memoryQuota=2106&username=Administrator&password=password
      [2013-01-29 04:48:08,077] - [rest_client:595] INFO - settings/web params : username=Administrator&password=password&port=8091
      [2013-01-29 04:48:08,095] - [rest_client:606] INFO - pools/default params : memoryQuota=2106&username=Administrator&password=password
      [2013-01-29 04:48:08,118] - [rest_client:595] INFO - settings/web params : username=Administrator&password=password&port=8091
      [2013-01-29 04:48:08,152] - [rest_client:606] INFO - pools/default params : memoryQuota=2054&username=Administrator&password=password
      [2013-01-29 04:48:08,170] - [rest_client:595] INFO - settings/web params : username=Administrator&password=password&port=8091
      [2013-01-29 04:48:08,191] - [rest_client:606] INFO - pools/default params : memoryQuota=2106&username=Administrator&password=password
      [2013-01-29 04:48:08,210] - [rest_client:595] INFO - settings/web params : username=Administrator&password=password&port=8091
      [2013-01-29 04:48:08,234] - [rest_client:606] INFO - pools/default params : memoryQuota=2054&username=Administrator&password=password
      [2013-01-29 04:48:09,241] - [task:232] INFO - adding node 10.5.2.14:8091 to cluster
      [2013-01-29 04:48:09,242] - [rest_client:712] INFO - adding remote node @10.5.2.14:8091 to this cluster @10.5.2.11:8091
      [2013-01-29 04:48:09,705] - [task:232] INFO - adding node 10.5.2.13:8091 to cluster
      [2013-01-29 04:48:09,706] - [rest_client:712] INFO - adding remote node @10.5.2.13:8091 to this cluster @10.5.2.11:8091
      [2013-01-29 04:48:10,621] - [task:232] INFO - adding node 10.3.121.106:8091 to cluster
      [2013-01-29 04:48:10,621] - [rest_client:712] INFO - adding remote node @10.3.121.106:8091 to this cluster @10.5.2.11:8091
      [2013-01-29 04:48:11,599] - [task:232] INFO - adding node 10.5.2.15:8091 to cluster
      [2013-01-29 04:48:11,599] - [rest_client:712] INFO - adding remote node @10.5.2.15:8091 to this cluster @10.5.2.11:8091
      [2013-01-29 04:48:12,510] - [task:232] INFO - adding node 10.3.121.107:8091 to cluster
      [2013-01-29 04:48:12,511] - [rest_client:712] INFO - adding remote node @10.3.121.107:8091 to this cluster @10.5.2.11:8091
      [2013-01-29 04:48:19,589] - [rest_client:795] INFO - rebalance params : password=password&ejectedNodes=&user=Administrator&knownNodes=ns_1%4010.5.2.14%2Cns_1%4010.3.121.106%2Cns_1%4010.3.121.107%2Cns_1%4010.5.2.13%2Cns_1%4010.5.2.11%2Cns_1%4010.5.2.15
      [2013-01-29 04:48:19,607] - [rest_client:799] INFO - rebalance operation started
      [2013-01-29 04:48:19,616] - [rest_client:896] INFO - rebalance percentage : 0 %
      [2013-01-29 04:48:29,633] - [rest_client:879] ERROR -

      {u'status': u'none', u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try rebalance again.'}

      - rebalance failed
      [2013-01-29 04:48:29,634] - [rest_client:880] INFO - Latest logs from UI:
      [2013-01-29 04:48:29,754] - [rest_client:881] ERROR - {u'node': u'ns_1@10.5.2.11', u'code': 2, u'text': u"Rebalance exited with reason

      {pre_rebalance_config_synchronization_failed,\n ['ns_1@10.3.121.107']}

      \n", u'shortText': u'message', u'module': u'ns_orchestrator', u'tstamp': 1359465438152, u'type': u'info'}
      [2013-01-29 04:48:29,755] - [rest_client:881] ERROR -

      {u'node': u'ns_1@10.5.2.11', u'code': 4, u'text': u"Starting rebalance, KeepNodes = ['ns_1@10.5.2.14','ns_1@10.3.121.106',\n 'ns_1@10.3.121.107','ns_1@10.5.2.13',\n 'ns_1@10.5.2.11','ns_1@10.5.2.15'], EjectNodes = []\n", u'shortText': u'message', u'module': u'ns_orchestrator', u'tstamp': 1359465431759, u'type': u'info'}

      [2013-01-29 04:48:29,755] - [rest_client:881] ERROR -

      {u'node': u'ns_1@10.3.121.107', u'code': 3, u'text': u'Node ns_1@10.3.121.107 joined cluster', u'shortText': u'message', u'module': u'ns_cluster', u'tstamp': 1359465429227, u'type': u'info'}

      [2013-01-29 04:48:29,756] - [rest_client:881] ERROR -

      {u'node': u'ns_1@10.3.121.107', u'code': 1, u'text': u"Couchbase Server has started on web port 8091 on node 'ns_1@10.3.121.107'.", u'shortText': u'web start ok', u'module': u'menelaus_sup', u'tstamp': 1359465427603, u'type': u'info'}

      [2013-01-29 04:48:29,756] - [rest_client:881] ERROR -

      {u'node': u'ns_1@10.5.2.15', u'code': 4, u'text': u"Node 'ns_1@10.5.2.15' saw that node 'ns_1@10.3.121.107' came up.", u'shortText': u'node up', u'module': u'ns_node_disco', u'tstamp': 1359465426106, u'type': u'info'}

      [2013-01-29 04:48:29,757] - [rest_client:881] ERROR -

      {u'node': u'ns_1@10.5.2.14', u'code': 4, u'text': u"Node 'ns_1@10.5.2.14' saw that node 'ns_1@10.3.121.107' came up.", u'shortText': u'node up', u'module': u'ns_node_disco', u'tstamp': 1359465426079, u'type': u'info'}

      [2013-01-29 04:48:29,758] - [rest_client:881] ERROR -

      {u'node': u'ns_1@10.3.121.107', u'code': 4, u'text': u"Node 'ns_1@10.3.121.107' saw that node 'ns_1@10.5.2.15' came up.", u'shortText': u'node up', u'module': u'ns_node_disco', u'tstamp': 1359465426070, u'type': u'info'}

      [2013-01-29 04:48:29,759] - [rest_client:881] ERROR -

      {u'node': u'ns_1@10.3.121.107', u'code': 4, u'text': u"Node 'ns_1@10.3.121.107' saw that node 'ns_1@10.5.2.14' came up.", u'shortText': u'node up', u'module': u'ns_node_disco', u'tstamp': 1359465426065, u'type': u'info'}

      [2013-01-29 04:48:29,759] - [rest_client:881] ERROR -

      {u'node': u'ns_1@10.5.2.13', u'code': 4, u'text': u"Node 'ns_1@10.5.2.13' saw that node 'ns_1@10.3.121.107' came up.", u'shortText': u'node up', u'module': u'ns_node_disco', u'tstamp': 1359465426063, u'type': u'info'}

      previously 10.3.121.107 was ejected almost an hour ago

      2013-01-29 03:34:04,889] - [rest_client:795] INFO - rebalance params : password=password&ejectedNodes=ns_1%4010.5.2.14%2Cns_1%4010.3.121.106%2Cns_1%4010.3.121.107%2Cns_1%4010.5.2.13%2Cns_1%4010.5.2.15&user=Administrator&knownNodes=ns_1%4010.5.2.14%2Cns_1%4010.3.121.106%2Cns_1%4010.3.121.107%2Cns_1%4010.5.2.13%2Cns_1%4010.5.2.11%2Cns_1%4010.5.2.15
      [2013-01-29 03:34:04,908] - [rest_client:799] INFO - rebalance operation started
      [2013-01-29 03:34:04,919] - [rest_client:896] INFO - rebalance percentage : 0 %
      [2013-01-29 03:34:08,933] - [rest_client:855] INFO - rebalance progress took 4.02412295341 seconds
      [2013-01-29 03:34:08,933] - [rest_client:856] INFO - sleep for 4.02412295341 seconds after rebalance...

      from server logs:

      2013-01-29 04:02:56.312 ns_orchestrator:4:info:message(ns_1@10.5.2.11) - Starting rebalance, KeepNodes = ['ns_1@10.5.2.11'], EjectNodes = ['ns_1@10.5.2.14',
      'ns_1@10.3.121.106',
      'ns_1@10.3.121.107',
      'ns_1@10.5.2.13',
      'ns_1@10.5.2.15']

      then

      2013-01-29 05:17:09.227 ns_cluster:3:info:message(ns_1@10.3.121.107) - Node ns_1@10.3.121.107 joined cluster
      2013-01-29 05:17:11.759 ns_orchestrator:4:info:message(ns_1@10.5.2.11) - Starting rebalance, KeepNodes = ['ns_1@10.5.2.14','ns_1@10.3.121.106',
      'ns_1@10.3.121.107','ns_1@10.5.2.13',
      'ns_1@10.5.2.11','ns_1@10.5.2.15'], EjectNodes = []

      2013-01-29 05:17:18.152 ns_orchestrator:2:info:message(ns_1@10.5.2.11) - Rebalance exited with reason

      {pre_rebalance_config_synchronization_failed, ['ns_1@10.3.121.107']}

      logs_node (debug):

      Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          People

            Aliaksey Artamonau Aliaksey Artamonau (Inactive)
            andreibaranouski Andrei Baranouski
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty