Couchbase Server: MB-12968

Able to start graceful failover when node is unhealthy -> Rebalance exited with reason {pre_rebalance_config_synchronization_failed


Details

    • Bug
    • Resolution: Won't Fix
    • Major
    • 4.0.0
    • 3.0
    • ns_server
    • Security Level: Public
    • None
    • 3.0.0-443
    • Untriaged
    • Unknown

    Description

      Steps:

      1. 3 nodes in the cluster
      2. enable the firewall on one node
      3. wait until the node becomes "unhealthy"
      4. trigger graceful failover
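Step 2 is performed with the iptables rule shown in the log below. The rule is `--dport 1000:60000 -j REJECT` for inbound TCP on eth0, and `iptables --list` prints port 1000 by its service name, `cadlock2`. The sketch below checks which Couchbase Server ports fall inside that range. This helps explain why the other nodes quickly lose contact with 10.3.4.145 and mark it unhealthy. The port numbers in `COUCHBASE_PORTS` are an assumption based on typical Couchbase Server 3.x deployments, not on this report. `blocked_services` is a hypothetical helper.

```python
from typing import Dict, List

# ASSUMPTION: typical Couchbase Server 3.x ports; not taken from this report.
COUCHBASE_PORTS: Dict[str, List[int]] = {
    "epmd (Erlang port mapper)": [4369],
    "REST API / admin UI": [8091],
    "view engine": [8092],
    "data service (memcached)": [11210],
    "Erlang node-to-node distribution": list(range(21100, 21300)),
}


def parse_dport_range(spec: str) -> range:
    """Turn an iptables --dport value such as "1000:60000" into an inclusive range."""
    low, _, high = spec.partition(":")
    first = int(low)
    last = int(high) if high else first
    return range(first, last + 1)


def blocked_services(dport_spec: str, ports: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Hypothetical helper: map each service to the ports a REJECT rule would block."""
    blocked = parse_dport_range(dport_spec)
    result: Dict[str, List[int]] = {}
    for service, service_ports in ports.items():
        hits = [p for p in service_ports if p in blocked]
        if hits:
            result[service] = hits
    return result


if __name__ == "__main__":
    # The rule from the log: -A INPUT -p tcp -i eth0 --dport 1000:60000 -j REJECT
    for service, hits in blocked_services("1000:60000", COUCHBASE_PORTS).items():
        print(f"{service}: {len(hits)} port(s) blocked")
```

Under these assumptions, every listed service is inside the rejected range. That includes the REST port 8091 and the Erlang distribution ports the nodes use to talk to each other.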

      [2014-03-17 18:36:26,188] - [remote_util:1450] INFO - running command.raw on 10.3.4.145: /sbin/iptables -A INPUT -p tcp -i eth0 --dport 1000:60000 -j REJECT
      [2014-03-17 18:36:27,731] - [remote_util:1479] INFO - command executed successfully
      [2014-03-17 18:36:27,731] - [remote_util:2263] INFO - enabled firewall on ip:10.3.4.145 port:8091 ssh_username:root
      [2014-03-17 18:36:27,731] - [remote_util:1450] INFO - running command.raw on 10.3.4.145: /sbin/iptables --list
      [2014-03-17 18:36:29,304] - [remote_util:1479] INFO - command executed successfully
      [2014-03-17 18:36:29,304] - [remote_util:1401] INFO - Chain INPUT (policy ACCEPT)
      [2014-03-17 18:36:29,305] - [remote_util:1401] INFO - target prot opt source destination
      [2014-03-17 18:36:29,305] - [remote_util:1401] INFO - REJECT tcp -- anywhere anywhere tcp dpts:cadlock2:60000 reject-with icmp-port-unreachable
      [2014-03-17 18:36:29,305] - [remote_util:1401] INFO -
      [2014-03-17 18:36:29,305] - [remote_util:1401] INFO - Chain FORWARD (policy ACCEPT)
      [2014-03-17 18:36:29,306] - [remote_util:1401] INFO - target prot opt source destination
      [2014-03-17 18:36:29,306] - [remote_util:1401] INFO -
      [2014-03-17 18:36:29,306] - [remote_util:1401] INFO - Chain OUTPUT (policy ACCEPT)
      [2014-03-17 18:36:29,306] - [remote_util:1401] INFO - target prot opt source destination
      [2014-03-17 18:36:29,306] - [remote_util:1401] INFO -
      [2014-03-17 18:36:29,306] - [remote_util:1401] INFO - Chain RH-Firewall-1-INPUT (0 references)
      [2014-03-17 18:36:29,306] - [remote_util:1401] INFO - target prot opt source destination
      [2014-03-17 18:36:32,503] - [rest_client:125] INFO - node ns_1@10.3.4.145 status : unhealthy
      [2014-03-17 18:36:32,503] - [rest_client:132] INFO - node ns_1@10.3.4.145 status_reached : True
      [2014-03-17 18:36:32,503] - [failovertests:72] INFO - node 10.3.4.145:8091 is 'unhealthy' as expected
      [2014-03-17 18:36:33,727] - [rest_client:942] INFO - fail_over node ns_1@10.3.4.145 successful
      [2014-03-17 18:36:34,905] - [rest_client:1075] INFO - rebalance percentage : 0 %
      [2014-03-17 18:36:37,482] - [rest_client:1075] INFO - rebalance percentage : 0 %
      [2014-03-17 18:36:40,441] - [rest_client:1075] INFO - rebalance percentage : 0 %
      [2014-03-17 18:36:43,481] - [rest_client:1075] INFO - rebalance percentage : 0 %
      [2014-03-17 18:36:46,512] - [rest_client:1075] INFO - rebalance percentage : 0 %
      [2014-03-17 18:36:49,559] - [rest_client:1075] INFO - rebalance percentage : 0 %
      [2014-03-17 18:36:52,518] - [rest_client:1075] INFO - rebalance percentage : 0 %
      [2014-03-17 18:36:55,456] - [rest_client:1075] INFO - rebalance percentage : 0 %
      [2014-03-17 18:36:58,390] - [rest_client:1075] INFO - rebalance percentage : 0 %
      [2014-03-17 18:37:01,530] - [rest_client:1075] INFO - rebalance percentage : 0 %
      [2014-03-17 18:37:04,490] - [rest_client:1059] ERROR - {u'status': u'none', u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try rebalance again.'} - rebalance failed
      [2014-03-17 18:37:09,870] - [rest_client:1838] INFO - Latest logs from UI:
      [2014-03-17 18:37:09,870] - [rest_client:1839] ERROR - {u'node': u'ns_1@10.3.4.144', u'code': 2, u'text': u"Rebalance exited with reason {pre_rebalance_config_synchronization_failed,\n ['ns_1@10.3.4.145']}\n", u'shortText': u'message', u'serverTime': u'2014-03-17T08:09:14.238Z', u'module': u'ns_orchestrator', u'tstamp': 1395068954238, u'type': u'info'}
      [2014-03-17 18:37:09,871] - [rest_client:1839] ERROR - {u'node': u'ns_1@10.3.4.144', u'code': 0, u'text': u"Starting vbucket moves for graceful failover of 'ns_1@10.3.4.145'", u'shortText': u'message', u'serverTime': u'2014-03-17T08:08:44.227Z', u'module': u'ns_rebalancer', u'tstamp': 1395068924227, u'type': u'info'}
      [2014-03-17 18:37:09,871] - [rest_client:1839] ERROR - {u'node': u'ns_1@10.3.4.144', u'code': 1, u'text': u'Rebalance completed successfully.\n', u'shortText': u'message', u'serverTime': u'2014-03-17T08:06:08.669Z', u'module': u'ns_orchestrator', u'tstamp': 1395068768669, u'type': u'info'}
      [2014-03-17 18:37:09,871] - [rest_client:1839] ERROR - {u'node': u'ns_1@10.3.4.144', u'code': 0, u'text': u'Bucket "default" rebalance does not seem to be swap rebalance', u'shortText': u'message', u'serverTime': u'2014-03-17T08:02:05.908Z', u'module': u'ns_vbucket_mover', u'tstamp': 1395068525908, u'type': u'info'}
      [2014-03-17 18:37:09,871] - [rest_client:1839] ERROR - {u'node': u'ns_1@10.3.4.145', u'code': 0, u'text': u'Bucket "default" loaded on node \'ns_1@10.3.4.145\' in 0 seconds.', u'shortText': u'message', u'serverTime': u'2014-03-17T08:02:05.205Z', u'module': u'ns_memcached', u'tstamp': 1395068525205, u'type': u'info'}
      [2014-03-17 18:37:09,871] - [rest_client:1839] ERROR - {u'node': u'ns_1@10.3.4.147', u'code': 0, u'text': u'Bucket "default" loaded on node \'ns_1@10.3.4.147\' in 0 seconds.', u'shortText': u'message', u'serverTime': u'2014-03-17T08:02:05.023Z', u'module': u'ns_memcached', u'tstamp': 1395068525023, u'type': u'info'}
      [2014-03-17 18:37:09,871] - [rest_client:1839] ERROR - {u'node': u'ns_1@10.3.4.144', u'code': 0, u'text': u'Started rebalancing bucket default', u'shortText': u'message', u'serverTime': u'2014-03-17T08:02:04.681Z', u'module': u'ns_rebalancer', u'tstamp': 1395068524681, u'type': u'info'}
      [2014-03-17 18:37:09,872] - [rest_client:1839] ERROR - {u'node': u'ns_1@10.3.4.144', u'code': 0, u'text': u'Bucket "bucket0" rebalance does not seem to be swap rebalance', u'shortText': u'message', u'serverTime': u'2014-03-17T07:57:48.975Z', u'module': u'ns_vbucket_mover', u'tstamp': 1395068268975, u'type': u'info'}
      [2014-03-17 18:37:09,872] - [rest_client:1839] ERROR - {u'node': u'ns_1@10.3.4.145', u'code': 0, u'text': u'Bucket "bucket0" loaded on node \'ns_1@10.3.4.145\' in 0 seconds.', u'shortText': u'message', u'serverTime': u'2014-03-17T07:57:48.621Z', u'module': u'ns_memcached', u'tstamp': 1395068268621, u'type': u'info'}
      [2014-03-17 18:37:09,872] - [rest_client:1839] ERROR - {u'node': u'ns_1@10.3.4.147', u'code': 0, u'text': u'Bucket "bucket0" loaded on node \'ns_1@10.3.4.147\' in 0 seconds.', u'shortText': u'message', u'serverTime': u'2014-03-17T07:57:48.296Z', u'module': u'ns_memcached', u'tstamp': 1395068268296, u'type': u'info'}
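The UI log entries above are dicts with `module`, `code` and `text` fields, listed newest first (their `tstamp` values decrease). Below is a minimal sketch for pulling the rebalance exit reason and the affected nodes out of such entries. A test that expects this failure could use it instead of only checking `errorMessage`. The regular expression was written against the single `ns_orchestrator` message in this report. Treating every exit reason as an `{Atom, [Nodes]}` tuple is an assumption, and the function names are hypothetical.

```python
import re
from typing import Iterable, List, Optional, Tuple

# ASSUMPTION: the reason is an Erlang-style tuple {Atom, [Node, ...]}; only one
# instance of it appears in the report.
EXIT_REASON_RE = re.compile(r"Rebalance exited with reason \{(\w+),\s*\[([^\]]*)\]\}")


def parse_exit_reason(text: str) -> Optional[Tuple[str, List[str]]]:
    """Return (reason, nodes) for a rebalance-exit message, or None if it is not one."""
    match = EXIT_REASON_RE.search(text)
    if match is None:
        return None
    nodes = re.findall(r"'([^']+)'", match.group(2))
    return match.group(1), nodes


def last_rebalance_failure(entries: Iterable[dict]) -> Optional[Tuple[str, List[str]]]:
    """Scan UI log entries (newest first) for the most recent rebalance exit reason."""
    for entry in entries:
        if entry.get("module") == "ns_orchestrator":
            parsed = parse_exit_reason(entry.get("text", ""))
            if parsed is not None:
                return parsed
    return None


if __name__ == "__main__":
    logs = [
        {"node": "ns_1@10.3.4.144", "code": 2, "module": "ns_orchestrator",
         "text": "Rebalance exited with reason {pre_rebalance_config_synchronization_failed,\n ['ns_1@10.3.4.145']}\n"},
        {"node": "ns_1@10.3.4.144", "code": 1, "module": "ns_orchestrator",
         "text": "Rebalance completed successfully.\n"},
    ]
    print(last_rebalance_failure(logs))
    # prints ('pre_rebalance_config_synchronization_failed', ['ns_1@10.3.4.145'])
```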

      Please note that the node was reported as "unhealthy" almost immediately, and graceful failover was started right after that.
      It would be better if the request were rejected with a response saying that graceful failover cannot be performed because the node is unreachable.
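The behaviour requested above can be sketched as a client-side guard. It reads the node's health first and refuses graceful failover when the node is not healthy, pointing the user to hard failover instead. This is a hypothetical sketch, not how ns_server works. It assumes that `GET /pools/default` returns a `nodes` list whose entries carry `otpNode` and `status` fields, with `status` values like the "unhealthy" seen in the log. It also assumes that graceful failover is requested with `POST /controller/startGracefulFailover` and an `otpNode` form field. `GracefulFailoverRefused` and the helper functions are illustrative names, and authentication is omitted.

```python
import urllib.parse
import urllib.request
from typing import Optional


class GracefulFailoverRefused(Exception):
    """Hypothetical error raised instead of starting a graceful failover that cannot finish."""


def node_status(pools_default: dict, otp_node: str) -> Optional[str]:
    """Return the health string reported for otp_node, or None if it is not listed."""
    for node in pools_default.get("nodes", []):
        if node.get("otpNode") == otp_node:
            return node.get("status")
    return None


def graceful_failover_request(host: str, otp_node: str) -> urllib.request.Request:
    """Build (but do not send) the assumed graceful-failover REST request."""
    body = urllib.parse.urlencode({"otpNode": otp_node}).encode()
    return urllib.request.Request(
        f"http://{host}:8091/controller/startGracefulFailover",
        data=body,
        method="POST",
    )


def safe_graceful_failover_request(pools_default: dict, host: str, otp_node: str) -> urllib.request.Request:
    """Refuse graceful failover unless the target node is reported healthy."""
    status = node_status(pools_default, otp_node)
    if status != "healthy":
        raise GracefulFailoverRefused(
            f"cannot gracefully fail over {otp_node}: node status is {status!r}; "
            "use hard failover for an unreachable node"
        )
    return graceful_failover_request(host, otp_node)


if __name__ == "__main__":
    # Snapshot resembling the cluster in this report after the firewall was enabled.
    snapshot = {"nodes": [
        {"otpNode": "ns_1@10.3.4.144", "status": "healthy"},
        {"otpNode": "ns_1@10.3.4.145", "status": "unhealthy"},
        {"otpNode": "ns_1@10.3.4.147", "status": "healthy"},
    ]}
    try:
        safe_graceful_failover_request(snapshot, "10.3.4.144", "ns_1@10.3.4.145")
    except GracefulFailoverRefused as err:
        print(err)
```

A client-side check like this is racy, because a node can become unreachable between the status read and the failover request. A check inside the orchestrator would avoid that window.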

          People

            Aliaksey Artamonau (Inactive)
            Andrei Baranouski (andreibaranouski)
            Votes: 0
            Watchers: 8
