Couchbase Server / MB-7168

[Doc'd 2.2.0] failover of node that's completely down is still not quick (was: Rebalance exited with reason {not_all_nodes_are_ready_yet after failover node)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Duplicate
    • Affects Version/s: 2.0, 2.0.1, 2.1.0
    • Fix Version/s: 3.0
    • Component/s: ns_server
    • Security Level: Public
    • Labels: None
    • Flagged: Release Note
    • Sprint: 12/Aug - 30/Aug

      Description

      version=2.0.0-1947-rel
      http://qa.hq.northscale.net/job/centos-64-2.0-failover-tests/448/consoleFull
      /testrunner -i resources/jenkins/centos-64-5node-failover.ini get-logs=True,GROUP=BAT -t failovertests.FailoverTests.test_failover_firewall,replica=1,keys_count=100000,dgm_run=True,GROUP=BAT

      steps:
      1. 3 nodes in cluster: 10.1.3.114, 10.1.3.118, 10.1.3.116
      2. enabled firewall on 10.1.3.118: /sbin/iptables -A INPUT -p tcp -i eth0 --dport 1000:60000 -j REJECT
      3. failed the node over once it was reported unhealthy:
      [2012-11-12 11:28:37,102] - [failovertests:260] INFO - node 10.1.3.118:8091 is 'unhealthy' as expected
      [2012-11-12 11:29:07,544] - [rest_client:849] INFO - fail_over successful
      [2012-11-12 11:29:07,545] - [failovertests:278] INFO - failed over node : ns_1@10.1.3.118
      [2012-11-12 11:29:07,545] - [failovertests:292] INFO - 10 seconds sleep after failover before invoking rebalance...
      4. started rebalance:
      [2012-11-12 11:29:17,545] - [rest_client:883] INFO - rebalance params : password=password&ejectedNodes=ns_1%4010.1.3.118&user=Administrator&knownNodes=ns_1%4010.1.3.114%2Cns_1%4010.1.3.118%2Cns_1%4010.1.3.116
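For reference, the rebalance request body logged in step 4 can be reconstructed with a short sketch (assumptions: Python's standard urlencode as a stand-in for whatever rest_client:883 actually does; node lists and credentials are copied from the log above):

```python
from urllib.parse import urlencode

# Node lists from the test cluster above.
known_nodes = ["ns_1@10.1.3.114", "ns_1@10.1.3.118", "ns_1@10.1.3.116"]
ejected_nodes = ["ns_1@10.1.3.118"]

# The otpNode lists are comma-joined, then URL-encoded
# ('@' -> %40, ',' -> %2C), producing the params seen in the log.
body = urlencode({
    "password": "password",
    "ejectedNodes": ",".join(ejected_nodes),
    "user": "Administrator",
    "knownNodes": ",".join(known_nodes),
})
print(body)
```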

      result:
      [2012-11-12 11:29:17,562] - [rest_client:890] INFO - rebalance operation started
      [2012-11-12 11:29:17,569] - [rest_client:986] INFO - rebalance percentage : 0 %
      [2012-11-12 11:29:19,573] - [rest_client:986] INFO - rebalance percentage : 0.0 %
      ... (identical "rebalance percentage : 0.0 %" lines every 2 seconds) ...
      [2012-11-12 11:30:15,716] - [rest_client:986] INFO - rebalance percentage : 0.0 %
      [2012-11-12 11:30:17,721] - [rest_client:971] ERROR -

      {u'status': u'none', u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try rebalance again.'}

      - rebalance failed
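The harness polls the rebalance status roughly every 2 seconds until the server reports status 'none', as the log above shows. A minimal sketch of such a poll loop (get_status is a hypothetical stand-in for the GET against the rebalance-progress endpoint; the timeout value is illustrative):

```python
import time

def poll_rebalance(get_status, timeout_s=60.0, interval_s=2.0):
    """Poll rebalance status until it finishes, fails, or times out.

    get_status stands in for the REST call; it returns a dict such as
    {'status': 'running'} while rebalancing, or {'status': 'none',
    'errorMessage': ...} on failure, matching the log above.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()
        if status.get("status") == "none":
            # Finished: an errorMessage means the rebalance failed.
            if "errorMessage" in status:
                return ("failed", status["errorMessage"])
            return ("done", None)
        time.sleep(interval_s)
    return ("timeout", None)
```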

      [rebalance:info,2012-11-12T11:49:03.581,ns_1@10.1.3.114:<0.20240.1>:ns_rebalancer:rebalance:258]Waiting for bucket "default" to be ready on ['ns_1@10.1.3.114',
      'ns_1@10.1.3.116']
      [ns_server:info,2012-11-12T11:49:09.386,ns_1@10.1.3.114:<0.769.0>:ns_orchestrator:handle_info:282]Skipping janitor in state rebalancing: {rebalancing_state,<0.20215.1>,
      {dict,3,16,16,8,80,48,
      {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
      {{[],[],[],[],
      [['ns_1@10.1.3.114'|0.0]],
      [],
      [['ns_1@10.1.3.116'|0.0]],
      [],
      [['ns_1@10.1.3.118'|0.0]],
      [],[],[],[],[],[],[]}}},
      ['ns_1@10.1.3.114','ns_1@10.1.3.116'],
      [],
      ['ns_1@10.1.3.118']}
      [ns_server:info,2012-11-12T11:49:19.386,ns_1@10.1.3.114:<0.769.0>:ns_orchestrator:handle_info:282]Skipping janitor in state rebalancing: (same rebalancing_state as above)
      [ns_server:info,2012-11-12T11:49:23.120,ns_1@10.1.3.114:<0.20319.1>:compaction_daemon:try_to_cleanup_indexes:439]Cleaning up indexes for bucket `default`
      [ns_server:info,2012-11-12T11:49:23.121,ns_1@10.1.3.114:<0.20319.1>:compaction_daemon:spawn_bucket_compactor:404]Compacting bucket default with config:
      [{database_fragmentation_threshold,{30,undefined}},
      {view_fragmentation_threshold,{30,undefined}}]
      [ns_server:info,2012-11-12T11:49:24.348,ns_1@10.1.3.114:ns_config_rep<0.358.0>:ns_config_rep:do_pull:341]Pulling config from: 'ns_1@10.1.3.116'

      [ns_server:info,2012-11-12T11:49:29.386,ns_1@10.1.3.114:<0.769.0>:ns_orchestrator:handle_info:282]Skipping janitor in state rebalancing: (same rebalancing_state as above)
      [ns_server:warn,2012-11-12T11:49:29.546,ns_1@10.1.3.114:capi_set_view_manager-default<0.15508.1>:capi_set_view_manager:handle_info:345]Remote server node {'capi_ddoc_replication_srv-default','ns_1@10.1.3.118'} process down: noconnection
      [ns_server:warn,2012-11-12T11:49:29.546,ns_1@10.1.3.114:xdc_rdoc_replication_srv<0.470.0>:xdc_rdoc_replication_srv:handle_info:128]Remote server node {xdc_rdoc_replication_srv,'ns_1@10.1.3.118'} process down: noconnection
      [user:warn,2012-11-12T11:49:29.546,ns_1@10.1.3.114:ns_node_disco<0.351.0>:ns_node_disco:handle_info:168]Node 'ns_1@10.1.3.114' saw that node 'ns_1@10.1.3.118' went down.
      [error_logger:error,2012-11-12T11:49:29.546,ns_1@10.1.3.114:error_logger<0.5.0>:ale_error_logger_handler:log_msg:76]** Node 'ns_1@10.1.3.118' not responding **
      ** Removing (timedout) connection **

      [ns_server:info,2012-11-12T11:49:30.536,ns_1@10.1.3.114:ns_config_rep<0.358.0>:ns_config_rep:do_pull:341]Pulling config from: 'ns_1@10.1.3.116'

      [ns_server:info,2012-11-12T11:49:39.386,ns_1@10.1.3.114:<0.769.0>:ns_orchestrator:handle_info:282]Skipping janitor in state rebalancing: (same rebalancing_state as above)
      [ns_server:info,2012-11-12T11:49:49.386,ns_1@10.1.3.114:<0.769.0>:ns_orchestrator:handle_info:282]Skipping janitor in state rebalancing: (same rebalancing_state as above)
      [ns_server:info,2012-11-12T11:49:53.135,ns_1@10.1.3.114:<0.20445.1>:compaction_daemon:try_to_cleanup_indexes:439]Cleaning up indexes for bucket `default`
      [ns_server:info,2012-11-12T11:49:53.136,ns_1@10.1.3.114:<0.20445.1>:compaction_daemon:spawn_bucket_compactor:404]Compacting bucket default with config:
      [{database_fragmentation_threshold,{30,undefined}},
      {view_fragmentation_threshold,{30,undefined}}]
      [ns_server:info,2012-11-12T11:49:59.386,ns_1@10.1.3.114:<0.769.0>:ns_orchestrator:handle_info:282]Skipping janitor in state rebalancing: (same rebalancing_state as above)
      [user:info,2012-11-12T11:50:03.582,ns_1@10.1.3.114:<0.769.0>:ns_orchestrator:handle_info:319]Rebalance exited with reason {not_all_nodes_are_ready_yet, ['ns_1@10.1.3.114','ns_1@10.1.3.116']}
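Per the "Waiting for bucket "default" to be ready" line at ns_rebalancer:rebalance:258, the rebalancer waits for the bucket to become ready on the surviving nodes and exits with {not_all_nodes_are_ready_yet, Nodes} when that wait gives up. A toy Python model of the wait (the attempt count and error shape are illustrative, not the actual ns_server logic):

```python
def wait_for_bucket_ready(ready_nodes_fn, nodes, attempts=5):
    """Toy model of ns_rebalancer's wait for bucket readiness.

    ready_nodes_fn returns the subset of `nodes` on which the bucket is
    currently ready. If some nodes never become ready within the retry
    budget, we model the {not_all_nodes_are_ready_yet, Nodes} exit as an
    exception carrying the nodes that were still not ready.
    """
    for _ in range(attempts):
        not_ready = [n for n in nodes if n not in ready_nodes_fn()]
        if not not_ready:
            return  # bucket ready everywhere; rebalance can proceed
    raise RuntimeError(f"not_all_nodes_are_ready_yet: {not_ready}")
```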


          People

          • Assignee: alkondratenko Aleksey Kondratenko (Inactive)
          • Reporter: andreibaranouski Andrei Baranouski
          • Votes: 0
          • Watchers: 9


                Gerrit Reviews

                There are no open Gerrit changes