  Couchbase Server / MB-12057

apparent deadlock in ep-engine/bucket-engine (was: node_in is in pending state / unable to restart cb service there / Rebalance exited with reason {not_all_nodes_are_ready_yet})


Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • 3.0
    • 3.0
    • couchbase-bucket
    • Security Level: Public
    • 3.0.0-1174
    • Triaged
    • Unknown
    • June 30 - July 18

    Description

      Steps:
      1) Run a data load for ~12 hours on the source cluster http://172.23.105.156/
      2) Then start replication of all 4 buckets to the destination cluster (172.23.105.159, 172.23.105.160, 172.23.105.206).
      3) Almost immediately after step 2, add 172.23.105.207 to the destination cluster and rebalance.
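      A minimal repro sketch of steps 2 and 3 using curl against the Couchbase REST API, for anyone re-running the scenario. It rests on assumptions that are not in the report: REST port 8091, Administrator/password credentials, a remote-cluster reference named "dest" created on the source cluster, 172.23.105.159 as the destination node that receives the addNode and rebalance calls, and XDCR replications pushed from the source into same-named buckets. Only UserInfo is named in the report, so the other three bucket names have to be filled in. Step 1 is not covered because the load tool is not specified. By default the script only prints the commands (DRY_RUN=1); set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Hypothetical repro sketch for steps 2 and 3. Credentials, the "dest"
# reference name and the bucket list are assumptions, not from the report.
SRC=172.23.105.156
DEST_MASTER=172.23.105.159
DEST_NODES="172.23.105.159 172.23.105.160 172.23.105.206"
NEW_NODE=172.23.105.207
CB_USER=Administrator   # assumption
CB_PASS=password        # assumption
BUCKETS="UserInfo"      # only UserInfo is named in the report; add the other three

# Print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Join node IPs into the otpNode list used by /controller/rebalance,
# e.g. "ns_1@a,ns_1@b".
known_nodes() {
  out=""
  for ip in "$@"; do
    out="${out:+$out,}ns_1@$ip"
  done
  echo "$out"
}

# Step 2: remote-cluster reference on the source, then one continuous
# replication per bucket (source bucket -> same-named destination bucket).
run curl -s -u "$CB_USER:$CB_PASS" "http://$SRC:8091/pools/default/remoteClusters" \
  -d name=dest -d hostname="$DEST_MASTER:8091" \
  -d username="$CB_USER" -d password="$CB_PASS"
for b in $BUCKETS; do
  run curl -s -u "$CB_USER:$CB_PASS" "http://$SRC:8091/controller/createReplication" \
    -d fromBucket="$b" -d toCluster=dest -d toBucket="$b" \
    -d replicationType=continuous
done

# Step 3: add the new node to the destination cluster and rebalance it in.
run curl -s -u "$CB_USER:$CB_PASS" "http://$DEST_MASTER:8091/controller/addNode" \
  -d hostname="$NEW_NODE" -d user="$CB_USER" -d password="$CB_PASS"
# $DEST_NODES is left unquoted on purpose so it splits into separate IPs.
run curl -s -u "$CB_USER:$CB_PASS" "http://$DEST_MASTER:8091/controller/rebalance" \
  -d knownNodes="$(known_nodes $DEST_NODES $NEW_NODE)" -d ejectedNodes=
```

      In dry-run mode the last printed command carries knownNodes=ns_1@172.23.105.159,ns_1@172.23.105.160,ns_1@172.23.105.206,ns_1@172.23.105.207, which matches the KeepNodes list in the "Starting rebalance" log entry.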

      Rebalance exited with reason {not_all_nodes_are_ready_yet, ['ns_1@172.23.105.207']} ns_orchestrator002 ns_1@172.23.105.159 10:49:49 - Sun Aug 24, 2014
      Started rebalancing bucket UserInfo ns_rebalancer000 ns_1@172.23.105.159 10:48:49 - Sun Aug 24, 2014
      Starting rebalance, KeepNodes = ['ns_1@172.23.105.159','ns_1@172.23.105.160','ns_1@172.23.105.206','ns_1@172.23.105.207'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes ns_orchestrator004 ns_1@172.23.105.159 10:48:49 - Sun Aug 24, 2014
      Control connection to memcached on 'ns_1@172.23.105.207' disconnected: {{badmatch, {error, timeout}},
       [{mc_client_binary, cmd_vocal_recv, 5, [{file, "src/mc_client_binary.erl"}, {line, 151}]},
        {mc_client_binary, select_bucket, 2, [{file, "src/mc_client_binary.erl"}, {line, 346}]},
        {ns_memcached, ensure_bucket, 2, [{file, "src/ns_memcached.erl"}, {line, 1269}]},
        {ns_memcached, handle_info, 2, [{file, "src/ns_memcached.erl"}, {line, 744}]},
        {gen_server, handle_msg, 5, [{file, "gen_server.erl"}, {line, 604}]},
        {ns_memcached, init, 1, [{file, "src/ns_memcached.erl"}, {line, 171}]},
        {gen_server, init_it, 6, [{file, "gen_server.erl"}, {line, 304}]},
        {proc_lib, init_p_do_apply, 3, [{file, "proc_lib.erl"}, {line, 239}]}]} (repeated 1 times) ns_memcached000 ns_1@172.23.105.207 10:44:42 - Sun Aug 24, 2014
      Control connection to memcached on 'ns_1@172.23.105.207' disconnected: {{badmatch, {error, timeout}},
       [{mc_client_binary, cmd_vocal_recv, 5, [{file, "src/mc_client_binary.erl"}, {line, 151}]},
        {mc_client_binary, select_bucket, 2, [{file, "src/mc_client_binary.erl"}, {line, 346}]},
        {ns_memcached, ensure_bucket, 2, [{file, "src/ns_memcached.erl"}, {line, 1269}]},
        {ns_memcached, handle_info, 2, [{file, "src/ns_memcached.erl"}, {line, 744}]},
        {gen_server, handle_msg, 5, [{file, "gen_server.erl"}, {line, 604}]},
        {ns_memcached, init, 1, [{file, "src/ns_memcached.erl"}, {line, 171}]},
        {gen_server, init_it, 6, [{file, "gen_server.erl"}, {line, 304}]},
        {proc_lib, init_p_do_apply, 3, [{file, "proc_lib.erl"}, {line, 239}]}]} ns_memcached000 ns_1@172.23.105.207 10:43:56 - Sun Aug 24, 2014
      Rebalance exited with reason {not_all_nodes_are_ready_yet, ['ns_1@172.23.105.207']}

      Trying to restart the 172.23.105.207 node (the firewall is turned off there):

      [root@centos-64-x64 logs]# /etc/init.d/couchbase-server status
      couchbase-server is running
      [root@centos-64-x64 logs]# /etc/init.d/couchbase-server restart
      Stopping couchbase-server
      ^C
      BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded
      (v)ersion (k)ill (D)b-tables (d)istribution
      a
      [ OK ]
      Starting couchbase-server [ OK ]
      [root@centos-64-x64 logs]# /etc/init.d/couchbase-server status
      couchbase-server is running
      [root@centos-64-x64 logs]# /etc/init.d/couchbase-server stop
      Stopping couchbase-serverNOTE: shutdown failed

      {badrpc,nodedown}

      [FAILED]
      [root@centos-64-x64 logs]# /etc/init.d/couchbase-server start
      couchbase-server is already started [WARNING]

      The cluster will remain available in this state for a few hours.

          People

            Assignee: Andrei Baranouski (andreibaranouski)
            Reporter: Andrei Baranouski (andreibaranouski)
            Votes: 0
            Watchers: 9
