Couchbase Server / MB-5352

Rebalance failed due to shutdown gen_server call (Got error while trying to read close ack:{error,closed})

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Won't Fix
    • Affects Version/s: 1.8.1-release-candidate
    • Fix Version/s: 1.8.1
    • Component/s: ns_server
    • Security Level: Public
    • Labels:
      None
    • Environment:
      Centos 64 bit
      181-832-rel

      Description

      Failing test case:
      swaprebalance.SwapRebalanceFailedTests.test_add_back_failed_node,replica=1,num-buckets=1,num-swap=3

      [user:warn] [2012-05-21 18:30:02] [ns_1@10.1.3.74:ns_node_disco:ns_node_disco:handle_info:150] Node 'ns_1@10.1.3.74' saw that node 'ns_1@10.1.3.77' went down.
      [ns_server:info] [2012-05-21 18:30:02] [ns_1@10.1.3.74:ns_node_disco_events:ns_node_disco_log:handle_event:46] ns_node_disco_log: nodes changed: ['ns_1@10.1.3.74','ns_1@10.1.3.76','ns_1@10.1.3.79','ns_1@10.1.3.80']
      [ns_server:warn] [2012-05-21 18:30:02] [ns_1@10.1.3.74:mb_master:mb_master:master:399] Master got candidate heartbeat from node 'ns_1@10.1.3.75' which is not in peers ['ns_1@10.1.3.74','ns_1@10.1.3.76','ns_1@10.1.3.79','ns_1@10.1.3.80']
      [rebalance:warn] [2012-05-21 18:30:03] [ns_1@10.1.3.74:<0.975.0>:ebucketmigrator_srv:do_confirm_sent_messages:321] Got error while trying to read close ack: {error,closed}

      [ns_server:info] [2012-05-21 18:30:03] [ns_1@10.1.3.74:<0.1440.0>:ns_vbm_sup:kill_child:214] Stopped replicator: {child_id,[0,1],'ns_1@10.1.3.75'} on {'ns_1@10.1.3.74', "default"}

      [user:info] [2012-05-21 18:30:03] [ns_1@10.1.3.74:<0.217.0>:ns_orchestrator:handle_info:245] Rebalance exited with reason {shutdown,
          {gen_server,call,
              [{'ns_vbm_sup-default','ns_1@10.1.3.76'},
               which_children,infinity]}}

      [ns_server:info] [2012-05-21 18:30:03] [ns_1@10.1.3.74:<0.1688.0>:diag_handler:log_all_tap_and_checkpoint_stats:123] logging tap & checkpoint stats
      [ns_server:debug] [2012-05-21 18:30:03] [ns_1@10.1.3.74:ns_config_log:ns_config_log:log_common:111] config change:
      counters ->
      [{rebalance_fail,1},
       {rebalance_start,2},
       {failover_node,3},
       {rebalance_success,1}]


        Activity

        Aleksey Kondratenko (Inactive) added a comment:

        *) After failing over .77, we failed to run the janitor, so .76 still had replication into .77 configured.

        *) During the rebalance, .76's replication supervisor finally died because it hit max_restart_intensity, caused by its inability to replicate into the no-longer-existing bucket on .77.

        *) At that very moment we were asking it for its children in order to make replication changes, so the rebalance failed.
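        The failure sequence above can be modeled with a toy supervisor. This is a minimal Python sketch, not ns_server code: it assumes OTP-style restart intensity semantics (more than `max_restarts` child restarts within `period` seconds makes the supervisor terminate its children and then itself), and the class name, method names, and limits are hypothetical. It also simplifies the real case, where the `which_children` call was already in flight when the supervisor exited with `shutdown`.

```python
from collections import deque


class SupervisorShutdown(Exception):
    """Raised when a caller queries a supervisor that has already terminated."""


class RestartIntensitySupervisor:
    """Toy model of OTP supervisor restart intensity (hypothetical, not ns_server code).

    At most `max_restarts` restarts are tolerated within any `period` seconds;
    one more restart inside that window makes the supervisor give up, drop all
    of its children, and terminate itself.
    """

    def __init__(self, max_restarts, period):
        self.max_restarts = max_restarts
        self.period = period
        self.restart_times = deque()
        self.children = {}
        self.alive = True

    def add_child(self, child_id):
        self.children[child_id] = "running"

    def child_crashed(self, child_id, now):
        if not self.alive:
            raise SupervisorShutdown("supervisor already terminated")
        self.restart_times.append(now)
        # Forget restarts that have fallen out of the sliding time window.
        while self.restart_times and now - self.restart_times[0] > self.period:
            self.restart_times.popleft()
        if len(self.restart_times) > self.max_restarts:
            # Intensity exceeded: terminate all children, then the supervisor.
            self.alive = False
            self.children.clear()
            return
        # Within the limit: restart the crashed child.
        self.children[child_id] = "running"

    def which_children(self):
        # Like asking a dead Erlang supervisor for which_children: the call fails.
        if not self.alive:
            raise SupervisorShutdown("shutdown")
        return sorted(self.children)
```

        For example, a replicator pointed at a bucket that no longer exists crashes on every restart; with a hypothetical limit of 3 restarts per 10 seconds, the fourth crash kills the supervisor, and a subsequent `which_children` call (what the rebalance was doing on .76) raises instead of returning the child list.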

        Aleksey Kondratenko (Inactive) added a comment:

        Note: restarting the rebalance would very likely succeed.

        Aleksey Kondratenko (Inactive) added a comment:

        I'd like to proceed with the workaround mentioned above. Let me know if it doesn't work.

        Karan Kumar (Inactive) added a comment:

        The workaround is to re-issue the rebalance.
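        A test harness could apply this workaround automatically by retrying a failed rebalance. The sketch below is a generic retry wrapper, not a Couchbase API: `run_rebalance` is a hypothetical callable assumed to start a rebalance, wait for it to finish, and return True on success or False on failure; the attempt count and delay are illustrative defaults.

```python
import time


def rebalance_with_retry(run_rebalance, max_attempts=2, delay_seconds=0.0):
    """Re-issue a failed rebalance up to max_attempts times (hypothetical helper).

    Returns the attempt number that succeeded; raises RuntimeError if every
    attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        if run_rebalance():
            return attempt
        if attempt < max_attempts:
            # Give the cluster a moment to settle before re-issuing.
            time.sleep(delay_seconds)
    raise RuntimeError(f"rebalance still failing after {max_attempts} attempts")
```

        Keeping `max_attempts` small reflects the comments above: a single re-issue is expected to succeed, so repeated failures more likely point to a different problem than the transient supervisor shutdown described here.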


          People

          • Assignee:
            Karan Kumar (Inactive)
            Reporter:
            Karan Kumar (Inactive)
          • Votes:
            0
            Watchers:
            2

            Dates

            • Created:
              Updated:
              Resolved:
