Project: Couchbase Server
Parent issue: MB-7246 [windows] Commit failure and retry caused memcached to exit with 255, which in turn caused rebalance failure
Issue: MB-7201

[windows] Control connection to memcached on one of the nodes is lost during rebalance; rebalance failed


Details

    • Type: Technical task
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: 2.0.1
    • Affects Version/s: 2.0-beta-2, 2.0
    • Component/s: ns_server
    • Security Level: Public
    • None

    Description

      Rebalance from 5 to 7 nodes failed:

      Control connection to memcached on 'ns_1@10.3.2.141' disconnected: {badmatch,{error,closed}}

      ....

      [ns_server:info,2012-11-14T20:44:36.251,ns_1@10.3.2.131:ns_port_memcached<0.437.0>:ns_port_server:log:171]memcached<0.437.0>: Wed Nov 14 20:44:35.989437 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_building_676_'ns_1@10.3.2.130' - Sending TAP_OPAQUE with command "close_backfill" and vbucket 676
      memcached<0.437.0>: Wed Nov 14 20:44:36.083187 Pacific Standard Time 3: Notified the completion of checkpoint persistence for vbucket 404, cookie 0000000005A0EB00

      [rebalance:info,2012-11-14T20:44:36.345,ns_1@10.3.2.131:<0.7540.93>:ebucketmigrator_srv:init:551]Starting tap stream:
      [{vbuckets,[953]},
      {checkpoints,[{953,3}]},
      {name,<<"replication_building_953_'ns_1@10.3.2.131'">>},
      {takeover,false}]
      {{"10.3.2.130",11209},
      {"10.3.2.131",11209},
      [{vbuckets,[953]},
      {takeover,false},
      {suffix,"building_953_'ns_1@10.3.2.131'"},
      {username,"default"},
      {password,[]}]}

      [error_logger:error,2012-11-14T20:44:37.876,ns_1@10.3.2.131:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: ns_single_vbucket_mover:mover/6
      pid: <0.7325.93>
      registered_name: []
      exception exit: {unexpected_exit,
      {'EXIT',<0.7456.93>,
      {{wait_checkpoint_persisted_failed,"default",404,4,
      [{'ns_1@10.3.2.141',
      {'EXIT',
      {badmatch,{error,closed,
      {gen_server,call,
      ['ns_memcached-default',
      {wait_for_checkpoint_persistence,404,4},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-default','ns_1@10.3.2.141'},
      {if_rebalance,<0.27487.91>,
      {wait_checkpoint_persisted,404,4}},
      infinity]}}}}]},
      [{ns_single_vbucket_mover, '-wait_checkpoint_persisted_many/5-fun-1-',5}]}}}
      in function ns_single_vbucket_mover:spawn_and_wait/1
      in call from ns_single_vbucket_mover:mover_inner/6
      in call from misc:try_with_maybe_ignorant_after/2
      in call from ns_single_vbucket_mover:mover/6
      ancestors: [<0.27487.91>,<0.27357.91>]
      messages: [{'EXIT',<0.27487.91>,{mover_failed,downstream_closed}}]
      links: [<0.27487.91>]
      dictionary: [{cleanup_list,[<0.7410.93>,<0.7456.93>]}]
      trap_exit: true
      status: running
      heap_size: 6765
      stack_size: 24
      reductions: 12605
      neighbours:

      ...

      [error_logger:error,2012-11-14T20:44:37.938,ns_1@10.3.2.131:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: ebucketmigrator_srv:confirm_sent_messages/1-fun-0/0
      pid: <20523.8371.14>
      registered_name: []
      exception error: no match of right hand side value {error,closed}

      in function ebucketmigrator_srv:'confirm_sent_messages/1-fun-0'/3
      ancestors: [<20523.8294.14>,<0.7410.93>,<0.7325.93>,<0.27487.91>,
      <0.27357.91>]
      messages: []
      links: [<20523.8294.14>]
      dictionary: []
      trap_exit: false
      status: running
      heap_size: 987
      stack_size: 24
      reductions: 748
      neighbours:

      [error_logger:error,2012-11-14T20:44:37.954,ns_1@10.3.2.131:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: ebucketmigrator_srv:init/1
      pid: <20523.8294.14>
      registered_name: []
      exception exit: downstream_closed
      in function gen_server:terminate/6
      ancestors: [<0.7410.93>,<0.7325.93>,<0.27487.91>,<0.27357.91>]
      messages: [{'EXIT',<20523.8296.14>,killed}]
      links: [#Port<20523.253270>,<20523.8371.14>,<0.7410.93>,
      #Port<20523.253269>]
      dictionary: []
      trap_exit: true
      status: running
      heap_size: 1597
      stack_size: 24
      reductions: 84633
      neighbours:

      ...

      [ns_server:error,2012-11-14T20:44:42.720,ns_1@10.3.2.131:<0.7424.93>:misc:inner_wait_shutdown:1426]Expected exit signal from <0.7426.93> but could not get it in 5 seconds. This is a bug, but process we're waiting for is dead (noproc), so trying to ignore...
      [ns_server:error,2012-11-14T20:44:42.735,ns_1@10.3.2.131:<0.7424.93>:misc:sync_shutdown_many_i_am_trapping_exits:1408]Shutdown of the following failed: [{<20200.30113.21>,normal},
      {<0.7426.93>,noproc},
      {<20581.12222.6>,normal}]
      [ns_server:info,2012-11-14T20:44:42.782,ns_1@10.3.2.131:<0.7424.93>:ns_replicas_builder_utils:kill_a_bunch_of_tap_names:59]Killed the following tap names on 'ns_1@10.3.2.141': [<<"replication_building_128_'ns_1@10.3.2.139'">>,
      <<"replication_building_128_'ns_1@10.3.2.131'">>,
      <<"replication_building_128_'ns_1@10.3.2.130'">>,
      <<"replication_building_128_'ns_1@10.3.2.132'">>]
      [error_logger:error,2012-11-14T20:44:42.782,ns_1@10.3.2.131:error_logger<0.5.0>:ale_error_logger_handler:log_msg:76]** Generic server <0.7424.93> terminating
      ** Last message in was {'EXIT',<0.7426.93>,normal}
      ** When Server state == {state,"default",128,'ns_1@10.3.2.141',
      [{'ns_1@10.3.2.139',<20200.30113.21>},
      {'ns_1@10.3.2.131',<0.7426.93>},
      {'ns_1@10.3.2.130',<20199.10253.29>},
      {'ns_1@10.3.2.132',<20581.12222.6>}]}
      ** Reason for termination ==
      ** {{badmatch,[{<20200.30113.21>,normal},
      {<0.7426.93>,noproc},
      {<20581.12222.6>,normal}]},
      [{misc,sync_shutdown_many_i_am_trapping_exits,1},
      {misc,try_with_maybe_ignorant_after,2},
      {gen_server,terminate,6},
      {proc_lib,init_p_do_apply,3}]}

      [ns_server:error,2012-11-14T20:44:42.782,ns_1@10.3.2.131:<0.7368.93>:misc:sync_shutdown_many_i_am_trapping_exits:1408]Shutdown of the following failed: [{<0.7424.93>,
      {{badmatch,
      [{<20200.30113.21>,normal},
      {<0.7426.93>,noproc},
      {<20581.12222.6>,normal}]},
      [{misc, sync_shutdown_many_i_am_trapping_exits, 1},
      {misc,try_with_maybe_ignorant_after,2},
      {gen_server,terminate,6},
      {proc_lib,init_p_do_apply,3}]}},
      {<0.7429.93>,
      {{badmatch,
      [{'EXIT',
      {normal,
      {gen_server,call, [<0.7426.93>,had_backfill,30000]}}},
      {'EXIT',
      {shutdown,
      {gen_server,call, [<20199.10253.29>,had_backfill, 30000]}}}]},
      [{ns_single_vbucket_mover, '-wait_backfill_determination/1-fun-1-', 1}]}}]
      [ns_server:error,2012-11-14T20:44:42.782,ns_1@10.3.2.131:<0.7368.93>:misc:try_with_maybe_ignorant_after:1444]Eating exception from ignorant after-block:
      {error,
      {badmatch,
      [{<0.7424.93>,
      {{badmatch,
      [{<20200.30113.21>,normal},
      {<0.7426.93>,noproc},
      {<20581.12222.6>,normal}]},
      [{misc,sync_shutdown_many_i_am_trapping_exits,1},
      {misc,try_with_maybe_ignorant_after,2},
      {gen_server,terminate,6},
      {proc_lib,init_p_do_apply,3}]}},
      {<0.7429.93>,
      {{badmatch,
      [{'EXIT',
      {normal,
      {gen_server,call,[<0.7426.93>,had_backfill,30000]}}},
      {'EXIT',
      {shutdown,
      {gen_server,call, [<20199.10253.29>,had_backfill,30000]}}}]},
      [{ns_single_vbucket_mover,'-wait_backfill_determination/1-fun-1-', 1}]}}]},
      [{misc,sync_shutdown_many_i_am_trapping_exits,1},
      {misc,try_with_maybe_ignorant_after,2},
      {ns_single_vbucket_mover,mover,6},
      {proc_lib,init_p_do_apply,3}]}
      [error_logger:error,2012-11-14T20:44:42.782,ns_1@10.3.2.131:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: new_ns_replicas_builder:init/1
      pid: <0.7424.93>
      registered_name: []
      exception exit: {{badmatch,[{<20200.30113.21>,normal},
      {<0.7426.93>,noproc},
      {<20581.12222.6>,normal}]},
      [{misc,sync_shutdown_many_i_am_trapping_exits,1},
      {misc,try_with_maybe_ignorant_after,2},
      {gen_server,terminate,6},
      {proc_lib,init_p_do_apply,3}]}
      in function gen_server:terminate/6
      ancestors: [<0.7368.93>,<0.27487.91>,<0.27357.91>]
      messages: [{'EXIT',<0.7368.93>,shutdown}]
      links: [<0.7368.93>]
      dictionary: []
      trap_exit: true
      status: running
      heap_size: 514229
      stack_size: 24
      reductions: 66044
      neighbours:

      [user:info,2012-11-14T20:44:42.798,ns_1@10.3.2.131:<0.385.0>:ns_orchestrator:handle_info:319]Rebalance exited with reason {mover_failed,downstream_closed}
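
      For triage, the log excerpts above can be summarized by the module and function that emitted each error entry. The following is a minimal Python sketch, not part of ns_server. It assumes every entry begins with a header of the form [category:level,timestamp,node:process:module:function:line], as seen in the lines quoted in this ticket. The names HEADER, parse_entries, error_sources and SAMPLE are hypothetical, and the sample text is abridged from the log above.

```python
import re
from collections import Counter

# Header layout assumed from the excerpts in this ticket:
# [category:level,timestamp,node:process:module:function:line]message
HEADER = re.compile(
    r"^\s*\[(?P<category>\w+):(?P<level>\w+),"
    r"(?P<timestamp>[^,\]]+),"
    r"(?P<node>[^:\]]+):(?P<process>[^:\]]*):"
    r"(?P<module>\w+):(?P<function>\w+):(?P<line>\d+)\]"
    r"(?P<message>.*)$"
)


def parse_entries(text):
    """Split log text into entries; lines without a header are
    attached to the body of the preceding entry."""
    entries = []
    for raw in text.splitlines():
        match = HEADER.match(raw)
        if match:
            entry = match.groupdict()
            entry["body"] = []
            entries.append(entry)
        elif entries and raw.strip():
            entries[-1]["body"].append(raw.strip())
    return entries


def error_sources(entries):
    """Count error-level entries per 'module:function'."""
    return Counter(
        f"{e['module']}:{e['function']}"
        for e in entries
        if e["level"] == "error"
    )


# Abridged lines taken from the log in this ticket.
SAMPLE = """\
[rebalance:info,2012-11-14T20:44:36.345,ns_1@10.3.2.131:<0.7540.93>:ebucketmigrator_srv:init:551]Starting tap stream:
[ns_server:error,2012-11-14T20:44:42.720,ns_1@10.3.2.131:<0.7424.93>:misc:inner_wait_shutdown:1426]Expected exit signal from <0.7426.93> but could not get it in 5 seconds.
[user:info,2012-11-14T20:44:42.798,ns_1@10.3.2.131:<0.385.0>:ns_orchestrator:handle_info:319]Rebalance exited with reason
{mover_failed,downstream_closed}
"""

if __name__ == "__main__":
    parsed = parse_entries(SAMPLE)
    for source, count in error_sources(parsed).items():
        print(source, count)
```

      Continuation lines such as the exit reason are kept in each entry's body, so the final rebalance failure reason can be read from the last entry that mentions "Rebalance exited".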

          People

            chiyoung Chiyoung Seo (Inactive)
            iryna iryna