Couchbase Server / MB-6579

Swap rebalance exited with reason {bulk_set_vbucket_state_failed, couldnt_connect_to_memcached, ...}, with a list of crashes before: ebucketmigrator_srv, erlang:apply/2, ns_single_vbucket_mover


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Versions: 2.0, 2.0-beta
    • Component: ns_server
    • Security Level: Public
    • Labels: None

    Description

      2.0.0-1700-rel
      http://qa.hq.northscale.net/job/centos-64-2.0-rebalance-regressions/31/consoleFull
      /testrunner -i /tmp/rebalance_regression.ini get-logs=True -t swaprebalance.SwapRebalanceFailedTests.test_failed_swap_rebalance,replica=2,num-buckets=1,num-swap=2,swap-orchestrator=Fasle

      Alk, I do not see any Erlang crash/core dumps; there are only logs:

      [rebalance:info,2012-09-08T10:37:20.506,ns_1@10.3.121.92:<0.22478.1>:janitor_agent:set_vbucket_state:407]Doing vbucket 164 state change:

      {'ns_1@10.3.121.93',replica,undefined, 'ns_1@10.3.121.95'}

      [rebalance:info,2012-09-08T10:37:20.506,ns_1@10.3.121.92:<0.22479.1>:janitor_agent:set_vbucket_state:407]Doing vbucket 164 state change:

      {'ns_1@10.3.121.96',replica,undefined, 'ns_1@10.3.121.93'}

      [error_logger:error,2012-09-08T10:37:20.513,ns_1@10.3.121.92:error_logger:ale_error_logger_handler:log_msg:76]** Generic server <19987.20007.0> terminating
      ** Last message in was {tcp_closed,#Port<19987.25135>}
      ** When Server state == {state,#Port<19987.25134>,#Port<19987.25132>,
                               #Port<19987.25135>,#Port<19987.25133>,
                               <19987.20008.0>,<<>>,<<>>,
                               {set,1,16,16,8,80,48,
                                {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                                {{[],[561],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
                               115,false,false,0,
                               {1347,125829,535727},
                               not_started,undefined,
                               <<"replication_building_561_'ns_1@10.3.121.96'">>,
                               <19987.20007.0>,
                               {had_backfill,true,undefined,[]}}
      ** Reason for termination ==
      ** downstream_closed

      [error_logger:error,2012-09-08T10:37:20.515,ns_1@10.3.121.92:error_logger:ale_error_logger_handler:log_msg:76]** Generic server <19987.19993.0> terminating
      ** Last message in was {tcp_closed,#Port<19987.25127>}
      ** When Server state == {state,#Port<19987.25126>,#Port<19987.25117>,
                               #Port<19987.25127>,#Port<19987.25118>,
                               <19987.20006.0>,<<>>,<<>>,
                               {set,1,16,16,8,80,48,
                                {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                                {{[],[],[],[],[],[],[],[],[],[],[],[],[857],[],[],[]}}},
                               203,false,false,0,
                               {1347,125829,399439},
                               not_started,undefined,
                               <<"replication_building_857_'ns_1@10.3.121.96'">>,
                               <19987.19993.0>,
                               {had_backfill,true,undefined,[]}}
      ** Reason for termination ==
      ** downstream_closed

      [ns_server:info,2012-09-08T10:37:20.517,ns_1@10.3.121.92:<0.22444.1>:ns_replicas_builder_utils:kill_a_bunch_of_tap_names:59]Killed the following tap names on 'ns_1@10.3.121.94': [<<"replication_building_857_'ns_1@10.3.121.96'">>,
      <<"replication_building_857_'ns_1@10.3.121.93'">>,
      <<"replication_building_857_'ns_1@10.3.121.95'">>]
      [ns_server:info,2012-09-08T10:37:20.518,ns_1@10.3.121.92:<0.22460.1>:ns_replicas_builder_utils:kill_a_bunch_of_tap_names:59]Killed the following tap names on 'ns_1@10.3.121.93': [<<"replication_building_561_'ns_1@10.3.121.96'">>,
      <<"replication_building_561_'ns_1@10.3.121.95'">>]
      [error_logger:error,2012-09-08T10:37:20.518,ns_1@10.3.121.92:error_logger:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: ebucketmigrator_srv:init/1
      pid: <19987.20007.0>
      registered_name: []
      exception exit: downstream_closed
      in function gen_server:terminate/6
      ancestors: [<0.22460.1>,<0.22455.1>,<0.11756.1>,<0.11717.1>]
      messages: []
      links: [#Port<19987.25133>,<0.22460.1>,<19987.20008.0>,
      #Port<19987.25132>]
      dictionary: []
      trap_exit: true
      status: running
      heap_size: 233
      stack_size: 24
      reductions: 128921
      neighbours:

      [ns_server:info,2012-09-08T10:37:20.518,ns_1@10.3.121.92:<0.22423.1>:ns_single_vbucket_mover:mover_inner_old_style:199]Got exit message (parent is <0.11756.1>). Exiting...
      {'EXIT',<0.22444.1>,{missing_checkpoint_stat,'ns_1@10.3.121.96',857}}
      [ns_server:error,2012-09-08T10:37:20.519,ns_1@10.3.121.92:<0.22423.1>:misc:sync_shutdown_many_i_am_trapping_exits:1373]Shutdown of the following failed: [{<0.22444.1>,
      {missing_checkpoint_stat,
      'ns_1@10.3.121.96',857}}]
      [ns_server:error,2012-09-08T10:37:20.519,ns_1@10.3.121.92:<0.22460.1>:misc:try_with_maybe_ignorant_after:1409]Eating exception from ignorant after-block:
      {error,{badmatch,[{<19987.20007.0>,downstream_closed}]},
             [{misc,sync_shutdown_many_i_am_trapping_exits,1},
              {misc,try_with_maybe_ignorant_after,2},
              {misc,try_with_maybe_ignorant_after,2},
              {ns_replicas_builder,build_replicas_main,6},
              {proc_lib,init_p_do_apply,3}]}
      [ns_server:error,2012-09-08T10:37:20.519,ns_1@10.3.121.92:<0.22423.1>:misc:try_with_maybe_ignorant_after:1409]Eating exception from ignorant after-block:
      {error,{badmatch,[{<0.22444.1>,
      {missing_checkpoint_stat,'ns_1@10.3.121.96',857}}]},
      [{misc,sync_shutdown_many_i_am_trapping_exits,1},
       {misc,try_with_maybe_ignorant_after,2},
       {ns_single_vbucket_mover,mover,6},
       {proc_lib,init_p_do_apply,3}]}
      [error_logger:error,2012-09-08T10:37:20.519,ns_1@10.3.121.92:error_logger:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: ebucketmigrator_srv:init/1
      pid: <19987.19993.0>
      registered_name: []
      exception exit: downstream_closed
      in function gen_server:terminate/6
      ancestors: [<0.22444.1>,<0.22423.1>,<0.11756.1>,<0.11717.1>]
      messages: []
      links: [#Port<19987.25118>,<0.22444.1>,<19987.20006.0>,
      #Port<19987.25117>]
      dictionary: []
      trap_exit: true
      status: running
      heap_size: 2584
      stack_size: 24
      reductions: 66423
      neighbours:

      [ns_server:info,2012-09-08T10:37:20.520,ns_1@10.3.121.92:<0.22455.1>:ns_single_vbucket_mover:mover_inner_old_style:199]Got exit message (parent is <0.11756.1>). Exiting...
      {'EXIT',<0.22460.1>,
      {replicator_died,{'EXIT',<19987.20007.0>,downstream_closed}}}
      [ns_server:error,2012-09-08T10:37:20.520,ns_1@10.3.121.92:<0.22455.1>:misc:sync_shutdown_many_i_am_trapping_exits:1373]Shutdown of the following failed: [{<0.22460.1>,
      {replicator_died,
      {'EXIT',<19987.20007.0>, downstream_closed}}}]
      [error_logger:error,2012-09-08T10:37:20.521,ns_1@10.3.121.92:error_logger:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: erlang:apply/2
      pid: <0.22444.1>
      registered_name: []
      exception exit: {missing_checkpoint_stat,'ns_1@10.3.121.96',857}
      in function ns_replicas_builder:'wait_checkpoint_opened/5-lc$^0/1-0'/2
      in call from ns_replicas_builder:wait_checkpoint_opened/5
      in call from ns_replicas_builder:'build_replicas_main/6-fun-1'/8
      in call from misc:try_with_maybe_ignorant_after/2
      in call from ns_replicas_builder:build_replicas_main/6
      ancestors: [<0.22423.1>,<0.11756.1>,<0.11717.1>]
      messages: [{'EXIT',<19987.19993.0>,downstream_closed}]
      links: [<0.22423.1>]
      dictionary: []
      trap_exit: true
      status: running
      heap_size: 514229
      stack_size: 24
      reductions: 57007
      neighbours:

      [ns_server:error,2012-09-08T10:37:20.521,ns_1@10.3.121.92:<0.22455.1>:misc:try_with_maybe_ignorant_after:1409]Eating exception from ignorant after-block:
      {error,
       {badmatch,
        [{<0.22460.1>,
          {replicator_died,{'EXIT',<19987.20007.0>,downstream_closed}}}]},
       [{misc,sync_shutdown_many_i_am_trapping_exits,1},
        {misc,try_with_maybe_ignorant_after,2},
        {ns_single_vbucket_mover,mover,6},
        {proc_lib,init_p_do_apply,3}]}
      [error_logger:error,2012-09-08T10:37:20.523,ns_1@10.3.121.92:error_logger:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: erlang:apply/2
      pid: <0.22460.1>
      registered_name: []
      exception exit: {replicator_died,
      {'EXIT',<19987.20007.0>,downstream_closed}}
      in function ns_replicas_builder:'build_replicas_main/6-fun-0'/2
      in call from ns_replicas_builder:observe_wait_all_done_old_style_loop/5
      in call from ns_replicas_builder:observe_wait_all_done/5
      in call from ns_replicas_builder:'build_replicas_main/6-fun-1'/8
      in call from misc:try_with_maybe_ignorant_after/2
      in call from ns_replicas_builder:build_replicas_main/6
      ancestors: [<0.22455.1>,<0.11756.1>,<0.11717.1>]
      messages: []
      links: [<0.22455.1>]
      dictionary: []
      trap_exit: true
      status: running
      heap_size: 196418
      stack_size: 24
      reductions: 47356
      neighbours:

      [error_logger:error,2012-09-08T10:37:20.524,ns_1@10.3.121.92:error_logger:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: ns_single_vbucket_mover:mover/6
      pid: <0.22423.1>
      registered_name: []
      exception exit: {exited,
      {'EXIT',<0.22444.1>,
      {missing_checkpoint_stat,'ns_1@10.3.121.96',857}}}
      in function ns_single_vbucket_mover:mover_inner_old_style/6
      in call from misc:try_with_maybe_ignorant_after/2
      in call from ns_single_vbucket_mover:mover/6
      ancestors: [<0.11756.1>,<0.11717.1>]
      messages: []
      links: [<0.11756.1>]
      dictionary: [{cleanup_list,[<0.22444.1>]}]
      trap_exit: true
      status: running
      heap_size: 2584
      stack_size: 24
      reductions: 5292
      neighbours:

      [error_logger:error,2012-09-08T10:37:20.525,ns_1@10.3.121.92:error_logger:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: ns_single_vbucket_mover:mover/6
      pid: <0.22455.1>
      registered_name: []
      exception exit: {exited,
      {'EXIT',<0.22460.1>,
      {replicator_died,
      {'EXIT',<19987.20007.0>,downstream_closed}}}}
      in function ns_single_vbucket_mover:mover_inner_old_style/6
      in call from misc:try_with_maybe_ignorant_after/2
      in call from ns_single_vbucket_mover:mover/6
      ancestors: [<0.11756.1>,<0.11717.1>]
      messages: []
      links: [<0.11756.1>]
      dictionary: [{cleanup_list,[<0.22460.1>]}]
      trap_exit: true
      status: running
      heap_size: 2584
      stack_size: 24
      reductions: 5342
      neighbours:

      [ns_server:info,2012-09-08T10:37:20.690,ns_1@10.3.121.92:ns_port_memcached:ns_port_server:log:169]memcached<0.403.0>: Sat Sep 8 17:37:20.486667 3: Schedule cleanup of "eq_tapq:rebalance_164"
      memcached<0.403.0>: Sat Sep 8 17:37:20.486798 3: TAP (Producer) eq_tapq:rebalance_164 - Clear the tap queues by force
      memcached<0.403.0>: Sat Sep 8 17:37:20.501995 3: TAP (Producer) eq_tapq:replication_building_164_'ns_1@10.3.121.95' - Connection is closed by force.
      memcached<0.403.0>: Sat Sep 8 17:37:20.502220 3: TAP (Producer) eq_tapq:replication_building_164_'ns_1@10.3.121.93' - Connection is closed by force.
      memcached<0.403.0>: Sat Sep 8 17:37:20.502404 3: TAP (Producer) eq_tapq:replication_building_164_'ns_1@10.3.121.96' - Connection is closed by force.

      [ns_server:info,2012-09-08T10:37:21.688,ns_1@10.3.121.92:ns_port_memcached:ns_port_server:log:169]memcached<0.403.0>: Sat Sep 8 17:37:21.487627 3: Schedule cleanup of "eq_tapq:anon_2546"
      memcached<0.403.0>: Sat Sep 8 17:37:21.487709 3: Schedule cleanup of "eq_tapq:anon_2545"
      memcached<0.403.0>: Sat Sep 8 17:37:21.487730 3: Schedule cleanup of "eq_tapq:anon_2547"
      memcached<0.403.0>: Sat Sep 8 17:37:21.487863 3: TAP (Producer) eq_tapq:replication_building_164_'ns_1@10.3.121.93' - Clear the tap queues by force
      memcached<0.403.0>: Sat Sep 8 17:37:21.487914 3: TAP (Producer) eq_tapq:replication_building_164_'ns_1@10.3.121.95' - Clear the tap queues by force
      memcached<0.403.0>: Sat Sep 8 17:37:21.487964 3: TAP (Producer) eq_tapq:replication_building_164_'ns_1@10.3.121.96' - Clear the tap queues by force

      [ns_server:debug,2012-09-08T10:37:24.730,ns_1@10.3.121.92:ns_config_log:ns_config_log:log_common:111]config change: {node,'ns_1@10.3.121.96',membership} ->
      active
      [ns_server:debug,2012-09-08T10:37:24.730,ns_1@10.3.121.92:'capi_set_view_manager-default':capi_set_view_manager:handle_info:337]doing replicate_newnodes_docs
      [ns_server:debug,2012-09-08T10:37:24.730,ns_1@10.3.121.92:ns_config_log:ns_config_log:log_common:111]config change: {node,'ns_1@10.3.121.95',membership} ->
      active
      [ns_server:debug,2012-09-08T10:37:24.730,ns_1@10.3.121.92:ns_config_log:ns_config_log:log_common:111]config change: {node,'ns_1@10.3.121.93',membership} ->
      active
      [ns_server:debug,2012-09-08T10:37:24.732,ns_1@10.3.121.92:'capi_set_view_manager-default':capi_set_view_manager:handle_info:337]doing replicate_newnodes_docs
      [ns_server:debug,2012-09-08T10:37:24.733,ns_1@10.3.121.92:'capi_set_view_manager-default':capi_set_view_manager:handle_info:337]doing replicate_newnodes_docs
      [ns_server:debug,2012-09-08T10:37:24.733,ns_1@10.3.121.92:'capi_set_view_manager-default':capi_set_view_manager:handle_info:337]doing replicate_newnodes_docs
      [ns_server:debug,2012-09-08T10:37:24.734,ns_1@10.3.121.92:ns_config_rep:ns_config_rep:do_push_keys:317]Replicating some config keys ([{node,'ns_1@10.3.121.93',membership},
      {node,'ns_1@10.3.121.95',membership},
      {node,'ns_1@10.3.121.96',membership}]..)
      [user:info,2012-09-08T10:37:24.736,ns_1@10.3.121.92:<0.3725.0>:ns_orchestrator:rebalancing:513]Not rebalancing because rebalance is already in progress.

      [rebalance:debug,2012-09-08T10:37:25.514,ns_1@10.3.121.92:<0.11756.1>:janitor_agent:bulk_set_vbucket_state:386]bulk vbucket state change failed for:
      [{'ns_1@10.3.121.96',
      {'EXIT',
      {{couldnt_connect_to_memcached,
      {gen_server,call,
      ['ns_memcached-default',{set_vbucket,164,replica},60000]}},
      {gen_server,call,
      [{'janitor_agent-default','ns_1@10.3.121.96'},
      {if_rebalance,<0.11756.1>,
      {update_vbucket_state,164,replica,undefined,
      'ns_1@10.3.121.93'}},
      infinity]}}}}]
      [ns_server:info,2012-09-08T10:37:25.517,ns_1@10.3.121.92:'janitor_agent-default':janitor_agent:handle_info:646]Undoing temporary vbucket states caused by rebalance
      [user:info,2012-09-08T10:37:25.517,ns_1@10.3.121.92:<0.3725.0>:ns_orchestrator:handle_info:295]Rebalance exited with reason {{bulk_set_vbucket_state_failed,
      [{'ns_1@10.3.121.96',
      {'EXIT',
      {{couldnt_connect_to_memcached,
      {gen_server,call,
      ['ns_memcached-default',
      {set_vbucket,164,replica},
      60000]}},
      {gen_server,call,
      [{'janitor_agent-default', 'ns_1@10.3.121.96'},
      {if_rebalance,<0.11756.1>,
      {update_vbucket_state,164,replica,
      undefined,'ns_1@10.3.121.93'}},
      infinity]}}}}]},
      [{janitor_agent,bulk_set_vbucket_state,4},
      {ns_vbucket_mover, update_replication_post_move,3},
      {ns_vbucket_mover,handle_info,2},
      {gen_server,handle_msg,5},
      {proc_lib,init_p_do_apply,3}]}

      [ns_server:debug,2012-09-08T10:37:25.517,ns_1@10.3.121.92:<0.11763.1>:ns_pubsub:do_subscribe_link:134]Parent process of subscription {ns_node_disco_events,<0.11756.1>} exited with reason {{bulk_set_vbucket_state_failed,
      [{'ns_1@10.3.121.96',
      {'EXIT',
      {{couldnt_connect_to_memcached,
      {gen_server,
      call,
      ['ns_memcached-default',
      {set_vbucket, 164, replica},
      60000]}},
      {gen_server,
      call,
      [{'janitor_agent-default', 'ns_1@10.3.121.96'},
      {if_rebalance,
      <0.11756.1>,
      {update_vbucket_state,
      164,
      replica,
      undefined,
      'ns_1@10.3.121.93'}},
      infinity]}}}}]},
      [{janitor_agent, bulk_set_vbucket_state, 4},
      {ns_vbucket_mover, update_replication_post_move, 3},
      {ns_vbucket_mover, handle_info, 2},
      {gen_server, handle_msg, 5},
      {proc_lib, init_p_do_apply, 3}]}
      [ns_server:debug,2012-09-08T10:37:25.521,ns_1@10.3.121.92:'capi_set_view_manager-default':capi_set_view_manager:handle_info:337]doing replicate_newnodes_docs
      [ns_server:debug,2012-09-08T10:37:25.521,ns_1@10.3.121.92:<0.11763.1>:ns_pubsub:do_subscribe_link:149]Deleting {ns_node_disco_events,<0.11756.1>} event handler: ok
      [ns_server:info,2012-09-08T10:37:25.526,ns_1@10.3.121.92:<0.22853.1>:diag_handler:log_all_tap_and_checkpoint_stats:126]logging tap & checkpoint stats
      [ns_server:error,2012-09-08T10:37:25.530,ns_1@10.3.121.92:ns_doctor:ns_doctor:update_status:204]The following buckets became not ready on node 'ns_1@10.3.121.96': ["default"], those of them are active ["default"]
      [ns_server:debug,2012-09-08T10:37:25.538,ns_1@10.3.121.92:ns_config_rep:ns_config_rep:do_push_keys:317]Replicating some config keys ([counters,rebalance_status,rebalancer_pid]..)
      [ns_server:debug,2012-09-08T10:37:25.553,ns_1@10.3.121.92:ns_config_log:ns_config_log:log_common:111]config change:
      counters ->
      [{rebalance_fail,2},{rebalance_start,3},{rebalance_success,1}]
      [ns_server:debug,2012-09-08T10:37:25.553,ns_1@10.3.121.92:'capi_set_view_manager-default':capi_set_view_manager:handle_info:337]doing replicate_newnodes_docs
      [ns_server:debug,2012-09-08T10:37:25.554,ns_1@10.3.121.92:'capi_set_view_manager-default':capi_set_view_manager:handle_info:337]doing replicate_newnodes_docs
      [ns_server:debug,2012-09-08T10:37:25.568,ns_1@10.3.121.92:ns_config_log:ns_config_log:log_common:111]config change:
      rebalancer_pid ->
      undefined
      [ns_server:debug,2012-09-08T10:37:25.569,ns_1@10.3.121.92:'capi_set_view_manager-default':capi_set_view_manager:handle_info:337]doing replicate_newnodes_docs
      [ns_server:debug,2012-09-08T10:37:25.604,ns_1@10.3.121.92:ns_config_log:ns_config_log:log_common:111]config change:
      rebalance_status ->
      {none,<<"Rebalance failed. See logs for detailed reason. You can try rebalance again.">>}


          People

            Assignee: Andrei Baranouski
            Reporter: Andrei Baranouski
