Details

    • Type: Technical task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0-beta-2, 2.0
    • Fix Version/s: 2.0.1
    • Component/s: ns_server
    • Security Level: Public
    • Labels:
      None
    • Environment:

      Description

      Rebalance from a 5-node to a 7-node cluster failed:

      Control connection to memcached on 'ns_1@10.3.2.141' disconnected: {badmatch,{error,closed}}

      ....

      [ns_server:info,2012-11-14T20:44:36.251,ns_1@10.3.2.131:ns_port_memcached<0.437.0>:ns_port_server:log:171]memcached<0.437.0>: Wed Nov 14 20:44:35.989437 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_building_676_'ns_1@10.3.2.130' - Sending TAP_OPAQUE with command "close_backfill" and vbucket 676
      memcached<0.437.0>: Wed Nov 14 20:44:36.083187 Pacific Standard Time 3: Notified the completion of checkpoint persistence for vbucket 404, cookie 0000000005A0EB00

      [rebalance:info,2012-11-14T20:44:36.345,ns_1@10.3.2.131:<0.7540.93>:ebucketmigrator_srv:init:551]Starting tap stream:
      [{vbuckets,[953]},
       {checkpoints,[{953,3}]},
       {name,<<"replication_building_953_'ns_1@10.3.2.131'">>},
       {takeover,false}]
      {{"10.3.2.130",11209},
       {"10.3.2.131",11209},
       [{vbuckets,[953]},
        {takeover,false},
        {suffix,"building_953_'ns_1@10.3.2.131'"},
        {username,"default"},
        {password,[]}]}

      [error_logger:error,2012-11-14T20:44:37.876,ns_1@10.3.2.131:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: ns_single_vbucket_mover:mover/6
      pid: <0.7325.93>
      registered_name: []
      exception exit: {unexpected_exit,
          {'EXIT',<0.7456.93>,
              {{wait_checkpoint_persisted_failed,"default",404,4,
                   [{'ns_1@10.3.2.141',
                     {'EXIT',
                         {{{badmatch,{error,closed}},
                           {gen_server,call,
                               ['ns_memcached-default',
                                {wait_for_checkpoint_persistence,404,4},
                                infinity]}},
                          {gen_server,call,
                              [{'janitor_agent-default','ns_1@10.3.2.141'},
                               {if_rebalance,<0.27487.91>,
                                {wait_checkpoint_persisted,404,4}},
                               infinity]}}}}]},
               [{ns_single_vbucket_mover,
                 '-wait_checkpoint_persisted_many/5-fun-1-',5}]}}}
      in function ns_single_vbucket_mover:spawn_and_wait/1
      in call from ns_single_vbucket_mover:mover_inner/6
      in call from misc:try_with_maybe_ignorant_after/2
      in call from ns_single_vbucket_mover:mover/6
      ancestors: [<0.27487.91>,<0.27357.91>]
      messages: [{'EXIT',<0.27487.91>,{mover_failed,downstream_closed}}]
      links: [<0.27487.91>]
      dictionary: [

      {cleanup_list,[<0.7410.93>,<0.7456.93>]}

      ]
      trap_exit: true
      status: running
      heap_size: 6765
      stack_size: 24
      reductions: 12605
      neighbours:

      ...

      [error_logger:error,2012-11-14T20:44:37.938,ns_1@10.3.2.131:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: ebucketmigrator_srv:'-confirm_sent_messages/1-fun-0-'/0
      pid: <20523.8371.14>
      registered_name: []
      exception error: no match of right hand side value {error,closed}
      in function ebucketmigrator_srv:'-confirm_sent_messages/1-fun-0-'/3
      ancestors: [<20523.8294.14>,<0.7410.93>,<0.7325.93>,<0.27487.91>,
      <0.27357.91>]
      messages: []
      links: [<20523.8294.14>]
      dictionary: []
      trap_exit: false
      status: running
      heap_size: 987
      stack_size: 24
      reductions: 748
      neighbours:
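The crash report above comes from an assertive pattern match: a socket operation inside `confirm_sent_messages/1` returned `{error,closed}` where the code expected a success value, so the anonymous fun died with a badmatch instead of handling the disconnect. A minimal sketch of the two styles, using hypothetical function names (not the actual ebucketmigrator_srv source):

```erlang
%% Hypothetical sketch -- illustrates the failure mode, not the real code.

%% Assertive style: if memcached has already closed the socket,
%% gen_tcp:recv/2 returns {error,closed} and this line crashes the
%% process with a badmatch on {error,closed}.
confirm_ack_assertive(Socket) ->
    {ok, _Ack} = gen_tcp:recv(Socket, 0).

%% Tolerant style: surface the disconnect as an explicit return value
%% so the caller can decide whether it is fatal.
confirm_ack_tolerant(Socket) ->
    case gen_tcp:recv(Socket, 0) of
        {ok, Ack}       -> {ok, Ack};
        {error, Reason} -> {error, Reason}
    end.
```

In this instance the downstream close was genuine (memcached restarted, as established in the comments below), so the badmatch crash is a symptom of the disconnect rather than the root cause.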

      [error_logger:error,2012-11-14T20:44:37.954,ns_1@10.3.2.131:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: ebucketmigrator_srv:init/1
      pid: <20523.8294.14>
      registered_name: []
      exception exit: downstream_closed
      in function gen_server:terminate/6
      ancestors: [<0.7410.93>,<0.7325.93>,<0.27487.91>,<0.27357.91>]
      messages: [{'EXIT',<20523.8296.14>,killed}]
      links: [#Port<20523.253270>,<20523.8371.14>,<0.7410.93>,
      #Port<20523.253269>]
      dictionary: []
      trap_exit: true
      status: running
      heap_size: 1597
      stack_size: 24
      reductions: 84633
      neighbours:

      ...

      [ns_server:error,2012-11-14T20:44:42.720,ns_1@10.3.2.131:<0.7424.93>:misc:inner_wait_shutdown:1426]Expected exit signal from <0.7426.93> but could not get it in 5 seconds. This is a bug, but process we're waiting for is dead (noproc), so trying to ignore...
      [ns_server:error,2012-11-14T20:44:42.735,ns_1@10.3.2.131:<0.7424.93>:misc:sync_shutdown_many_i_am_trapping_exits:1408]Shutdown of the following failed: [{<20200.30113.21>,normal},
      {<0.7426.93>,noproc},
      {<20581.12222.6>,normal}]
      [ns_server:info,2012-11-14T20:44:42.782,ns_1@10.3.2.131:<0.7424.93>:ns_replicas_builder_utils:kill_a_bunch_of_tap_names:59]Killed the following tap names on 'ns_1@10.3.2.141': [<<"replication_building_128_'ns_1@10.3.2.139'">>,
      <<"replication_building_128_'ns_1@10.3.2.131'">>,
      <<"replication_building_128_'ns_1@10.3.2.130'">>,
      <<"replication_building_128_'ns_1@10.3.2.132'">>]
      [error_logger:error,2012-11-14T20:44:42.782,ns_1@10.3.2.131:error_logger<0.5.0>:ale_error_logger_handler:log_msg:76]** Generic server <0.7424.93> terminating
      ** Last message in was {'EXIT',<0.7426.93>,normal}
      ** When Server state == {state,"default",128,'ns_1@10.3.2.141',
      [{'ns_1@10.3.2.139',<20200.30113.21>},
      {'ns_1@10.3.2.131',<0.7426.93>},
      {'ns_1@10.3.2.130',<20199.10253.29>},
      {'ns_1@10.3.2.132',<20581.12222.6>}]}
      ** Reason for termination ==
      ** {{badmatch,[{<20200.30113.21>,normal},
      {<0.7426.93>,noproc},
      {<20581.12222.6>,normal}]},
      [{misc,sync_shutdown_many_i_am_trapping_exits,1},
      {misc,try_with_maybe_ignorant_after,2},
      {gen_server,terminate,6},
      {proc_lib,init_p_do_apply,3}]}

      [ns_server:error,2012-11-14T20:44:42.782,ns_1@10.3.2.131:<0.7368.93>:misc:sync_shutdown_many_i_am_trapping_exits:1408]Shutdown of the following failed: [{<0.7424.93>,
      {{badmatch,
      [{<20200.30113.21>,normal},
      {<0.7426.93>,noproc},
      {<20581.12222.6>,normal}]},
      [{misc, sync_shutdown_many_i_am_trapping_exits, 1},
      {misc,try_with_maybe_ignorant_after,2},
      {gen_server,terminate,6},
      {proc_lib,init_p_do_apply,3}]}},
      {<0.7429.93>,
      {{badmatch,
      [{'EXIT',
      {normal,
      {gen_server,call, [<0.7426.93>,had_backfill,30000]}}},
      {'EXIT',
      {shutdown,
      {gen_server,call, [<20199.10253.29>,had_backfill, 30000]}}}]},
      [{ns_single_vbucket_mover, '-wait_backfill_determination/1-fun-1-', 1}]}}]
      [ns_server:error,2012-11-14T20:44:42.782,ns_1@10.3.2.131:<0.7368.93>:misc:try_with_maybe_ignorant_after:1444]Eating exception from ignorant after-block:
      {error,
      {badmatch,
      [{<0.7424.93>,
      {{badmatch,
      [{<20200.30113.21>,normal},
      {<0.7426.93>,noproc},
      {<20581.12222.6>,normal}]},
      [{misc,sync_shutdown_many_i_am_trapping_exits,1},
      {misc,try_with_maybe_ignorant_after,2},
      {gen_server,terminate,6},
      {proc_lib,init_p_do_apply,3}]}},
      {<0.7429.93>,
      {{badmatch,
      [{'EXIT',
      {normal,
      {gen_server,call,[<0.7426.93>,had_backfill,30000]}}},
      {'EXIT',
      {shutdown,
      {gen_server,call, [<20199.10253.29>,had_backfill,30000]}}}]},
      [{ns_single_vbucket_mover,'-wait_backfill_determination/1-fun-1-', 1}]}}]},
      [{misc,sync_shutdown_many_i_am_trapping_exits,1},
      {misc,try_with_maybe_ignorant_after,2},
      {ns_single_vbucket_mover,mover,6},
      {proc_lib,init_p_do_apply,3}]}
      [error_logger:error,2012-11-14T20:44:42.782,ns_1@10.3.2.131:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: new_ns_replicas_builder:init/1
      pid: <0.7424.93>
      registered_name: []
      exception exit: {{badmatch,[{<20200.30113.21>,normal},
      {<0.7426.93>,noproc},
      {<20581.12222.6>,normal}]},
      [{misc,sync_shutdown_many_i_am_trapping_exits,1},
      {misc,try_with_maybe_ignorant_after,2},
      {gen_server,terminate,6},
      {proc_lib,init_p_do_apply,3}]}
      in function gen_server:terminate/6
      ancestors: [<0.7368.93>,<0.27487.91>,<0.27357.91>]
      messages: [{'EXIT',<0.7368.93>,shutdown}]
      links: [<0.7368.93>]
      dictionary: []
      trap_exit: true
      status: running
      heap_size: 514229
      stack_size: 24
      reductions: 66044
      neighbours:

      [user:info,2012-11-14T20:44:42.798,ns_1@10.3.2.131:<0.385.0>:ns_orchestrator:handle_info:319]Rebalance exited with reason {mover_failed,downstream_closed}

        Attachments


          Activity

          iryna iryna added a comment -

          Diags:
          https://s3.amazonaws.com/bugdb/jira/MB-7201/820cd183/diag-130.txt.gz
          https://s3.amazonaws.com/bugdb/jira/MB-7201/820cd183/diag-131.txt.gz
          https://s3.amazonaws.com/bugdb/jira/MB-7201/820cd183/diag-132.txt.gz
          https://s3.amazonaws.com/bugdb/jira/MB-7201/820cd183/diag-137.txt.gz
          https://s3.amazonaws.com/bugdb/jira/MB-7201/820cd183/diag-138.txt.gz
          https://s3.amazonaws.com/bugdb/jira/MB-7201/820cd183/diag-139.txt.gz
          https://s3.amazonaws.com/bugdb/jira/MB-7201/820cd183/diag-141.txt.gz

          collect_info:
          https://s3.amazonaws.com/bugdb/jira/MB-7201/820cd183/cbcollect-130.zip
          https://s3.amazonaws.com/bugdb/jira/MB-7201/820cd183/cbcollect-131.zip
          https://s3.amazonaws.com/bugdb/jira/MB-7201/820cd183/cbcollect-132.zip
          https://s3.amazonaws.com/bugdb/jira/MB-7201/820cd183/cbcollect-137.zip
          https://s3.amazonaws.com/bugdb/jira/MB-7201/820cd183/cbcollect-138.zip
          https://s3.amazonaws.com/bugdb/jira/MB-7201/820cd183/cbcollect-139.zip
          https://s3.amazonaws.com/bugdb/jira/MB-7201/820cd183/cbcollect-141.zip
          dipti Dipti Borkar (Inactive) added a comment -

          Chiyoung, can you take a look as to why memcached crashed?

          alkondratenko Aleksey Kondratenko (Inactive) added a comment -

          Not sure why people think memcached crashed. Maybe it did, but from the log messages above it could just be one of those weird disconnects with no apparent reason.

          steve Steve Yen added a comment -

          per bug-scrub: to 2.0.1 as memcached didn't crash.

          chiyoung Chiyoung Seo (Inactive) added a comment -

          memcached process on 10.3.2.141 was suddenly restarted:

          C:\Program Files\Couchbase\Server\var\lib\couchbase\logs>type memcached.log.0.txt
          ...
          Wed Nov 14 20:44:24.608273 Pacific Standard Time 3: Notified the completion of checkpoint persistence for vbucket 403, cookie 00000000059A9340
          Wed Nov 14 20:44:24.608273 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_building_127_'ns_1@10.3.2.139' - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
          Wed Nov 14 20:44:24.608273 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_building_127_'ns_1@10.3.2.139' - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
          Wed Nov 14 20:44:24.623898 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_building_127_'ns_1@10.3.2.130' - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
          Wed Nov 14 20:44:24.623898 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_building_127_'ns_1@10.3.2.130' - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
          Wed Nov 14 20:44:24.639523 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_building_127_'ns_1@10.3.2.132' - Schedule the backfill for vbucket 127
          Wed Nov 14 20:44:24.639523 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_building_127_'ns_1@10.3.2.132' - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
          Wed Nov 14 20:44:24.639523 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_building_127_'ns_1@10.3.2.132' - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
          Wed Nov 14 20:44:24.639523 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_building_127_'ns_1@10.3.2.132' - Sending TAP_OPAQUE with command "initial_vbucket_stream" and vbucket 127
          Wed Nov 14 20:44:24.655148 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_building_127_'ns_1@10.3.2.132' - Backfill is completed with VBuckets 127,
          Wed Nov 14 20:44:24.655148 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_building_127_'ns_1@10.3.2.132' - Sending TAP_OPAQUE with command "close_backfill" and vbucket 127
          Wed Nov 14 20:44:24.905148 Pacific Standard Time 3: TAP (Consumer) eq_tapq:anon_1847 - disconnected

          C:\Program Files\Couchbase\Server\var\lib\couchbase\logs>type memcached.log.1.txt

          Wed Nov 14 20:44:32.577023 Pacific Standard Time 3: Trying to connect to mccouch: "localhost:11213"
          Wed Nov 14 20:44:33.608273 Pacific Standard Time 3: Connected to mccouch: "localhost:11213"
          Wed Nov 14 20:44:33.748898 Pacific Standard Time 3: Extension support isn't implemented in this version of bucket_engine
          Wed Nov 14 20:44:33.842648 Pacific Standard Time 3: Failed to load mutation log, falling back to key dump
          Wed Nov 14 20:44:34.623898 Pacific Standard Time 3: metadata loaded in 1012 ms
          Wed Nov 14 20:44:35.592648 Pacific Standard Time 3: warmup completed in 1993 ms
          Wed Nov 14 20:44:39.686398 Pacific Standard Time 3: Deletion of vbucket 675 was completed.
          Wed Nov 14 20:44:40.170773 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.2.132 - Schedule the backfill for vbucket 63
          Wed Nov 14 20:44:40.170773 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.2.132 - Schedule the backfill for vbucket 64
          Wed Nov 14 20:44:40.264523 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.2.132 - Schedule the backfill for vbucket 192
          Wed Nov 14 20:44:40.264523 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.2.132 - Schedule the backfill for vbucket 193
          Wed Nov 14 20:44:40.264523 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.2.132 - Schedule the backfill for vbucket 194
          Wed Nov 14 20:44:40.264523 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.2.132 - Schedule the backfill for vbucket 391
          ...

          From the above log snippets, we can see that the memcached process on 10.3.2.141 was suddenly restarted between 20:44:24 and 20:44:32, but unfortunately, there were NO error / warning / fatal logs from memcached and ep-engine.

          Iryna,

          Can you increase the memcached log level to INFO?

          I know this would increase the logging overhead a lot, but we really need to understand what happened in memcached / ep-engine layer.

          chiyoung Chiyoung Seo (Inactive) added a comment -

          memcached exited with unknown status 255 after successfully deleting vbucket 675 on its second attempt:

          [ns_server:info,2012-11-14T20:44:29.931,ns_1@10.3.2.141:ns_port_memcached<0.28126.12>:ns_port_server:log:171]memcached<0.28126.12>: Wed Nov 14 20:44:29.733273 Pacific Standard Time 3: Deletion of vbucket 675 failed because the vbucket is not in a dead state
          memcached<0.28126.12>: Wed Nov 14 20:44:29.733273 Pacific Standard Time 3: Deletion of vbucket 675 was completed.

          [rebalance:info,2012-11-14T20:44:30.181,ns_1@10.3.2.141:<0.8251.14>:ebucketmigrator_srv:do_confirm_sent_messages:684]Got close ack!

          [ns_server:info,2012-11-14T20:44:30.384,ns_1@10.3.2.141:ns_port_memcached<0.28126.12>:ns_port_server:log:171]memcached<0.28126.12>: Wed Nov 14 20:44:30.186398 Pacific Standard Time 3: TAP (Consumer) eq_tapq:anon_1852 - disconnected

          [ns_server:info,2012-11-14T20:44:32.416,ns_1@10.3.2.141:<0.28199.12>:mc_connection:run_loop:202]mccouch connection was normally closed
          [rebalance:warn,2012-11-14T20:44:32.416,ns_1@10.3.2.141:<0.7551.14>:ebucketmigrator_srv:do_confirm_sent_messages:691]Got error while trying to read close ack:

          {error,einval}

          [user:info,2012-11-14T20:44:32.416,ns_1@10.3.2.141:ns_port_memcached<0.28126.12>:ns_port_server:handle_info:107]Port server memcached on node 'ns_1@10.3.2.141' exited with status 255. Restarting.

          It seems to me that this issue is a duplicate of MB-7246. Basically, memcached occasionally exits with unknown status 255 when ep-engine hits flusher- or vbucket-deletion-related warnings.
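For context on how a non-zero exit surfaces here: ns_server runs memcached as an external port and restarts it when the port reports an abnormal exit status, which is what produces the "exited with status 255. Restarting." log line above. A rough sketch of that supervision pattern, with hypothetical names (not the actual ns_port_server implementation):

```erlang
%% Hypothetical sketch of external-port supervision -- not the real
%% ns_port_server code.
handle_info({_Port, {exit_status, 0}}, State) ->
    %% Clean exit: stop normally, no restart needed.
    {stop, normal, State};
handle_info({_Port, {exit_status, Status}}, State) ->
    %% Any non-zero status (such as the unexplained 255 seen on
    %% 10.3.2.141) terminates this server abnormally; the supervisor
    %% then restarts memcached, and every in-flight TAP connection on
    %% other nodes observes {error,closed}.
    error_logger:error_msg("memcached exited with status ~p. Restarting.~n",
                           [Status]),
    {stop, {port_exited, Status}, State}.
```

This is why the rebalance failure manifests on 10.3.2.131 as `downstream_closed` rather than as a crash report on the node where memcached actually died.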

          chiyoung Chiyoung Seo (Inactive) added a comment -

          We didn't see this issue anymore after installing Service Pack 1 on Windows 2008.


            People

            • Assignee:
              chiyoung Chiyoung Seo (Inactive)
              Reporter:
              iryna iryna
            • Votes:
              0
            • Watchers:
              0


                Gerrit Reviews

                There are no open Gerrit changes