Details

    • Type: Technical task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0-beta-2, 2.0
    • Fix Version/s: 2.0.1
    • Component/s: ns_server
    • Security Level: Public
    • Labels: None
    • Environment:

Description

      Rebalance from 5 nodes to 7 nodes failed:

      Control connection to memcached on 'ns_1@10.3.2.141' disconnected: {badmatch,{error,closed}}

      ....

      [ns_server:info,2012-11-14T20:44:36.251,ns_1@10.3.2.131:ns_port_memcached<0.437.0>:ns_port_server:log:171]memcached<0.437.0>: Wed Nov 14 20:44:35.989437 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_building_676_'ns_1@10.3.2.130' - Sending TAP_OPAQUE with command "close_backfill" and vbucket 676
      memcached<0.437.0>: Wed Nov 14 20:44:36.083187 Pacific Standard Time 3: Notified the completion of checkpoint persistence for vbucket 404, cookie 0000000005A0EB00

      [rebalance:info,2012-11-14T20:44:36.345,ns_1@10.3.2.131:<0.7540.93>:ebucketmigrator_srv:init:551]Starting tap stream:
      [{vbuckets,[953]},
       {checkpoints,[{953,3}]},
       {name,<<"replication_building_953_'ns_1@10.3.2.131'">>},
       {takeover,false}]
      {{"10.3.2.130",11209},
       {"10.3.2.131",11209},
       [{vbuckets,[953]},
        {takeover,false},
        {suffix,"building_953_'ns_1@10.3.2.131'"},
        {username,"default"},
        {password,[]}]}

      [error_logger:error,2012-11-14T20:44:37.876,ns_1@10.3.2.131:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: ns_single_vbucket_mover:mover/6
      pid: <0.7325.93>
      registered_name: []
      exception exit: {unexpected_exit,
      {'EXIT',<0.7456.93>,
      {{wait_checkpoint_persisted_failed,"default",404,4,
      [{'ns_1@10.3.2.141',
      {'EXIT',
      {{badmatch,{error,closed}},
      {gen_server,call,
      ['ns_memcached-default',
      {wait_for_checkpoint_persistence,404,4},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-default','ns_1@10.3.2.141'},
      {if_rebalance,<0.27487.91>,
      {wait_checkpoint_persisted,404,4}},
      infinity]}}}}]},
      [{ns_single_vbucket_mover,'-wait_checkpoint_persisted_many/5-fun-1-',5}]}}}
      in function ns_single_vbucket_mover:spawn_and_wait/1
      in call from ns_single_vbucket_mover:mover_inner/6
      in call from misc:try_with_maybe_ignorant_after/2
      in call from ns_single_vbucket_mover:mover/6
      ancestors: [<0.27487.91>,<0.27357.91>]
      messages: [{'EXIT',<0.27487.91>,{mover_failed,downstream_closed}}]
      links: [<0.27487.91>]
      dictionary: [{cleanup_list,[<0.7410.93>,<0.7456.93>]}]
      trap_exit: true
      status: running
      heap_size: 6765
      stack_size: 24
      reductions: 12605
      neighbours:
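
      For readers unfamiliar with how the nested terms above are produced: when an Erlang gen_server dies while handling a call, the waiting caller exits with the callee's crash reason wrapped around the call it was making, which is exactly the {Reason,{gen_server,call,[...]}} shape in this report. A minimal, self-contained sketch follows; the module and request names are illustrative only and are not the actual ns_server code.

      %% call_exit_demo.erl -- illustrative sketch, not ns_server code.
      -module(call_exit_demo).
      -behaviour(gen_server).
      -export([run/0]).
      -export([init/1, handle_call/3, handle_cast/2]).

      init([]) -> {ok, #{}}.

      %% Terminate without replying, like a connection process whose socket has
      %% just closed; the waiting caller then exits with this reason wrapped
      %% around its gen_server:call.
      handle_call({wait_for_checkpoint_persistence, _VBucket, _CheckpointId}, _From, State) ->
          {stop, {badmatch, {error, closed}}, State}.

      handle_cast(_Msg, State) -> {noreply, State}.

      run() ->
          {ok, Pid} = gen_server:start(?MODULE, [], []),
          try
              gen_server:call(Pid, {wait_for_checkpoint_persistence, 404, 4}, infinity)
          catch
              exit:Reason ->
                  %% Reason is {{badmatch,{error,closed}},
                  %%            {gen_server,call,
                  %%             [Pid,{wait_for_checkpoint_persistence,404,4},infinity]}}
                  io:format("caller exited with: ~p~n", [Reason])
          end.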

      ...

      [error_logger:error,2012-11-14T20:44:37.938,ns_1@10.3.2.131:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: ebucketmigrator_srv:'-confirm_sent_messages/1-fun-0-'/0
      pid: <20523.8371.14>
      registered_name: []
      exception error: no match of right hand side value {error,closed}
      in function ebucketmigrator_srv:'-confirm_sent_messages/1-fun-0-'/3
      ancestors: [<20523.8294.14>,<0.7410.93>,<0.7325.93>,<0.27487.91>,
      <0.27357.91>]
      messages: []
      links: [<20523.8294.14>]
      dictionary: []
      trap_exit: false
      status: running
      heap_size: 987
      stack_size: 24
      reductions: 748
      neighbours:
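
      The badmatch in this report comes from code that asserts a successful read of the TAP close ack. A hypothetical sketch of the failure mode follows; the socket handling and function names are illustrative and are not the actual ebucketmigrator_srv code. Matching {ok, _} against gen_tcp:recv/3 turns a connection dropped by memcached into exactly this {badmatch,{error,closed}} crash, whereas a defensive case clause would surface it as an ordinary error value.

      %% close_ack_demo.erl -- illustrative sketch, not the real ebucketmigrator_srv.
      %% Both functions assume a passive-mode TCP socket.
      -module(close_ack_demo).
      -export([confirm_strict/1, confirm_defensive/1]).

      %% Crashes with {badmatch,{error,closed}} if the peer drops the socket,
      %% reproducing the "no match of right hand side value {error,closed}" above.
      confirm_strict(Sock) ->
          {ok, Ack} = gen_tcp:recv(Sock, 0, 5000),
          Ack.

      %% Defensive variant: a closed or errored socket becomes an explicit
      %% return value instead of crashing the calling process.
      confirm_defensive(Sock) ->
          case gen_tcp:recv(Sock, 0, 5000) of
              {ok, Ack}       -> {ok, Ack};
              {error, closed} -> {error, downstream_closed};
              {error, Reason} -> {error, Reason}
          end.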

      [error_logger:error,2012-11-14T20:44:37.954,ns_1@10.3.2.131:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: ebucketmigrator_srv:init/1
      pid: <20523.8294.14>
      registered_name: []
      exception exit: downstream_closed
      in function gen_server:terminate/6
      ancestors: [<0.7410.93>,<0.7325.93>,<0.27487.91>,<0.27357.91>]
      messages: [{'EXIT',<20523.8296.14>,killed}]
      links: [#Port<20523.253270>,<20523.8371.14>,<0.7410.93>,
      #Port<20523.253269>]
      dictionary: []
      trap_exit: true
      status: running
      heap_size: 1597
      stack_size: 24
      reductions: 84633
      neighbours:

      ...

      [ns_server:error,2012-11-14T20:44:42.720,ns_1@10.3.2.131:<0.7424.93>:misc:inner_wait_shutdown:1426]Expected exit signal from <0.7426.93> but could not get it in 5 seconds. This is a bug, but process we're waiting for is dead (noproc), so trying to ignore...
      [ns_server:error,2012-11-14T20:44:42.735,ns_1@10.3.2.131:<0.7424.93>:misc:sync_shutdown_many_i_am_trapping_exits:1408]Shutdown of the following failed: [{<20200.30113.21>,normal},
      {<0.7426.93>,noproc},
      {<20581.12222.6>,normal}]
      [ns_server:info,2012-11-14T20:44:42.782,ns_1@10.3.2.131:<0.7424.93>:ns_replicas_builder_utils:kill_a_bunch_of_tap_names:59]Killed the following tap names on 'ns_1@10.3.2.141': [<<"replication_building_128_'ns_1@10.3.2.139'">>,
      <<"replication_building_128_'ns_1@10.3.2.131'">>,
      <<"replication_building_128_'ns_1@10.3.2.130'">>,
      <<"replication_building_128_'ns_1@10.3.2.132'">>]
      [error_logger:error,2012-11-14T20:44:42.782,ns_1@10.3.2.131:error_logger<0.5.0>:ale_error_logger_handler:log_msg:76]** Generic server <0.7424.93> terminating
      ** Last message in was {'EXIT',<0.7426.93>,normal}
      ** When Server state == {state,"default",128,'ns_1@10.3.2.141',
      [{'ns_1@10.3.2.139',<20200.30113.21>},
      {'ns_1@10.3.2.131',<0.7426.93>},
      {'ns_1@10.3.2.130',<20199.10253.29>},
      {'ns_1@10.3.2.132',<20581.12222.6>}]}
      ** Reason for termination ==
      ** {{badmatch,[{<20200.30113.21>,normal},
      {<0.7426.93>,noproc},
      {<20581.12222.6>,normal}]},
      [{misc,sync_shutdown_many_i_am_trapping_exits,1},
      {misc,try_with_maybe_ignorant_after,2},
      {gen_server,terminate,6},
      {proc_lib,init_p_do_apply,3}]}

      [ns_server:error,2012-11-14T20:44:42.782,ns_1@10.3.2.131:<0.7368.93>:misc:sync_shutdown_many_i_am_trapping_exits:1408]Shutdown of the following failed: [{<0.7424.93>,
      {{badmatch,
      [{<20200.30113.21>,normal},
      {<0.7426.93>,noproc},
      {<20581.12222.6>,normal}]},
      [{misc, sync_shutdown_many_i_am_trapping_exits, 1},
      {misc,try_with_maybe_ignorant_after,2},
      {gen_server,terminate,6},
      {proc_lib,init_p_do_apply,3}]}},
      {<0.7429.93>,
      {{badmatch,
      [{'EXIT',
      {normal,
      {gen_server,call, [<0.7426.93>,had_backfill,30000]}}},
      {'EXIT',
      {shutdown,
      {gen_server,call, [<20199.10253.29>,had_backfill, 30000]}}}]},
      [{ns_single_vbucket_mover, '-wait_backfill_determination/1-fun-1-', 1}]}}]
      [ns_server:error,2012-11-14T20:44:42.782,ns_1@10.3.2.131:<0.7368.93>:misc:try_with_maybe_ignorant_after:1444]Eating exception from ignorant after-block:
      {error,
      {badmatch,
      [{<0.7424.93>,
      {{badmatch,
      [{<20200.30113.21>,normal},
      {<0.7426.93>,noproc},
      {<20581.12222.6>,normal}]},
      [{misc,sync_shutdown_many_i_am_trapping_exits,1},
      {misc,try_with_maybe_ignorant_after,2},
      {gen_server,terminate,6},
      {proc_lib,init_p_do_apply,3}]}},
      {<0.7429.93>,
      {{badmatch,
      [{'EXIT',
      {normal,
      {gen_server,call,[<0.7426.93>,had_backfill,30000]}}},
      {'EXIT',
      {shutdown,
      {gen_server,call, [<20199.10253.29>,had_backfill,30000]}}}]},
      [{ns_single_vbucket_mover,'-wait_backfill_determination/1-fun-1-', 1}]}}]},
      [{misc,sync_shutdown_many_i_am_trapping_exits,1},
      {misc,try_with_maybe_ignorant_after,2},
      {ns_single_vbucket_mover,mover,6},
      {proc_lib,init_p_do_apply,3}]}
      [error_logger:error,2012-11-14T20:44:42.782,ns_1@10.3.2.131:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: new_ns_replicas_builder:init/1
      pid: <0.7424.93>
      registered_name: []
      exception exit: {{badmatch,[{<20200.30113.21>,normal},
      {<0.7426.93>,noproc},
      {<20581.12222.6>,normal}]},
      [{misc,sync_shutdown_many_i_am_trapping_exits,1},
      {misc,try_with_maybe_ignorant_after,2},
      {gen_server,terminate,6},
      {proc_lib,init_p_do_apply,3}]}
      in function gen_server:terminate/6
      ancestors: [<0.7368.93>,<0.27487.91>,<0.27357.91>]
      messages: [{'EXIT',<0.7368.93>,shutdown}]
      links: [<0.7368.93>]
      dictionary: []
      trap_exit: true
      status: running
      heap_size: 514229
      stack_size: 24
      reductions: 66044
      neighbours:

      [user:info,2012-11-14T20:44:42.798,ns_1@10.3.2.131:<0.385.0>:ns_orchestrator:handle_info:319]Rebalance exited with reason {mover_failed,downstream_closed}

        Activity

        iryna iryna added a comment -

        Diags:
        https://s3.amazonaws.com/bugdb/jira/MB-7201/820cd183/diag-130.txt.gz
        https://s3.amazonaws.com/bugdb/jira/MB-7201/820cd183/diag-131.txt.gz
        https://s3.amazonaws.com/bugdb/jira/MB-7201/820cd183/diag-132.txt.gz
        https://s3.amazonaws.com/bugdb/jira/MB-7201/820cd183/diag-137.txt.gz
        https://s3.amazonaws.com/bugdb/jira/MB-7201/820cd183/diag-138.txt.gz
        https://s3.amazonaws.com/bugdb/jira/MB-7201/820cd183/diag-139.txt.gz
        https://s3.amazonaws.com/bugdb/jira/MB-7201/820cd183/diag-141.txt.gz

        collect_info:
        https://s3.amazonaws.com/bugdb/jira/MB-7201/820cd183/cbcollect-130.zip
        https://s3.amazonaws.com/bugdb/jira/MB-7201/820cd183/cbcollect-131.zip
        https://s3.amazonaws.com/bugdb/jira/MB-7201/820cd183/cbcollect-132.zip
        https://s3.amazonaws.com/bugdb/jira/MB-7201/820cd183/cbcollect-137.zip
        https://s3.amazonaws.com/bugdb/jira/MB-7201/820cd183/cbcollect-138.zip
        https://s3.amazonaws.com/bugdb/jira/MB-7201/820cd183/cbcollect-139.zip
        https://s3.amazonaws.com/bugdb/jira/MB-7201/820cd183/cbcollect-141.zip

        dipti Dipti Borkar added a comment -

        Chiyoung, can you take a look as to why memcached crashed?

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Not sure why people think memcached crashed. It may have, but from the log messages above this could also be just one of those odd disconnects with no apparent reason.

        steve Steve Yen added a comment -

        Per bug scrub: moving to 2.0.1, as memcached didn't crash.

        chiyoung Chiyoung Seo added a comment -

        The memcached process on 10.3.2.141 was suddenly restarted:

        C:\Program Files\Couchbase\Server\var\lib\couchbase\logs>type memcached.log.0.txt
        ...
        Wed Nov 14 20:44:24.608273 Pacific Standard Time 3: Notified the completion of checkpoint persistence for vbucket 403, cookie 00000000059A9340
        Wed Nov 14 20:44:24.608273 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_building_127_'ns_1@10.3.2.139' - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
        Wed Nov 14 20:44:24.608273 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_building_127_'ns_1@10.3.2.139' - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
        Wed Nov 14 20:44:24.623898 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_building_127_'ns_1@10.3.2.130' - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
        Wed Nov 14 20:44:24.623898 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_building_127_'ns_1@10.3.2.130' - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
        Wed Nov 14 20:44:24.639523 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_building_127_'ns_1@10.3.2.132' - Schedule the backfill for vbucket 127
        Wed Nov 14 20:44:24.639523 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_building_127_'ns_1@10.3.2.132' - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
        Wed Nov 14 20:44:24.639523 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_building_127_'ns_1@10.3.2.132' - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
        Wed Nov 14 20:44:24.639523 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_building_127_'ns_1@10.3.2.132' - Sending TAP_OPAQUE with command "initial_vbucket_stream" and vbucket 127
        Wed Nov 14 20:44:24.655148 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_building_127_'ns_1@10.3.2.132' - Backfill is completed with VBuckets 127,
        Wed Nov 14 20:44:24.655148 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_building_127_'ns_1@10.3.2.132' - Sending TAP_OPAQUE with command "close_backfill" and vbucket 127
        Wed Nov 14 20:44:24.905148 Pacific Standard Time 3: TAP (Consumer) eq_tapq:anon_1847 - disconnected

        C:\Program Files\Couchbase\Server\var\lib\couchbase\logs>type memcached.log.1.txt

        Wed Nov 14 20:44:32.577023 Pacific Standard Time 3: Trying to connect to mccouch: "localhost:11213"
        Wed Nov 14 20:44:33.608273 Pacific Standard Time 3: Connected to mccouch: "localhost:11213"
        Wed Nov 14 20:44:33.748898 Pacific Standard Time 3: Extension support isn't implemented in this version of bucket_engine
        Wed Nov 14 20:44:33.842648 Pacific Standard Time 3: Failed to load mutation log, falling back to key dump
        Wed Nov 14 20:44:34.623898 Pacific Standard Time 3: metadata loaded in 1012 ms
        Wed Nov 14 20:44:35.592648 Pacific Standard Time 3: warmup completed in 1993 ms
        Wed Nov 14 20:44:39.686398 Pacific Standard Time 3: Deletion of vbucket 675 was completed.
        Wed Nov 14 20:44:40.170773 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.2.132 - Schedule the backfill for vbucket 63
        Wed Nov 14 20:44:40.170773 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.2.132 - Schedule the backfill for vbucket 64
        Wed Nov 14 20:44:40.264523 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.2.132 - Schedule the backfill for vbucket 192
        Wed Nov 14 20:44:40.264523 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.2.132 - Schedule the backfill for vbucket 193
        Wed Nov 14 20:44:40.264523 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.2.132 - Schedule the backfill for vbucket 194
        Wed Nov 14 20:44:40.264523 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.2.132 - Schedule the backfill for vbucket 391
        ...

        From the above log snippets we can see that the memcached process on 10.3.2.141 was suddenly restarted between 20:44:24 and 20:44:32, but unfortunately there were no error, warning, or fatal log entries from memcached or ep-engine.

        Iryna,

        Can you increase the memcached log level to INFO?

        I know this would increase the logging overhead a lot, but we really need to understand what happened in the memcached / ep-engine layer.
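
        The restart window above was read off by eye from the last line of memcached.log.0.txt and the first line of memcached.log.1.txt. As a rough sketch of how that check could be automated (the module name, file names, and assumption that every kept line starts with the timestamp format shown in the snippets are all illustrative; this is not part of any Couchbase tool):

        %% restart_window.erl -- illustrative helper, not a Couchbase tool.
        -module(restart_window).
        -export([window/2]).

        %% window("memcached.log.0.txt", "memcached.log.1.txt") returns the last
        %% timestamp of the rotated log and the first timestamp of the new one,
        %% i.e. the interval during which memcached was down.
        window(OldLog, NewLog) ->
            {time_of(lists:last(lines(OldLog))), time_of(hd(lines(NewLog)))}.

        lines(File) ->
            {ok, Bin} = file:read_file(File),
            [L || L <- string:split(Bin, <<"\n">>, all), L =/= <<>>].

        %% Pulls the HH:MM:SS.micros token out of a line such as
        %% "Wed Nov 14 20:44:24.608273 Pacific Standard Time 3: ...".
        time_of(Line) ->
            {match, [T]} = re:run(Line, <<"\\d{2}:\\d{2}:\\d{2}\\.\\d+">>,
                                  [{capture, first, binary}]),
            T.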

        chiyoung Chiyoung Seo added a comment -

        memcached exited with unknown status 255 after successfully deleting vbucket 675 on its second attempt:

        [ns_server:info,2012-11-14T20:44:29.931,ns_1@10.3.2.141:ns_port_memcached<0.28126.12>:ns_port_server:log:171]memcached<0.28126.12>: Wed Nov 14 20:44:29.733273 Pacific Standard Time 3: Deletion of vbucket 675 failed because the vbucket is not in a dead state
        memcached<0.28126.12>: Wed Nov 14 20:44:29.733273 Pacific Standard Time 3: Deletion of vbucket 675 was completed.

        [rebalance:info,2012-11-14T20:44:30.181,ns_1@10.3.2.141:<0.8251.14>:ebucketmigrator_srv:do_confirm_sent_messages:684]Got close ack!

        [ns_server:info,2012-11-14T20:44:30.384,ns_1@10.3.2.141:ns_port_memcached<0.28126.12>:ns_port_server:log:171]memcached<0.28126.12>: Wed Nov 14 20:44:30.186398 Pacific Standard Time 3: TAP (Consumer) eq_tapq:anon_1852 - disconnected

        [ns_server:info,2012-11-14T20:44:32.416,ns_1@10.3.2.141:<0.28199.12>:mc_connection:run_loop:202]mccouch connection was normally closed
        [rebalance:warn,2012-11-14T20:44:32.416,ns_1@10.3.2.141:<0.7551.14>:ebucketmigrator_srv:do_confirm_sent_messages:691]Got error while trying to read close ack: {error,einval}

        [user:info,2012-11-14T20:44:32.416,ns_1@10.3.2.141:ns_port_memcached<0.28126.12>:ns_port_server:handle_info:107]Port server memcached on node 'ns_1@10.3.2.141' exited with status 255. Restarting.

        It seems to me that this issue is a duplicate of MB-7246. Basically, memcached occasionally exits with unknown status code 255 when ep-engine hits flusher- or vbucket-deletion-related warnings.

        chiyoung Chiyoung Seo added a comment -

        We did not see this issue anymore after installing Service Pack 1 on Windows 2008.


          People

          • Assignee: chiyoung Chiyoung Seo
          • Reporter: iryna iryna
          • Votes: 0
          • Watchers: 0

            Dates

            • Created:
            • Updated:
            • Resolved:

              Gerrit Reviews

              There are no open Gerrit changes