Couchbase Server / MB-7362

Rebalance exited with reason {{bulk_set_vbucket_state_failed, after Port server memcached on node 'ns_1@10.3.2.157' exited with status 139 (test_add_back_failed_node)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.0
    • Fix Version/s: 2.0
    • Component/s: couchbase-bucket
    • Security Level: Public
    • Labels:
      None

      Description

      Build 1974
      http://qa.hq.northscale.net/job/ubuntu-32-2.0-swaprebalance-test-P0/15/consoleFull
      ./testrunner -i /tmp/swaprebalance-32.ini get-logs=True,GROUP=P0 -t swaprebalance.SwapRebalanceFailedTests.test_add_back_failed_node,replica=2,num-buckets=2,num-swap=3,swap-orchestrator=True,GROUP=BASIC;P0

      Rebalance exited with reason {{bulk_set_vbucket_state_failed,
      [{'ns_1@10.3.2.158',
      {'EXIT',
      {{{{unexpected_reason,
      badmatch,{error,closed,
      [{mc_binary,quick_stats_recv,3},
      {mc_binary,quick_stats_loop,5},
      {mc_binary,quick_stats,5},
      {mc_client_binary,get_zero_open_checkpoint_vbuckets,3},
      {ebucketmigrator_srv,handle_call,3},
      {gen_server,handle_msg,5},
      {proc_lib,init_p_do_apply,3}]}},
      [{misc,executing_on_new_process,1},
      {tap_replication_manager,change_vbucket_filter,4},
      {tap_replication_manager,'-do_set_incoming_replication_map/3-lc$^5/1-5-',2},
      {tap_replication_manager,do_set_incoming_replication_map,3},
      {tap_replication_manager,handle_call,3},
      {gen_server,handle_msg,5},
      {proc_lib,init_p_do_apply,3}]},
      {gen_server,call,
      ['tap_replication_manager-bucket-0',
      {change_vbucket_replication,182,undefined},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-bucket-0','ns_1@10.3.2.158'},
      {if_rebalance,<0.22930.19>,
      {update_vbucket_state,182,replica,
      undefined,undefined}},
      infinity]}}}}]},
      [{janitor_agent,bulk_set_vbucket_state,4},
      {ns_vbucket_mover,update_replication_post_move,3},
      {ns_vbucket_mover,handle_info,2},
      {gen_server,handle_msg,5},
      {proc_lib,init_p_do_apply,3}]}

      Port server memcached on node 'ns_1@10.3.2.157' exited with status 139. Restarting. Messages:
      Wed Dec 5 04:40:09.169321 PST 3: TAP (Producer) eq_tapq:rebalance_180 - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
      Wed Dec 5 04:40:09.169336 PST 3: TAP (Producer) eq_tapq:rebalance_180 - Sending TAP_VBUCKET_SET with vbucket 180 and state "pending"
      Wed Dec 5 04:40:09.169498 PST 3: Schedule cleanup of "eq_tapq:anon_6103"
      Wed Dec 5 04:40:09.169845 PST 3: TAP (Producer) eq_tapq:replication_building_180_'ns_1@10.3.2.154' - Clear the tap queues by force
      Wed Dec 5 04:40:09.171581 PST 3: TAP (Producer) eq_tapq:rebalance_180 - VBucket <180> is going dead to complete vbucket takeover.
      Wed Dec 5 04:40:09.173733 PST 3: TAP (Producer) eq_tapq:rebalance_180 - Sending TAP_VBUCKET_SET with vbucket 180 and state "active"
      Wed Dec 5 04:40:09.174763 PST 3: TAP takeover is completed. Disconnecting tap stream <eq_tapq:rebalance_180>
      Wed Dec 5 04:40:09.174856 PST 3: TAP (Producer) eq_tapq:rebalance_180 - disconnected
      Wed Dec 5 04:40:09.181608 PST 3: TAP (Producer) eq_tapq:replication_building_180_'ns_1@10.3.2.156' - disconnected, keep alive for 300 seconds
      Wed Dec 5 04:40:09.182019 PST 3: TAP (Producer) eq_tapq:replication_building_180_'ns_1@10.3.2.153' - disconnected, keep alive for 300 seconds
      Wed Dec 5 04:40:09.187172 PST 3: TAP (Producer) eq_tapq:replication_building_180_'ns_1@10.3.2.153' - Connection is closed by force.
      Wed Dec 5 04:40:09.187460 PST 3: TAP (Producer) eq_tapq:replication_building_180_'ns_1@10.3.2.156' - Connection is closed by force.
      Wed Dec 5 04:40:09.340620 PST 3: Schedule cleanup of "eq_tapq:anon_6104"
      Wed Dec 5 04:40:09.340682 PST 3: Schedule cleanup of "eq_tapq:anon_6105"
      Wed Dec 5 04:40:09.340703 PST 3: Schedule cleanup of "eq_tapq:rebalance_180"
      Wed Dec 5 04:40:09.340711 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.2.158 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0
      Wed Dec 5 04:40:09.341253 PST 3: TAP (Producer) eq_tapq:replication_building_180_'ns_1@10.3.2.153' - Clear the tap queues by force
      Wed Dec 5 04:40:09.341298 PST 3: TAP (Producer) eq_tapq:replication_building_180_'ns_1@10.3.2.156' - Clear the tap queues by force
      Wed Dec 5 04:40:09.341353 PST 3: TAP (Producer) eq_tapq:rebalance_180 - Clear the tap queues by force
      Wed Dec 5 04:40:09.843018 PST 3: Deletion of vbucket 180 was completed.
      Wed Dec 5 04:40:10.234534 PST 3: TAP (Producer) eq_tapq:replication_building_181_'ns_1@10.3.2.154' - Schedule the backfill for vbucket 181
      Wed Dec 5 04:40:10.234615 PST 3: TAP (Producer) eq_tapq:replication_building_181_'ns_1@10.3.2.154' - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
      Wed Dec 5 04:40:10.234636 PST 3: TAP (Producer) eq_tapq:replication_building_181_'ns_1@10.3.2.154' - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
      Wed Dec 5 04:40:10.234650 PST 3: TAP (Producer) eq_tapq:replication_building_181_'ns_1@10.3.2.154' - Sending TAP_OPAQUE with command "initial_vbucket_stream" and vbucket 181
      Wed Dec 5 04:40:10.241879 PST 3: TAP (Producer) eq_tapq:replication_building_181_'ns_1@10.3.2.154' - Backfill is completed with VBuckets 181,
      Wed Dec 5 04:40:10.241971 PST 3: TAP (Producer) eq_tapq:replication_building_181_'ns_1@10.3.2.154' - Sending TAP_OPAQUE with command "close_backfill" and vbucket 181
      Wed Dec 5 04:40:10.281477 PST 3: TAP (Producer) eq_tapq:replication_building_181_'ns_1@10.3.2.153' - Schedule the backfill for vbucket 181
      Wed Dec 5 04:40:10.281533 PST 3: TAP (Producer) eq_tapq:replication_building_181_'ns_1@10.3.2.153' - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
      Wed Dec 5 04:40:10.281552 PST 3: TAP (Producer) eq_tapq:replication_building_181_'ns_1@10.3.2.153' - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
      Wed Dec 5 04:40:10.281578 PST 3: TAP (Producer) eq_tapq:replication_building_181_'ns_1@10.3.2.153' - Sending TAP_OPAQUE with command "initial_vbucket_stream" and vbucket 181
      Wed Dec 5 04:40:10.289333 PST 3: TAP (Producer) eq_tapq:replication_building_181_'ns_1@10.3.2.153' - Backfill is completed with VBuckets 181,
      Wed Dec 5 04:40:10.289406 PST 3: TAP (Producer) eq_tapq:replication_building_181_'ns_1@10.3.2.153' - Sending TAP_OPAQUE with command "close_backfill" and vbucket 181
      Wed Dec 5 04:40:10.295496 PST 3: TAP (Producer) eq_tapq:replication_building_181_'ns_1@10.3.2.156' - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
      Wed Dec 5 04:40:10.295529 PST 3: TAP (Producer) eq_tapq:replication_building_181_'ns_1@10.3.2.156' - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
      Wed Dec 5 04:40:10.427784 PST 3: Deletion of vbucket 333 failed because the vbucket is not in a dead state
      Wed Dec 5 04:40:10.429706 PST 3: Deletion of vbucket 333 was completed.
      Wed Dec 5 04:40:10.717251 PST 3: TAP (Producer) eq_tapq:replication_building_181_'ns_1@10.3.2.154' - disconnected, keep alive for 300 seconds
      Wed Dec 5 04:40:10.723082 PST 3: TAP (Producer) eq_tapq:replication_building_181_'ns_1@10.3.2.154' - Connection is closed by force.
      Wed Dec 5 04:40:10.816240 PST 3: TAP (Producer) eq_tapq:rebalance_181 - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
      Wed Dec 5 04:40:10.816297 PST 3: TAP (Producer) eq_tapq:rebalance_181 - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
      Wed Dec 5 04:40:10.816316 PST 3: TAP (Producer) eq_tapq:rebalance_181 - Sending TAP_VBUCKET_SET with vbucket 181 and state "pending"
      Wed Dec 5 04:40:10.816438 PST 3: Schedule cleanup of "eq_tapq:anon_6107"
      Wed Dec 5 04:40:10.816701 PST 3: TAP (Producer) eq_tapq:replication_building_181_'ns_1@10.3.2.154' - Clear the tap queues by force
      Wed Dec 5 04:40:10.818185 PST 3: TAP (Producer) eq_tapq:rebalance_181 - VBucket <181> is going dead to complete vbucket takeover.
      Wed Dec 5 04:40:10.818847 PST 3: TAP (Producer) eq_tapq:rebalance_181 - Sending TAP_VBUCKET_SET with vbucket 181 and state "active"
      Wed Dec 5 04:40:10.819634 PST 3: TAP takeover is completed. Disconnecting tap stream <eq_tapq:rebalance_181>
      Wed Dec 5 04:40:10.819695 PST 3: TAP (Producer) eq_tapq:rebalance_181 - disconnected
      Wed Dec 5 04:40:10.820206 PST 3: Schedule cleanup of "eq_tapq:rebalance_181"
      Wed Dec 5 04:40:10.820365 PST 3: TAP (Producer) eq_tapq:rebalance_181 - Clear the tap queues by force
      Wed Dec 5 04:40:10.826280 PST 3: TAP (Producer) eq_tapq:replication_building_181_'ns_1@10.3.2.156' - disconnected, keep alive for 300 seconds
      Wed Dec 5 04:40:10.826430 PST 3: TAP (Producer) eq_tapq:replication_building_181_'ns_1@10.3.2.153' - disconnected, keep alive for 300 seconds
      Wed Dec 5 04:40:10.868170 PST 3: TAP (Producer) eq_tapq:replication_building_181_'ns_1@10.3.2.153' - Connection is closed by force.
      Wed Dec 5 04:40:10.868479 PST 3: TAP (Producer) eq_tapq:replication_building_181_'ns_1@10.3.2.156' - Connection is closed by force.
      Wed Dec 5 04:40:11.157759 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.2.158 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0
      Wed Dec 5 04:40:11.158128 PST 3: Schedule cleanup of "eq_tapq:anon_6108"
      Wed Dec 5 04:40:11.158185 PST 3: Schedule cleanup of "eq_tapq:anon_6109"
      Wed Dec 5 04:40:11.158426 PST 3: TAP (Producer) eq_tapq:replication_building_181_'ns_1@10.3.2.153' - Clear the tap queues by force
      Wed Dec 5 04:40:11.158487 PST 3: TAP (Producer) eq_tapq:replication_building_181_'ns_1@10.3.2.156' - Clear the tap queues by force
      Wed Dec 5 04:40:11.534425 PST 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.2.158 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0
      Wed Dec 5 04:40:11.640339 PST 3: Deletion of vbucket 181 was completed.
      Wed Dec 5 04:40:11.839776 PST 3: TAP (Consumer) eq_tapq:anon_6106 - disconnected
      Wed Dec 5 04:40:12.179485 PST 3: Schedule cleanup of "eq_tapq:anon_6106"
      Wed Dec 5 04:40:12.690361 PST 3: TAP (Producer) eq_tapq:replication_building_182_'ns_1@10.3.2.154' - Schedule the backfill for vbucket 182
      Wed Dec 5 04:40:12.690457 PST 3: TAP (Producer) eq_tapq:replication_building_182_'ns_1@10.3.2.154' - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
      Wed Dec 5 04:40:12.690486 PST 3: TAP (Producer) eq_tapq:replication_building_182_'ns_1@10.3.2.154' - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
      Wed Dec 5 04:40:12.690509 PST 3: TAP (Producer) eq_tapq:replication_building_182_'ns_1@10.3.2.154' - Sending TAP_OPAQUE with command "initial_vbucket_stream" and vbucket 182
      Wed Dec 5 04:40:12.702699 PST 3: TAP (Producer) eq_tapq:replication_building_182_'ns_1@10.3.2.154' - Backfill is completed with VBuckets 182,
      Wed Dec 5 04:40:12.702809 PST 3: TAP (Producer) eq_tapq:replication_building_182_'ns_1@10.3.2.154' - Sending TAP_OPAQUE with command "close_backfill" and vbucket 182
      Wed Dec 5 04:40:12.707574 PST 3: TAP (Producer) eq_tapq:replication_building_182_'ns_1@10.3.2.153' - Schedule the backfill for vbucket 182
      Wed Dec 5 04:40:12.707654 PST 3: TAP (Producer) eq_tapq:replication_building_182_'ns_1@10.3.2.153' - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
      Wed Dec 5 04:40:12.707675 PST 3: TAP (Producer) eq_tapq:replication_building_182_'ns_1@10.3.2.153' - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
      Wed Dec 5 04:40:12.707688 PST 3: TAP (Producer) eq_tapq:replication_building_182_'ns_1@10.3.2.153' - Sending TAP_OPAQUE with command "initial_vbucket_stream" and vbucket 182
      Wed Dec 5 04:40:12.722620 PST 3: TAP (Producer) eq_tapq:replication_building_182_'ns_1@10.3.2.156' - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
      Wed Dec 5 04:40:12.722727 PST 3: TAP (Producer) eq_tapq:replication_building_182_'ns_1@10.3.2.156' - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
      Wed Dec 5 04:40:12.725352 PST 3: TAP (Producer) eq_tapq:replication_building_182_'ns_1@10.3.2.153' - Backfill is completed with VBuckets 182,
      Wed Dec 5 04:40:12.725479 PST 3: TAP (Producer) eq_tapq:replication_building_182_'ns_1@10.3.2.153' - Sending TAP_OPAQUE with command "close_backfill" and vbucket 182
      Wed Dec 5 04:40:13.015467 PST 3: TAP (Producer) eq_tapq:replication_building_182_'ns_1@10.3.2.154' - disconnected, keep alive for 300 seconds
      Wed Dec 5 04:40:13.048863 PST 3: TAP (Producer) eq_tapq:replication_building_182_'ns_1@10.3.2.154' - Connection is closed by force.
      Wed Dec 5 04:40:13.154098 PST 3: TAP (Producer) eq_tapq:rebalance_182 - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
      Wed Dec 5 04:40:13.154143 PST 3: TAP (Producer) eq_tapq:rebalance_182 - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
      Wed Dec 5 04:40:13.154169 PST 3: TAP (Producer) eq_tapq:rebalance_182 - Sending TAP_VBUCKET_SET with vbucket 182 and state "pending"
      Wed Dec 5 04:40:13.154662 PST 3: Schedule cleanup of "eq_tapq:anon_6110"
      Wed Dec 5 04:40:13.155010 PST 3: TAP (Producer) eq_tapq:replication_building_182_'ns_1@10.3.2.154' - Clear the tap queues by force
      Wed Dec 5 04:40:13.160851 PST 3: TAP (Producer) eq_tapq:rebalance_182 - VBucket <182> is going dead to complete vbucket takeover.
      Wed Dec 5 04:40:13.163703 PST 3: TAP (Producer) eq_tapq:rebalance_182 - Sending TAP_VBUCKET_SET with vbucket 182 and state "active"
      Wed Dec 5 04:40:13.164797 PST 3: TAP takeover is completed. Disconnecting tap stream <eq_tapq:rebalance_182>
      Wed Dec 5 04:40:13.164883 PST 3: TAP (Producer) eq_tapq:rebalance_182 - disconnected
      Wed Dec 5 04:40:13.174251 PST 3: TAP (Producer) eq_tapq:replication_building_182_'ns_1@10.3.2.156' - disconnected, keep alive for 300 seconds
      Wed Dec 5 04:40:13.174631 PST 3: TAP (Producer) eq_tapq:replication_building_182_'ns_1@10.3.2.153' - disconnected, keep alive for 300 seconds
      Wed Dec 5 04:40:13.178928 PST 3: TAP (Producer) eq_tapq:replication_building_182_'ns_1@10.3.2.153' - Connection is closed by force.
      Wed Dec 5 04:40:13.179246 PST 3: TAP (Producer) eq_tapq:replication_building_182_'ns_1@10.3.2.156' - Connection is closed by force.
      Wed Dec 5 04:40:14.166165 PST 3: Schedule cleanup of "eq_tapq:anon_6111"
      Wed Dec 5 04:40:14.166235 PST 3: Schedule cleanup of "eq_tapq:anon_6112"
      Wed Dec 5 04:40:14.166257 PST 3: Schedule cleanup of "eq_tapq:rebalance_182"
      Wed Dec 5 04:40:14.166478 PST 3: TAP (Producer) eq_tapq:replication_building_182_'ns_1@10.3.2.153' - Clear the tap queues by force
      Wed Dec 5 04:40:14.166552 PST 3: TAP (Producer) eq_tapq:replication_building_182_'ns_1@10.3.2.156' - Clear the tap queues by force
      Wed Dec 5 04:40:14.166584 PST 3: TAP (Producer) eq_tapq:rebalance_182 - Clear the tap queues by force
      Wed Dec 5 04:40:14.386307 PST 3: Deletion of vbucket 334 failed because the vbucket is not in a dead state
      Wed Dec 5 04:40:14.386832 PST 3: Deletion of vbucket 334 was completed.
      Wed Dec 5 04:40:14.508046 PST 3: TAP (Consumer) eq_tapq:anon_6113 - disconnected
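
When scanning a log excerpt like the one above, it can help to split each line into timestamp and message and to list the vbuckets whose deletion was first reported as failed. The following is a minimal Python sketch. The regular expression and the names `parse_line` and `failed_deletions` are illustrative assumptions based only on the line shape visible in this excerpt; they are not part of any Couchbase tool.

```python
import re

# Assumed line shape, taken from the excerpt above:
#   "Wed Dec 5 04:40:10.427784 PST 3: <message>"
# The meaning of the number before the colon ("3") is not stated in the
# report, so it is captured under the neutral name "num".
LINE_RE = re.compile(
    r"^(?P<dow>\w{3}) (?P<mon>\w{3}) (?P<day>\d{1,2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d+) (?P<tz>\w+) (?P<num>\d+): (?P<msg>.*)$"
)
DELETE_FAILED_RE = re.compile(r"^Deletion of vbucket (\d+) failed")


def parse_line(line):
    """Return (time, message) for a log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return m.group("time"), m.group("msg")


def failed_deletions(lines):
    """Collect vbucket ids whose deletion was reported as failed."""
    ids = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip lines that do not follow the assumed shape
        m = DELETE_FAILED_RE.match(parsed[1])
        if m:
            ids.append(int(m.group(1)))
    return ids


sample = [
    "Wed Dec 5 04:40:10.427784 PST 3: Deletion of vbucket 333 failed because the vbucket is not in a dead state",
    "Wed Dec 5 04:40:10.429706 PST 3: Deletion of vbucket 333 was completed.",
    "Wed Dec 5 04:40:14.386307 PST 3: Deletion of vbucket 334 failed because the vbucket is not in a dead state",
]
print(failed_deletions(sample))  # prints [333, 334]
```

Lines that do not start with a timestamp, such as the "Port server memcached ..." header, are simply skipped by this sketch.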


        Activity

        andreibaranouski Andrei Baranouski added a comment -

        https://s3.amazonaws.com/bugdb/jira/MB-7362/d68cd356-3070-4a46-81b3-9e6118cabc91-10.3.2.153-diag.txt.gz
        https://s3.amazonaws.com/bugdb/jira/MB-7362/d68cd356-3070-4a46-81b3-9e6118cabc91-10.3.2.154-diag.txt.gz
        https://s3.amazonaws.com/bugdb/jira/MB-7362/d68cd356-3070-4a46-81b3-9e6118cabc91-10.3.2.155-diag.txt.gz
        https://s3.amazonaws.com/bugdb/jira/MB-7362/d68cd356-3070-4a46-81b3-9e6118cabc91-10.3.2.156-diag.txt.gz
        https://s3.amazonaws.com/bugdb/jira/MB-7362/d68cd356-3070-4a46-81b3-9e6118cabc91-10.3.2.157-diag.txt.gz
        https://s3.amazonaws.com/bugdb/jira/MB-7362/d68cd356-3070-4a46-81b3-9e6118cabc91-10.3.2.158-diag.txt.gz

        farshid Farshid Ghods (Inactive) added a comment -

        This test was run after MB-7272 was fixed yesterday.
        Aliaksey, can you please look at the logs?

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        What exactly do you want me to look at? I am also somewhat surprised why.

        Memcached died due to a segfault, and the connection died because of that.
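
For context on how "exited with status 139" points to a segfault: by the common Unix shell convention, an exit status above 128 means the process was terminated by signal number (status - 128), and 139 - 128 = 11, which is SIGSEGV on Linux. A small Python sketch of that decoding follows. The helper name `describe_exit_status` is hypothetical, and the sketch assumes the status reported for the memcached port follows this convention.

```python
import signal


def describe_exit_status(status):
    """Describe an exit status using the 128 + signal-number convention."""
    if status > 128:
        signum = status - 128
        try:
            # Look up the symbolic name for this signal number on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited normally with code {status}"


print(describe_exit_status(139))  # on Linux: killed by signal 11 (SIGSEGV)
```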

        farshid Farshid Ghods (Inactive) added a comment -

        Thanks.
        So this needs to be assigned to the couchbase-bucket team.

        jin Jin Lim added a comment -

        Will take a look at it. Andrei - was this test running on a Linux machine? If so I wonder if there was any core dump. Thanks.

        farshid Farshid Ghods (Inactive) added a comment -

        Merged in build 1976.

        thuan Thuan Nguyen added a comment -

        Integrated in github-ep-engine-2-0 #461 (See http://qa.hq.northscale.net/job/github-ep-engine-2-0/461/)
        MB-7362 allow warmup transition to done from any state during shutdown (Revision 4ac7f92ba50994c4ca10ec00d2b2627bfc153263)

        Result = SUCCESS
        Jin :
        Files :

        • src/warmup.hh
        • src/warmup.cc
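
The commit summary above, "allow warmup transition to done from any state during shutdown", can be illustrated with a small state-machine sketch. This is written in Python for brevity and is not the ep-engine code: the state names, the `LEGAL` transition table, and the `Warmup` class are all hypothetical simplifications. It only shows the general idea of a transition check that normally enforces an ordered path but lets any state move to "done" once shutdown has begun.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical, simplified states; not the real ep-engine warmup states.
    INITIALIZE = auto()
    LOADING_KEYS = auto()
    LOADING_DATA = auto()
    DONE = auto()


# Assumed normal-path transitions for this sketch.
LEGAL = {
    State.INITIALIZE: {State.LOADING_KEYS},
    State.LOADING_KEYS: {State.LOADING_DATA},
    State.LOADING_DATA: {State.DONE},
    State.DONE: set(),
}


class Warmup:
    def __init__(self):
        self.state = State.INITIALIZE
        self.shutting_down = False

    def transition(self, to):
        """Move to `to`, permitting any state -> DONE while shutting down."""
        allowed = to in LEGAL[self.state]
        if to is State.DONE and self.shutting_down:
            allowed = True
        if not allowed:
            raise ValueError(
                f"illegal transition {self.state.name} -> {to.name}"
            )
        self.state = to


w = Warmup()
w.shutting_down = True
w.transition(State.DONE)  # allowed only because shutdown is in progress
print(w.state.name)  # prints DONE
```

Without the shutdown exception, the same call from `INITIALIZE` raises `ValueError` in this sketch, which mirrors the kind of rejected transition the fix is described as relaxing.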

          People

          • Assignee:
            jin Jin Lim
            Reporter:
            andreibaranouski Andrei Baranouski