  Couchbase Server
  MB-42780

[Upgrade] Rebalance_in failed with reason "bulk_set_vbucket_state_failed :: sync_shutdown_many_i_am_trapping_exits"


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version: 6.6.0
    • Fix Version: 6.6.1
    • Component: couchbase-bucket
    • Environment: Enterprise Edition 6.0.1 build 2037 (existing 4-node cluster);
      Enterprise Edition 6.6.1 build 9182 (new node coming in)

    Description

       Scenario:

      1. 4 Node cluster (6.0.1 build 2037)
      2. Create a couchbase bucket with replica=1, size=100M
      3. Load the bucket into DGM using cbworkloadgen:

        ./cbworkloadgen -n 172.23.105.155:8091 -b default -u Administrator -p password --max-items=1200000 -r .95 -l

      4. Rebalance-in a 6.6.1-9182 node (172.23.105.244) into the cluster

      Observation:

      Rebalance failed with the following reason:

      Worker <0.3486.0> (for action {move,{577,
      ['ns_1@172.23.105.212',
      'ns_1@172.23.105.155'],
      ['ns_1@172.23.105.155',
      'ns_1@172.23.105.244'],
      []}}) exited with reason {unexpected_exit,
      {'EXIT', <0.3512.0>,
      {{bulk_set_vbucket_state_failed,
      [{'ns_1@172.23.105.244',
      {'EXIT',
      {{{{{badmatch,
      [{<0.3659.0>,
      {done, exit, {socket_closed,
      {gen_server, call, [<0.2333.0>,
      {setup_streams, [83, 339]}, infinity]}},
      [{gen_server, call, 3,
      [{file, "gen_server.erl"}, {line, 214}]},
      {dcp_replicator, '-spawn_and_wait/1-fun-0-', 1,
      [{file, "src/dcp_replicator.erl"}, {line, 243}]}]}}]},
      [{misc, sync_shutdown_many_i_am_trapping_exits, 1, 
      [{file, "src/misc.erl"}, {line, 1374}]},
      {dcp_replicator, spawn_and_wait, 1,
      [{file, "src/dcp_replicator.erl"}, {line, 265}]},
      {dcp_replicator, handle_call, 3,
      [{file, "src/dcp_replicator.erl"}, {line, 121}]},
      {gen_server, try_handle_call, 4,
      [{file, "gen_server.erl"}, {line, 636}]},
      {gen_server, handle_msg, 6,
      [{file, "gen_server.erl"}, {line, 665}]},
      {proc_lib, init_p_do_apply, 3,
      [{file, "proc_lib.erl"}, {line, 247}]}]},
      {gen_server, call,
      [<0.2332.0>, get_partitions, infinity]}},
      {gen_server, call,
      ['dcp_replication_manager-default',
      {get_replicator_pid, 83}, infinity]}},
      {gen_server, call,
      [{'janitor_agent-default', 'ns_1@172.23.105.244'},
      {if_rebalance, <0.2246.0>,
      {update_vbucket_state, 577, replica,
      undefined, 'ns_1@172.23.105.212'}}, infinity]}}}}]},
      [{janitor_agent, bulk_set_vbucket_state, 4,
      [{file, "src/janitor_agent.erl"}, {line, 403}]},
      {proc_lib, init_p,3,
      [{file, "proc_lib.erl"}, {line, 232}]}]}}} 

      Note: Hit this while trying to validate MB-41283

      CC Richard deMellow

       

      Attachments

        Issue Links


          Activity

            Paolo Cocchi added a comment - edited

            ns_server.info.log:

            [ns_server:info,2020-11-16T22:39:20.252-08:00,ns_1@172.23.105.155:janitor_agent-default<0.14362.0>:janitor_agent:handle_info:801]Rebalancer <13742.2246.0> died with reason {unexpected_
            exit,
                                                        {'EXIT',<13742.3512.0>,
                                                         {{bulk_set_vbucket_state_failed,
                                                           [{'ns_1@172.23.105.244',
                                                             {'EXIT',
                                                              {{{{{badmatch,
                                                                   [{<13742.3659.0>,
                                                                     {done,exit,
                                                                      {socket_closed,
            

            memcached error on node .244 (the only 6.6.1 node), with the replica connection being closed:

            2020-11-16T22:39:19.828037-08:00 INFO 53: (default) DCP (Consumer) eq_dcpq:replication:ns_1@172.23.105.155->ns_1@172.23.105.244:default - (vb:83) Attempting to add stream: opaque_:19, start_seqno_:0, end_seqno_:18446744073709551615, vb_uuid:267978355066003, snap_start_seqno_:0, snap_end_seqno_:0, last_seqno:0, stream_req_value:{"uid":"0"}
            ..
            2020-11-16T22:39:19.858135-08:00 ERROR 53: exception occurred in runloop during packet execution. Cookie info: [{"aiostat":"success","connection":"[ {\"ip\":\"127.0.0.1\",\"port\":43312} - {\"ip\":\"127.0.0.1\",\"port\":11209} (<ud>@ns_server</ud>) ]","engine_storage":"0x00007f4283feac10","ewouldblock":false,"packet":{"bodylen":52,"cas":1605595159858380800,"datatype":"raw","extlen":31,"key":"<ud>pymc1033629</ud>","keylen":11,"magic":"ClientRequest","opaque":19,"opcode":"DCP_MUTATION","vbucket":83},"refcount":1}] - closing connection ([ {"ip":"127.0.0.1","port":43312} - {"ip":"127.0.0.1","port":11209} (<ud>@ns_server</ud>) ]): Checkpoint::queueDirty: Unable to find key in keyIndex with op:mutation seqno:11190for cursor:persistence in current checkpoint.
            

            The producer is node .155 (6.0.1), which is sending a disk snapshot:

            2020-11-16T22:39:19.831717-08:00 INFO 59: (default) DCP (Producer) eq_dcpq:replication:ns_1@172.23.105.155->ns_1@172.23.105.244:default - (vb 83) Creating stream with start seqno 0 and end seqno 18446744073709551615; requested end seqno was 18446744073709551615
            2020-11-16T22:39:19.831763-08:00 INFO 59: (default) DCP (Producer) eq_dcpq:replication:ns_1@172.23.105.155->ns_1@172.23.105.244:default - (vb 83) Scheduling backfill from 1 to 12024, reschedule flag : False
            2020-11-16T22:39:19.832021-08:00 INFO 59: (default) DCP (Producer) eq_dcpq:replication:ns_1@172.23.105.155->ns_1@172.23.105.244:default - (vb 83) Sending disk snapshot with start seqno 0 and end seqno 12106
            

            Paolo Cocchi added a comment - edited

            I am keeping an eye on the motivation behind MB-41283. As per the fix description at http://review.couchbase.org/c/kv_engine/+/135630, the issue involves checkpoint marker flags. It is worth highlighting that between 6.0.x and 6.5.x we fixed/changed something in the area of marker flags sent from active to replica, which may relate to this issue.

            Paolo Cocchi added a comment - edited

            We fail in:

            int64_t Checkpoint::getMutationId(const CheckpointCursor& cursor) const {
            ..
                auto& keyIndex = (*cursor.currentPos)->isCommitted() ? committedKeyIndex
                                                                     : preparedKeyIndex;
                auto cursor_item_idx = keyIndex.find(makeIndexKey(*cursor.currentPos));
                if (cursor_item_idx == keyIndex.end()) {
                    throw std::logic_error(
                            "Checkpoint::queueDirty: Unable "
                            "to find key in keyIndex with op:" +
                            to_string((*cursor.currentPos)->getOperation()) +
                            " seqno:" + std::to_string((*cursor.currentPos)->getBySeqno()) +
                            " for cursor:" + cursor.name + " in current checkpoint.");
                }
                return cursor_item_idx->second.mutation_id;
            }
            

            Called by:

            QueueDirtyStatus Checkpoint::queueDirty(const queued_item& qi,
                                                    CheckpointManager* checkpointManager) {
            ..
                if (qi->isCheckPointMetaItem()) {
                    ..
                } else {
                    // Check in the appropriate key index if an item already exists.
                    auto& keyIndex =
                            qi->isCommitted() ? committedKeyIndex : preparedKeyIndex;
                    auto it = keyIndex.find(makeIndexKey(qi));
                    ..
                    // Check if this checkpoint already has an item for the same key
                    // and the item has not been expelled.
                    if (it != keyIndex.end()) {
                        if (it->second.mutation_id > highestExpelledSeqno) {
                            // Normal path - we haven't expelled the item. We have a valid
                            // cursor position to read the item and make our de-dupe checks.
                            const auto currPos = it->second.position;
                            if (!(canDedup(*currPos, qi))) {
                                return QueueDirtyStatus::FailureDuplicateItem;
                            }
             
                            rv = QueueDirtyStatus::SuccessExistingItem;
                            const int64_t currMutationId{it->second.mutation_id};
             
                            // Given the key already exists, need to check all cursors in
                            // this Checkpoint and see if the existing item for this key is
                            // to the "left" of the cursor (i.e. has already been
                            // processed).
                            for (auto& cursor : checkpointManager->cursors) {
                                if ((*(cursor.second->currentCheckpoint)).get() == this) {
                                    if (cursor.second->name ==
                                        CheckpointManager::pCursorName) {
                                        int64_t cursor_mutation_id =
                                                getMutationId(*cursor.second);         <------- !!
                                        queued_item& cursor_item =
                                                *(cursor.second->currentPos);
            ..
            

            Not related to the issue, but a comment on Checkpoint::getMutationId() and its usage.
            Essentially, in Checkpoint::queueDirty() we need to compare 'it->second.mutation_id' to the seqno of the item currently pointed to by each cursor placed in the checkpoint (I'll call it cursor-seqno from now on).
            To get cursor-seqno we call Checkpoint::getMutationId(cursor). The logic there is: (1) look up cursor_item_idx in the keyIndex and (2) return cursor_item_idx->second.mutation_id.
            Couldn't we just do 'cursor-seqno = (*(cursor.second->currentPos))->getBySeqno()' in queueDirty() and avoid the call to getMutationId()?

            Note that, given how Checkpoint::queueDirty() and Checkpoint::getMutationId() work, the mutation dumped in the logs is just the mutation pointed to by the persistence cursor at the time of the failure (i.e. not the mutation that the node is processing when we fail).
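            A minimal sketch of that suggested simplification, written against the queueDirty() excerpt above (identifiers are the ones quoted there; this is an untested fragment, not a proposed patch):

            for (auto& cursor : checkpointManager->cursors) {
                if ((*(cursor.second->currentCheckpoint)).get() == this) {
                    if (cursor.second->name == CheckpointManager::pCursorName) {
                        // Take the cursor's seqno straight from the item it points
                        // to, avoiding the keyIndex lookup in getMutationId() (and
                        // hence the throw when that item's key is not indexed).
                        const int64_t cursor_mutation_id =
                                (*(cursor.second->currentPos))->getBySeqno();
                        // ... continue with the existing comparison against
                        // currMutationId as in the excerpt above ...
                    }
                }
            }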

            Paolo Cocchi added a comment - edited

            While in MB-41283 we hit an issue in the Memory->Disk transition, here we are hitting the following in the opposite Disk->Memory transition (note that the Active is 6.0.1 and the Replica is 6.6.1):

            1. Active sends a Disk snapshot [X, Y]
            2. Replica receives it successfully, but since MB-35889 (6.6.0) the replica doesn't add keys to Checkpoint indexes for Disk snapshots -> no entry for any key in the Disk checkpoint at the replica
            3. Active switches to in-memory -> it sends a SnapMarker with MARKER_FLAG_CHK NOT set (possible in 6.0.1, e.g. MB-32862) + it starts sending mutations within the new Memory snapshot
            4. Replica receives the SnapMarker for the Memory snapshot and turns the existing Disk checkpoint into a Memory checkpoint
            5. Replica receives mutations for the Memory snapshot and queues them into the Checkpoint that already contains mutations from the original Disk snapshot
            6. At some point the Replica receives a mutation for a key that already exists in the Checkpoint -> the deduplication code path above is executed, but there is no index entry for the item the persistence cursor points to (it came from the Disk snapshot), so we fail

            In the code above we need to compare the mutation-seqno with the cursor-seqno for every cursor in the Checkpoint.
            If we find a cursor that points to a mutation received as part of the original Disk snapshot, then we try to look up the key of that mutation in the keyIndex. Given that in 6.6.1 we don't add keys to the index for Disk snapshots, we don't find that key and we throw, as seen in this MB (a toy model below makes the sequence concrete).

            Note: I expect this issue to also affect any upgrade from pre-6.5 (i.e. 6.0.x, where we had issues like MB-32862) to 6.6.0, as MB-35889 was introduced in 6.6.0.
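            To make that sequence concrete, here is a small self-contained toy model. It is not kv_engine code: ToyCheckpoint and ToyItem are hypothetical stand-ins that only mimic the keyIndex/cursor interaction described above (Disk-snapshot items queued but not indexed, the same checkpoint later extended with Memory-snapshot mutations, the persistence cursor still sitting on a Disk-snapshot item).

            // toy_checkpoint.cc - illustrative model only, not kv_engine code
            #include <cstdint>
            #include <iostream>
            #include <stdexcept>
            #include <string>
            #include <unordered_map>
            #include <vector>

            struct ToyItem {
                std::string key;
                int64_t seqno;
            };

            struct ToyCheckpoint {
                std::vector<ToyItem> items;
                std::unordered_map<std::string, int64_t> keyIndex; // key -> seqno
                size_t persistenceCursor = 0; // index of the item the cursor points at

                void receiveDiskSnapshot(const std::vector<ToyItem>& snap) {
                    // Mirrors MB-35889: Disk-snapshot items are queued but NOT indexed.
                    for (const auto& item : snap) {
                        items.push_back(item);
                    }
                }

                void receiveMemoryMutation(const ToyItem& item) {
                    auto it = keyIndex.find(item.key);
                    if (it != keyIndex.end()) {
                        // De-dup path: queueDirty() needs the seqno of the item the
                        // persistence cursor points at, and obtains it via a keyIndex
                        // lookup on that item's key (getMutationId()).
                        const auto& cursorItem = items.at(persistenceCursor);
                        if (keyIndex.find(cursorItem.key) == keyIndex.end()) {
                            // The cursor item came from the Disk snapshot and was never
                            // indexed -> the logic_error seen in this MB.
                            throw std::logic_error(
                                    "Unable to find key in keyIndex for cursor item " +
                                    cursorItem.key);
                        }
                    }
                    items.push_back(item);
                    keyIndex[item.key] = item.seqno; // Memory-snapshot items are indexed
                }
            };

            int main() {
                ToyCheckpoint cp;
                cp.receiveDiskSnapshot({{"a", 1}, {"b", 2}}); // steps 1-2: Disk snapshot, no index entries
                // Steps 3-5: SnapMarker without MARKER_FLAG_CHK, so the same checkpoint
                // is extended with Memory-snapshot mutations (which are indexed).
                cp.receiveMemoryMutation({"c", 3});
                try {
                    cp.receiveMemoryMutation({"c", 4}); // step 6: duplicate key -> throws
                } catch (const std::logic_error& e) {
                    std::cout << "throws: " << e.what() << '\n';
                }
                return 0;
            }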

            Paolo Cocchi added a comment -

            Test at http://review.couchbase.org/c/kv_engine/+/140532 encodes the failure scenario described in my previous comment. Fix in progress.

            Paolo Cocchi added a comment -

            On a separate thread, MB-42805.

            Dave Rigby added a comment -

            Thanks for the details.

            Paolo Cocchi One concern I have with reverting MB-35889: it has already shipped in 6.6.0, so we will have to deal with code which doesn't maintain the keyIndex for Disk checkpoints in upgrade paths going forward (i.e. 6.6.0 -> newer). What is the consequence of that - i.e. are there any scenarios where, during a cluster upgrade, a 6.6.0 node could have problems?

            Dave Rigby added a comment -

            Also - assuming we use the above patch, what's the consequence of just using Item::getBySeqno for the mutation_id - that we always wake the flusher to re-persist even when strictly not necessary?

            Paolo Cocchi added a comment -

            Hi Ashwin Govindarajulu, the fix is submitted; assigning back to you for verification, thanks.


            Couchbase Build Team added a comment -

            Build couchbase-server-6.6.1-9193 contains kv_engine commit c4454a5 with commit message:
            MB-42780: Make replica resilient to missing MARKER_FLAG_CHK

            Couchbase Build Team added a comment -

            Build couchbase-server-6.6.1-9194 contains kv_engine commit 675007e with commit message:
            MB-42780: Expand tests and improve comments

            Ashwin Govindarajulu added a comment -

            Not seeing this crash while trying to rebalance-in a 6.6.1-9194 node on top of a 6.0.1-2037 cluster.

            Closing this ticket.

            Ashwin Govindarajulu added a comment -

            Sure Richard deMellow. Already started working on that test.

            Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-4069 contains kv_engine commit 675007e with commit message:
            MB-42780: Expand tests and improve comments

            Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-4069 contains kv_engine commit c4454a5 with commit message:
            MB-42780: Make replica resilient to missing MARKER_FLAG_CHK

            Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-4127 contains kv_engine commit eba2ffd with commit message:
            MB-42780: Logically revert MB-41283

            Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-4129 contains kv_engine commit cc90dc9 with commit message:
            MB-42780: CM allows extending only Memory checkpoints

            People

              Assignee: Ashwin Govindarajulu
              Reporter: Ashwin Govindarajulu