Loading...

XML

Word

Printable

Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: 6.6.6, 7.0.4
Affects Version/s: 6.6.1, 6.6.2, 6.6.3, 6.6.4, 6.6.5, 7.0.0, 7.0.1, 7.0.2, 7.0.3
Component/s: couchbase-bucket
Labels:

Triage:
Untriaged
Story Points:
1
Is this a Regression?:
Unknown
Sprint:
KV 2022-Feb, KV March-22

Description

Summary

If a replica vBucket is promoted to active, and the last DCP message it received was a Snapshot Marker which had the first mutation de-duplicated, then the snapshot start of the newly-promoted active ends up greater than the active.
Upon the next Flusher run (i.e. next mutation to the vBucket), the Flusher throws an exception when trying to fetch items which terminates KV-Engine (as exception is thrown on BG thread).

Confirmed to affect 6.6.1 and 6.6.5.
Confirmed to affect 7.0.0 - 7.0.3 (inclusive) - see https://review.couchbase.org/c/kv_engine/+/170373
Confirmed to not affect Neo.

Details

When streaming data from an Active to Replica vBucket, the extent of the Checkpoint is sent via DCP using a SnapshotMarker message, followed by N Mutation / Deletion messages. The snapshot marker may be discontinuous compared to the previous if any de-duplication occurred within the Checkpoint - for example if document "key" was written sufficient times in quick succession, one could end up with the following two Checkpoints on the active and subsequent DCP SnapshotMarker sent to the replica:

CheckpointManager[0x108a03080] with numItems:6 checkpoints:2

    Checkpoint[0x10891f000] with id:2 seqno:{1,10} snap:{0,10, visible:10} state:CHECKPOINT_CLOSED numCursors:1 type:Memory hcs:-- items:[

	{1,empty,cid:0x1:empty,110,[m]}

	{1,checkpoint_start,cid:0x1:checkpoint_start,121,[m]}

	{1,set_vbucket_state,cid:0x1:set_vbucket_state,245,[m]}

	{10,mutation,cid:0x0: deduplicated_key,119,}

	{11,checkpoint_end,cid:0x1:checkpoint_end,119,[m]}

    Checkpoint[0x10891fa00] with id:3 seqno:{11,12} snap:{10,12, visible:12} state:CHECKPOINT_OPEN numCursors:1 type:Memory hcs:-- items:[

	{11,empty,cid:0x1:empty,110,[m]}

	{11,checkpoint_start,cid:0x1:checkpoint_start,121,[m]}

	{12,mutation,cid:0x0:deduplicated_key,130,}

Note how there are just two mutations remaining (at seqnos 10 and 12), and that there is a seqno "gap" at 11 (ignore meta-items which are not send over DCP).

When this is replicated over DCP it will be sent as:

DCP_SNAPSHOT_MARKER(start:0, end:10, flags=CHK)
DCP_MUTATION(seqno:10, ...)
DCP_SNAPSHOT_MARKER(start:12, end:12, flags=CHK)
DCP_MUTATION(seqno:12, ...)

Once these messages are replicated over DCP the replica vBucket should have equivalent state as the active.

However; if the last DCP_MUTATION is not received - for example if the active node is being failed over and the stream is closed before the DCP_MUTATION, then the state of the replica - crucially the Open checkpoint is as follows:

    Checkpoint[0x10cecde00] with id:2 seqno:{11,11} snap:{12,12, visible:12} state:CHECKPOINT_OPEN numCursors:0 type:Memory hcs:-- items:[

	{11,empty,cid:0x1:empty,110,[m]}

	{11,checkpoint_start,cid:0x1:checkpoint_start,121,[m]}

Note that the second SnapshotMarker being flagged as "CHK" (Checkpoint) is essential - we need the replica to end up creating a new Checkpoint with the start and end controlled by the active - a SnapshotMarker without that flag is insufficient as it just extends the existing checkpoint, increasing the checkpoint end but leaving start unaffected.

When this sequence occurs, the seqno range (11,11) in the open Checkpoint is less than the snapshot range (12,12). This is problematic as we have essentially broken an invariant on Checkpoints - that all items within them are between the snapshot start and end.

This doesn't immediately cause a problem, but if this vBucket is converted to Active and starts accepting mutations itself, it will start generating seqnos from the last seqno received - 10 in this case. This results in the next mutation being assigned seqno 11, which when the flusher is woken and attempts to flush throws an exception on the BG thread and crashes KV-Engine:

CheckpointManager::queueDirty: lastBySeqno not in snapshot range. vb:0 state:active snapshotStart:12 lastBySeqno:11 snapshotEnd:11 genSeqno:Yes checkpointList.size():2

Impact Assessment

In theory this scenario seems reasonably easy to hit - one just needs at least one seqno being de-duplicated (i.e. two mutations to the same key within the same Checkpoint), and then a failover where just the SnapshotMarker is received without any of the mutations it contains. However in practice I have not been able to trigger it on a full running cluster with default config values (yet).

This is most likely due to the requirement that the SnapshotMarker received last (before stream is closed) must have the "CHK" flag set - i.e. must represent a newly-created Checkpoint. In 6.6.1 a new Checkpoint is only created under certain criteria (see CheckpointManager::isCheckpointCreationForHighMemUsage_UNLOCKED and CheckpointManager::checkOpenCheckpoint_UNLOCKED) - but the main ones are one of:

The current Checkpoint has 10,000 items in it, or
The current Checkpoint has at least 1 item and 5 seconds have passed since it was created.

These criteria somewhat work against the needed scenario to trigger the bug:

To create a new Checkpoint when there is a large (10,000) items in it, it becomes increasingly unlikely that de-duplication would exactly de-duplicate the first mutation(s) in the Checkpoint.
To create a new Checkpoint after 5 seconds; but also have de-duplication requires modification of the same key in a small window, but overall little-to-no other traffic.

However the problem has been observed on a live running cluster; where it affected 4 out of ~140 vBuckets when a node was failed over, so clearly it is possible to hit, given the "right" environment.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

cb_subdoc_dict_add_loop.py
2 kB
15/Feb/22 5:34 AM

Issue Links

is duplicated by

MB-53285 Caught unhandled std::exception-derived exception. what(): snapshot_range_t(5262,5260) requires start <= end

Closed

Gerrit Reviews

- Issue Only
- Show All Reviews
- Show Open Reviews
- Show All Issues
- Show Open Issues

For Gerrit Dashboard: MB-50874
#	Subject	Branch	Project	Status	CR	V
170870,3	MB-50874: Single key tests to create env for hitting	master	TAF	Status: NEW	0	+1
170268,6	MB-50874: Reset snap start if less than lastSeqno on new checkpoint creation	mad-hatter	kv_engine	Status: MERGED	+2	+1
170370,1	MB-50874: Replica to active with de-duped checkpoint [mad-hatter]	mad-hatter	kv_engine	Status: ABANDONED	0	-1
170371,2	MB-50874: Replica to active with de-duped checkpoint [master]	master	kv_engine	Status: ABANDONED	-1	-1
170373,1	MB-50874: Replica to active with de-duped checkpoint [cheshire-cat]	cheshire-cat	kv_engine	Status: ABANDONED	-1	-1
170454,5	MB-50874: Replica to active with de-duped checkpoint [master]	master	kv_engine	Status: ABANDONED	0	-1
170789,4	MB-50874: Merge branch 'mad-hatter' into cheshire-cat	cheshire-cat	kv_engine	Status: MERGED	+2	+1
171274,1	Merge commit 'bfa0dd847' into master	master	kv_engine	Status: ABANDONED	-1	-1
171305,3	Merge commit 'bfa0dd847' into master	master	kv_engine	Status: MERGED	+2	+1
172486,2	MB-51513: Merge 'couchbase/cheshire-cat' into 'couchbase/neo'	neo	kv_engine	Status: MERGED	+2	+1
172641,1	Merge branch 'neo' into master	master	kv_engine	Status: MERGED	+2	+1