Details
-
Bug
-
Resolution: Fixed
-
Critical
-
6.5.0
-
Untriaged
-
No
-
KV-Engine Mad-Hatter Beta
Description
Summary
During rebalance, the following error is seen on the incoming replica, reporting a duplicate item in a Checkpoint:
2019-07-04T12:13:07.145526+01:00 ERROR 55: exception occurred in runloop during packet execution. Cookie info: [{"aiostat":"success","connection":"[ 127.0.0.1:65383 - 127.0.0.1:11995 (<ud>@ns_server</ud>) ]","engine_storage":"0x00000001067af018","ewouldblock":false,"packet":{"bodylen":154,"cas":1562238787022946304,"datatype":"raw","extlen":33,"keylen":21,"magic":"ClientRequest","opaque":25,"opcode":"DCP_PREPARE","vbucket":3},"refcount":1}] - closing connection ([ 127.0.0.1:65383 - 127.0.0.1:11995 (<ud>@ns_server</ud>) ]):
CheckpointManager::queueDirty(vb:3) - got Ckpt::queueDirty() status:failure:duplicate item when vbstate is non-active:3
After local reproduction, it seems like the following scenario is causing this error. The active node has the following items on disk and in memory (checkpoint manager):
Disk:
1:PRE(a), 2:CMT(a), 3:SET(b)
Memory:
3:CKPT_START
3:SET(b), 4:PRE(a), 5:SET(c)
(Items 1..2 were in a closed, removed checkpoint and no longer in-memory.)
An ep-engine replica attempting to stream all of this (0..infinity) triggers a backfill of items 1..3 from disk, with a checkpoint cursor placed at seqno:4. Note that seqno:4 is not the start of the Checkpoint (which is at seqno:3), so the cursor does not point at a checkpoint_start item. As a result, when this is streamed over DCP (up to seqno:4), the consumer sees the following (note the flags sent):
SNAPSHOT_MARKER(start=1, end=3, flags=DISK|CKPT)
1:PRE(a)
2:CMT(a)
3:SET(b)
SNAPSHOT_MARKER(start=4, end=5, flags=MEM)
4:PRE(a)
[[missing seqno 5]]
If the consumer puts all of these mutations in the same Checkpoint, the result is two PRE(a) items in that Checkpoint, which breaks the Checkpoint invariant that it holds no duplicate items.
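The scenario above can be replayed with a small toy model. This is only an illustrative sketch in Python, not KV-Engine code: the `Checkpoint`, `Replica`, `DuplicateItemError` and `replay` names, and the `new_checkpoint_per_snapshot` switch, are all hypothetical. It assumes a Checkpoint rejects a second Prepare for a key it already holds, and that an intervening Commit does not lift that restriction (which matches the error observed here).

```python
from dataclasses import dataclass, field


class DuplicateItemError(Exception):
    """Raised when a checkpoint already holds a prepare for the key."""


@dataclass
class Checkpoint:
    """Toy checkpoint (hypothetical): at most one prepare per key."""
    items: list = field(default_factory=list)
    prepared_keys: set = field(default_factory=set)

    def queue_dirty(self, seqno, op, key):
        # Assumption: only prepares are checked; a commit does not clear the key.
        if op == "PRE":
            if key in self.prepared_keys:
                raise DuplicateItemError(f"duplicate item: {seqno}:{op}({key})")
            self.prepared_keys.add(key)
        self.items.append((seqno, op, key))


class Replica:
    """Toy DCP consumer (hypothetical) that queues items into checkpoints."""

    def __init__(self, new_checkpoint_per_snapshot):
        self.new_checkpoint_per_snapshot = new_checkpoint_per_snapshot
        self.checkpoints = [Checkpoint()]

    def receive(self, message):
        if message[0] == "SNAPSHOT_MARKER":
            # Optionally open a fresh checkpoint at every snapshot boundary,
            # regardless of the marker flags.
            if self.new_checkpoint_per_snapshot and self.checkpoints[-1].items:
                self.checkpoints.append(Checkpoint())
        else:
            _, seqno, op, key = message
            self.checkpoints[-1].queue_dirty(seqno, op, key)


# The stream the consumer sees in this ticket (up to seqno 4).
STREAM = [
    ("SNAPSHOT_MARKER", 1, 3, "DISK|CKPT"),
    ("ITEM", 1, "PRE", "a"),
    ("ITEM", 2, "CMT", "a"),
    ("ITEM", 3, "SET", "b"),
    ("SNAPSHOT_MARKER", 4, 5, "MEM"),
    ("ITEM", 4, "PRE", "a"),
]


def replay(replica):
    """Feed the whole stream to the given replica."""
    for message in STREAM:
        replica.receive(message)
```

Calling `replay(Replica(new_checkpoint_per_snapshot=False))` raises `DuplicateItemError` at 4:PRE(a), mirroring the failure in the log. With `new_checkpoint_per_snapshot=True` the second Prepare lands in its own checkpoint and is accepted. Whether the real fix takes this approach is not stated in this ticket.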
Steps to Reproduce
Exact steps TBC, but the error has been seen when rebalancing in a node while modifying the same key(s). This should result in an initial Disk snapshot containing a Prepare for some key, followed by a Memory snapshot containing a Prepare for the same key.
Expected Results
The replica should not crash; it should accept the subsequent "duplicate" Prepare.
Actual Results
Above crash seen.