Details
- Bug
- Resolution: Fixed
- Major
- 7.2.0
- Untriaged
- 0
- Unknown
Description
With CDC (change data capture), a flush batch can include duplicates (different versions) of the same key.
KV has always written flush batches to the KVStore sorted by key, then by seqno. This ordering has been a key part of the flusher's de-duplication since its creation.
input order to flusher:
key: b a b a
seq: 1 2 3 4

sorted:
key: a a b b
seq: 2 1 4 3

final set written to KVStore:
key: a b
seq: 2 4
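The pre-CDC de-duplication described above can be sketched roughly as follows. This is a minimal illustration in Python, not the kv_engine flusher code: `Item` and `dedup_flush_batch` are hypothetical names, and the input seqnos are chosen to match the sorted listing (a at 1 and 2, b at 3 and 4).

```python
from typing import NamedTuple


class Item(NamedTuple):
    key: str
    seqno: int


def dedup_flush_batch(items):
    """Sort by key, then by descending seqno, and keep only the newest item per key."""
    ordered = sorted(items, key=lambda it: (it.key, -it.seqno))
    result = []
    for it in ordered:
        # The first item seen for a key has the highest seqno; later ones are older.
        if not result or result[-1].key != it.key:
            result.append(it)
    return result


# Assumed input, consistent with the sorted listing: a has seqnos 1 and 2, b has 3 and 4.
batch = [Item("b", 3), Item("a", 1), Item("b", 4), Item("a", 2)]
deduped = dedup_flush_batch(batch)
print([(it.key, it.seqno) for it in deduped])
```

The output has exactly one entry per key, in ascending key order, which is the property downstream consumers relied on.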
With CDC, the final set can now include duplicates. For example, if the b keys are from a collection with history enabled, the output to the KVStore includes two 'b' keys in descending seqno order.
final set written to KVStore:
key: a b b
seq: 2 4 3
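To make the CDC case concrete, here is a small sketch of a de-duplication pass that retains every version of keys with history enabled. It is an assumption-laden illustration only: `Item`, `dedup_with_history` and the `history_keys` set are hypothetical (in KV, history is a property of the collection, not of individual keys), and the input seqnos are chosen so that a has 1 and 2 and b has 3 and 4.

```python
from typing import NamedTuple


class Item(NamedTuple):
    key: str
    seqno: int


def dedup_with_history(items, history_keys):
    """Sort by key, then descending seqno; drop older versions unless history is kept."""
    ordered = sorted(items, key=lambda it: (it.key, -it.seqno))
    result = []
    for it in ordered:
        is_older_version = bool(result) and result[-1].key == it.key
        if is_older_version and it.key not in history_keys:
            continue  # no history for this key: keep only the newest version
        result.append(it)
    return result


# Assumed input: the b keys stand in for a collection with history enabled.
batch = [Item("b", 3), Item("a", 1), Item("b", 4), Item("a", 2)]
final = dedup_with_history(batch, history_keys={"b"})
print([(it.key, it.seqno) for it in final])
```

Under these assumptions the batch keeps both b:4 and b:3, in that order, so the key sequence is no longer strictly ascending.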
When processing such a batch, magma assumed it was working in ascending key order, so it may, for example, ignore b:4 or assume it is older than b:3 (note that the seqnos are mostly abstract for magma).
This MB tracks this issue.
Note that the simplest way forward in the short term is for KV to sort the final batch by seqno, which will get everything functional.
A further note: couchstore already (attempts to) sort the batch we hand over, twice! It has an id index and a seq index, and for each one it sorts the flush input by id or by seq respectively.
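As a rough sketch of that short-term approach (hypothetical helper names, not the actual KV change), sorting the final batch by seqno restores an order in which the versions of any one key appear oldest first:

```python
from typing import NamedTuple


class Item(NamedTuple):
    key: str
    seqno: int


def sort_final_batch_by_seqno(items):
    """Return the batch ordered by ascending seqno."""
    return sorted(items, key=lambda it: it.seqno)


def versions_ascend_per_key(items):
    """True if, for every key, its versions appear in strictly ascending seqno order."""
    last_seen = {}
    for it in items:
        if it.key in last_seen and it.seqno <= last_seen[it.key]:
            return False
        last_seen[it.key] = it.seqno
    return True


# The CDC final set from the description: a:2, b:4, b:3.
final = [Item("a", 2), Item("b", 4), Item("b", 3)]
by_seqno = sort_final_batch_by_seqno(final)
print([(it.key, it.seqno) for it in by_seqno])
```

Because seqnos are unique within the batch, a plain sort on seqno is enough here; no secondary sort key is needed.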