Details
Bug
Resolution: Not a Bug
Major
None
6.6.0, 6.5.0
Untriaged
0
Unknown
Description
Issue noted during code review whilst developing another feature. It is theoretical and unproven.
The max visible seqno (MVS) and high completed seqno (HCS) are special 'markers' tracked per vbucket, each for a different purpose. The MVS, for example, denotes the highest committed seqno: a prepare or abort does not advance it. This is used so that a DCP client calling getAllVBSeqnos is given the MVS. A client that has not enabled prepare/abort then doesn't 'miss' events, i.e. it is not left waiting for a seqno that will never appear on its stream.
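To make the MVS behaviour concrete, here is a minimal Python model under simplifying assumptions. The op names, `VBucketModel`, `stream` and `reaches` are all hypothetical and are not kv_engine types or APIs. The model only assumes what is described above: prepares and aborts do not advance the MVS, and a client without prepare/abort support never receives them.

```python
from dataclasses import dataclass, field

# Hypothetical op names; kv_engine's real item/queue-op types differ.
COMMITTED_OPS = {"mutation", "deletion", "commit"}
SYNC_WRITE_OPS = {"prepare", "abort"}


@dataclass
class VBucketModel:
    """Simplified per-vbucket seqno counters (not the real VBucket class)."""
    high_seqno: int = 0
    max_visible_seqno: int = 0
    log: list = field(default_factory=list)  # (seqno, op) in seqno order

    def apply(self, op: str) -> int:
        """Give op the next seqno and update the counters."""
        if op not in COMMITTED_OPS | SYNC_WRITE_OPS:
            raise ValueError(f"unknown op: {op}")
        self.high_seqno += 1
        self.log.append((self.high_seqno, op))
        if op in COMMITTED_OPS:
            # Only committed items advance the MVS; prepares/aborts do not.
            self.max_visible_seqno = self.high_seqno
        return self.high_seqno


def stream(vb: VBucketModel, sync_writes: bool):
    """Yield the seqnos a DCP-like client would receive from vb's log."""
    for seqno, op in vb.log:
        if op in SYNC_WRITE_OPS and not sync_writes:
            continue  # this client never sees prepares or aborts
        yield seqno


def reaches(vb: VBucketModel, target: int, sync_writes: bool) -> bool:
    """True if the client's stream ever delivers a seqno >= target."""
    return any(seqno >= target for seqno in stream(vb, sync_writes))


vb = VBucketModel()
vb.apply("mutation")  # seqno 1, MVS 1
vb.apply("prepare")   # seqno 2, MVS stays 1
# Waiting for the high seqno would hang a client without prepare support...
print(reaches(vb, vb.high_seqno, sync_writes=False))         # False
# ...whereas the MVS is a seqno its stream will actually deliver.
print(reaches(vb, vb.max_visible_seqno, sync_writes=False))  # True
```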
The MVS and HCS are replicated in DCP snapshot markers, and this is where the issue could occur.
Consider a cluster building a new replica. The active sends a disk snapshot to the replica, and the snapshot marker, which is sent ahead of the data, includes the MVS and HCS. The replica vbucket reads these values and stores them in its VBucket object; only then are the snapshot's items transmitted.
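A rough sketch of the replica side of that sequence, assuming hypothetical `SnapshotMarker` and `ReplicaVBucket` types (not the real DCP consumer or VBucket code, and the seqno values are made up for illustration). It only shows the ordering: the MVS/HCS from the marker are adopted before the snapshot's items arrive, so a replica that has received one item can already report an MVS close to the end of the snapshot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnapshotMarker:
    """Hypothetical marker: the snapshot's seqno range plus the active's
    MVS/HCS (field names are illustrative)."""
    start_seqno: int
    end_seqno: int
    max_visible_seqno: int
    high_completed_seqno: int


@dataclass
class ReplicaVBucket:
    high_seqno: int = 0
    max_visible_seqno: int = 0
    high_completed_seqno: int = 0
    snapshot_end: int = 0

    def on_marker(self, marker: SnapshotMarker) -> None:
        # The marker arrives ahead of the data and its MVS/HCS are stored
        # immediately, before any item of the snapshot is held locally.
        self.snapshot_end = marker.end_seqno
        self.max_visible_seqno = marker.max_visible_seqno
        self.high_completed_seqno = marker.high_completed_seqno

    def on_item(self, seqno: int) -> None:
        # Items then arrive one by one, advancing only the high seqno.
        self.high_seqno = seqno

    def snapshot_complete(self) -> bool:
        return self.high_seqno >= self.snapshot_end


replica = ReplicaVBucket()
replica.on_marker(SnapshotMarker(start_seqno=1, end_seqno=2_000_000,
                                 max_visible_seqno=1_999_990,
                                 high_completed_seqno=1_999_000))
replica.on_item(1)
print(replica.high_seqno, replica.max_visible_seqno)  # 1 1999990
print(replica.snapshot_complete())                    # False
```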
Now imagine that a further problem occurs, data loss is accepted, and the replica that is still being built is forced to become active. Suppose it has received just 1 mutation, yet its MVS and HCS are millions of seqnos higher.
I cannot see any recovery or protection in place for this. We would of course already be in a bad place because of the accepted data loss, but on top of that the vbucket's state is now out of sync. For example, getAllVBSeqnos will report the huge MVS, yet a new DCP stream will never reach that seqno.
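The consequence can be sketched with another small hypothetical model. `force_promote`, `get_all_vb_seqnos`, `stream_reaches` and `seqno_inconsistencies` are illustrative names, not kv_engine functions, and the consistency check is only one possible way of expressing the mismatch, not an existing safeguard.

```python
from dataclasses import dataclass, field


@dataclass
class VBucketState:
    """Hypothetical vbucket state after a partially received disk snapshot."""
    high_seqno: int
    max_visible_seqno: int
    high_completed_seqno: int
    stored_seqnos: list = field(default_factory=list)
    state: str = "replica"


def force_promote(vb: VBucketState) -> None:
    # Forced failover with accepted data loss: the vbucket becomes active
    # as-is. Nothing here reconciles MVS/HCS with the data actually held.
    vb.state = "active"


def get_all_vb_seqnos(vb: VBucketState) -> int:
    # Hypothetical stand-in for getAllVBSeqnos as seen by a client without
    # prepare/abort support: it reports the MVS.
    return vb.max_visible_seqno


def stream_reaches(vb: VBucketState, target: int) -> bool:
    # A new stream can only deliver seqnos the vbucket actually stores.
    return any(seqno >= target for seqno in vb.stored_seqnos)


def seqno_inconsistencies(vb: VBucketState) -> list:
    """One possible check: neither MVS nor HCS should exceed the high seqno."""
    problems = []
    if vb.max_visible_seqno > vb.high_seqno:
        problems.append("MVS > high seqno")
    if vb.high_completed_seqno > vb.high_seqno:
        problems.append("HCS > high seqno")
    return problems


vb = VBucketState(high_seqno=1, max_visible_seqno=1_999_990,
                  high_completed_seqno=1_999_000, stored_seqnos=[1])
force_promote(vb)
target = get_all_vb_seqnos(vb)
print(vb.state, target)            # active 1999990
print(stream_reaches(vb, target))  # False: the stream never gets there
print(seqno_inconsistencies(vb))   # ['MVS > high seqno', 'HCS > high seqno']
```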