Details
- Bug
- Resolution: Fixed
- Critical
- 7.1.0
- Untriaged
- 1
- Yes
- KV 2022-Feb
Description
Summary
In some situations the KV-Engine's DCP producer (ActiveStream) could send a DCP Snapshot with a start seqno that was less than the previous snapshot's start seqno. This is not expected in the protocol, and caused GSI (and potentially other DCP consumers) to receive snapshots which cannot be correctly resumed from.
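The ordering rule described above can be sketched as a small consumer-side check. This is only an illustration of the invariant, not the projector's or KV-Engine's actual code: the class `SnapshotOrderChecker` and its `accept` method are hypothetical names, and the seqno values are taken from the vb 69 log entry in this ticket.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SnapshotOrderChecker:
    """Hypothetical helper: tracks the last snapshot start seen for one vBucket."""

    prev_snap_start: Optional[int] = None

    def accept(self, snap_start: int, snap_end: int) -> bool:
        """Return True if the snapshot marker keeps snapshot starts non-decreasing."""
        if snap_end < snap_start:
            # A snapshot range must not be inverted.
            return False
        if self.prev_snap_start is not None and snap_start < self.prev_snap_start:
            # Start seqno went backwards relative to the previous snapshot.
            return False
        self.prev_snap_start = snap_start
        return True


checker = SnapshotOrderChecker()
first_ok = checker.accept(4438, 4438)   # previous snapshot for vb 69
second_ok = checker.accept(4403, 4441)  # marker that triggered the fatal log
```

With these values the first marker is accepted and the second is rejected, because its start seqno (4403) is below the previous snapshot's start (4438).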
Build : 7.1.0-2126
Test : -test tests/2i/neo/test_neo_idx_clusterops_recovery.yml -scope tests/2i/neo/scope_neo_moi_idx.yml
Scale : 2
Iteration : 1st
On nodes 172.23.97.215, 172.23.97.232, 172.23.97.235 and 172.23.97.237, the following fatal messages were seen in the projector logs:
2022-01-21T17:13:10.617-08:00 [Fatal] DCPT[secidx:proj-bucket1-MAINT_STREAM_TOPIC_e7ad3fe57782bb3d8ca3e13a65bada1f-2554665142456261852/1] ##14 seq order violation for snapshot message for vb = 69, opcode = DCP_SNAPSHOT, orderState = {snapStart: 4438, snapEnd 4438, snapStarted true, prevSeq: 4438, prevSeqValid: true, errCount: 1}, event = Opcode DCP_SNAPSHOT, Status SUCCESS, Datatype 0, VBucket 69, Opaque 20, VBuuid 28220110182854, Key <ud>()</ud>, Cas 0, Seqno 4403, RevSeqno 0, Flags 0, Expiry 0, LockTime 0, Nru 0, SnapstartSeq 4403, SnapendSeq 4441, SnapshotType 5, FailoverLog <nil>, Error <nil>, Ctime 1642813990617012350
2022-01-21T17:13:10.617-08:00 [Fatal] DCPT[secidx:proj-bucket1-MAINT_STREAM_TOPIC_e7ad3fe57782bb3d8ca3e13a65bada1f-2554665142456261852/1] ##14 seq order violation for snapshot message for vb = 105, opcode = DCP_SNAPSHOT, orderState = {snapStart: 4431, snapEnd 4431, snapStarted true, prevSeq: 4431, prevSeqValid: true, errCount: 0}, event = Opcode DCP_SNAPSHOT, Status SUCCESS, Datatype 0, VBucket 105, Opaque 20, VBuuid 100272905958714, Key <ud>()</ud>, Cas 0, Seqno 4399, RevSeqno 0, Flags 0, Expiry 0, LockTime 0, Nru 0, SnapstartSeq 4399, SnapendSeq 4432, SnapshotType 5, FailoverLog <nil>, Error <nil>, Ctime 1642813990617085162
2022-01-21T17:13:10.619-08:00 [Fatal] ENDP[<-(172.23.107.3:9105,dd86)<-127.0.0.1:8091 #MAINT_STREAM_TOPIC_e7ad3fe57782bb3d8ca3e13a65bada1f] seq order violation for snapshot message for vb = bucket1:69, command = 8, orderState = {snapStart: 4438, snapEnd 4438, snapStarted true, prevSeq: 4438, prevSeqValid: true, errCount: 1}, snapStart: 4403, snapEnd: 4441, mutation = Docidx <ud>()</ud>, Seqno 4438, Ctime 0, Uuids [5], Commands [8]
2022-01-21T17:13:10.619-08:00 [Fatal] ENDP[<-(172.23.107.3:9105,dd86)<-127.0.0.1:8091 #MAINT_STREAM_TOPIC_e7ad3fe57782bb3d8ca3e13a65bada1f] seq order violation for snapshot message for vb = bucket1:105, command = 8, orderState = {snapStart: 4431, snapEnd 4431, snapStarted true, prevSeq: 4431, prevSeqValid: true, errCount: 0}, snapStart: 4399, snapEnd: 4432, mutation = Docidx <ud>()</ud>, Seqno 4431, Ctime 0, Uuids [5], Commands [8]
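When triaging logs like these, it can help to pull the previous and incoming snapshot ranges out of each `DCPT` line and compare them. The following is a rough sketch that assumes the line layout shown above; `parse_violation` and both regular expressions are hypothetical, and the sample line is an abridged copy of the vb 69 entry.

```python
import re
from typing import Optional

# Patterns assume the DCPT log layout quoted in this ticket.
ORDER_STATE_RE = re.compile(
    r"vb = (\d+), opcode = \w+, orderState = \{snapStart: (\d+), snapEnd (\d+)"
)
MARKER_RE = re.compile(r"SnapstartSeq (\d+), SnapendSeq (\d+)")


def parse_violation(line: str) -> Optional[dict]:
    """Extract the previous and incoming snapshot ranges from one DCPT line."""
    state = ORDER_STATE_RE.search(line)
    marker = MARKER_RE.search(line)
    if state is None or marker is None:
        return None
    vb, prev_start, prev_end = (int(g) for g in state.groups())
    new_start, new_end = (int(g) for g in marker.groups())
    return {
        "vb": vb,
        "prev_snap_start": prev_start,
        "prev_snap_end": prev_end,
        "new_snap_start": new_start,
        "new_snap_end": new_end,
        # Positive when the incoming snapshot starts before the previous one.
        "regressed_by": prev_start - new_start,
    }


# Abridged copy of the vb 69 line; elided fields are marked with "...".
sample = (
    "[Fatal] DCPT[...] ##14 seq order violation for snapshot message for "
    "vb = 69, opcode = DCP_SNAPSHOT, orderState = {snapStart: 4438, "
    "snapEnd 4438, snapStarted true, prevSeq: 4438, prevSeqValid: true, "
    "errCount: 1}, event = Opcode DCP_SNAPSHOT, ..., SnapstartSeq 4403, "
    "SnapendSeq 4441, SnapshotType 5"
)
result = parse_violation(sample)
```

For the sample line, `result["regressed_by"]` is 35: the incoming snapshot starts at 4403 while the previous one started at 4438.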
This issue is similar to ones fixed previously, such as MB-49453, MB-47753 and MB-46466.
This seems to be a regression. The GSI component test was last run with build 7.1.0-2079. That run used Plasma storage, whereas this one uses MOI.
Issue Links
- relates to: MB-51105 [System Test] Caught unhandled std::exception-derived exception. what(): Monotonic<m> invariant failed: new value (7079) breaks invariant on current value (7094) (Closed)