MB-50543

[System Test] Seeing occurrences of "seq order violation" in projector logs for MAINT_STREAM

Details

    Description

      Summary

      In some situations the KV-Engine DCP producer (ActiveStream) could send a DCP snapshot whose start seqno was lower than the previous snapshot's start seqno. The protocol does not expect this, and it caused GSI (and potentially other DCP consumers) to receive snapshots from which a stream cannot be correctly resumed.
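
      The ordering expectation described in the summary can be illustrated with a small consumer-side check. This is only a sketch: the `Snapshot` type and `violates_start_order` function are hypothetical names made up for illustration, not the projector's or KV-Engine's actual code, and the rule shown is just the one the summary states (a new snapshot must not start before the previous one did). The sample values come from the vb 69 entry in the projector log quoted in this report.

      ```python
      from dataclasses import dataclass
      from typing import Optional


      @dataclass
      class Snapshot:
          """Hypothetical holder for a DCP snapshot's seqno range."""
          start: int
          end: int


      def violates_start_order(prev: Optional[Snapshot], new: Snapshot) -> bool:
          """Return True if `new` starts before the previous snapshot started.

          Assumption: this mirrors only the condition stated in the summary,
          not the full set of checks a real DCP consumer performs.
          """
          if prev is None:
              # First snapshot on the stream: nothing to compare against.
              return False
          return new.start < prev.start


      # Values from the vb 69 log entry: previous range 4438-4438, new range 4403-4441.
      previous = Snapshot(start=4438, end=4438)
      incoming = Snapshot(start=4403, end=4441)
      print(violates_start_order(previous, incoming))  # prints True
      ```

      With these values the incoming snapshot starts 35 seqnos before the previous one, which is the situation the summary describes.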

      Build : 7.1.0-2126
      Test : -test tests/2i/neo/test_neo_idx_clusterops_recovery.yml -scope tests/2i/neo/scope_neo_moi_idx.yml
      Scale : 2
      Iteration : 1st

      On nodes 172.23.97.215, 172.23.97.232, 172.23.97.235, 172.23.97.237, the following fatal messages were seen in the projector logs:

      2022-01-21T17:13:10.617-08:00 [Fatal] DCPT[secidx:proj-bucket1-MAINT_STREAM_TOPIC_e7ad3fe57782bb3d8ca3e13a65bada1f-2554665142456261852/1] ##14 seq order violation for snapshot message for vb = 69, opcode = DCP_SNAPSHOT, orderState = {snapStart: 4438, snapEnd 4438, snapStarted true, prevSeq: 4438, prevSeqValid: true, errCount: 1}, event = Opcode DCP_SNAPSHOT, Status SUCCESS, Datatype 0, VBucket 69, Opaque 20, VBuuid 28220110182854, Key <ud>()</ud>, Cas 0, Seqno 4403, RevSeqno 0, Flags 0, Expiry 0, LockTime 0, Nru 0, SnapstartSeq 4403, SnapendSeq 4441, SnapshotType 5, FailoverLog <nil>, Error <nil>, Ctime 1642813990617012350
      2022-01-21T17:13:10.617-08:00 [Fatal] DCPT[secidx:proj-bucket1-MAINT_STREAM_TOPIC_e7ad3fe57782bb3d8ca3e13a65bada1f-2554665142456261852/1] ##14 seq order violation for snapshot message for vb = 105, opcode = DCP_SNAPSHOT, orderState = {snapStart: 4431, snapEnd 4431, snapStarted true, prevSeq: 4431, prevSeqValid: true, errCount: 0}, event = Opcode DCP_SNAPSHOT, Status SUCCESS, Datatype 0, VBucket 105, Opaque 20, VBuuid 100272905958714, Key <ud>()</ud>, Cas 0, Seqno 4399, RevSeqno 0, Flags 0, Expiry 0, LockTime 0, Nru 0, SnapstartSeq 4399, SnapendSeq 4432, SnapshotType 5, FailoverLog <nil>, Error <nil>, Ctime 1642813990617085162
      2022-01-21T17:13:10.619-08:00 [Fatal] ENDP[<-(172.23.107.3:9105,dd86)<-127.0.0.1:8091 #MAINT_STREAM_TOPIC_e7ad3fe57782bb3d8ca3e13a65bada1f] seq order violation for snapshot message for vb = bucket1:69, command = 8, orderState = {snapStart: 4438, snapEnd 4438, snapStarted true, prevSeq: 4438, prevSeqValid: true, errCount: 1}, snapStart: 4403, snapEnd: 4441, mutation = Docidx <ud>()</ud>, Seqno 4438, Ctime 0, Uuids [5], Commands [8]
      2022-01-21T17:13:10.619-08:00 [Fatal] ENDP[<-(172.23.107.3:9105,dd86)<-127.0.0.1:8091 #MAINT_STREAM_TOPIC_e7ad3fe57782bb3d8ca3e13a65bada1f] seq order violation for snapshot message for vb = bucket1:105, command = 8, orderState = {snapStart: 4431, snapEnd 4431, snapStarted true, prevSeq: 4431, prevSeqValid: true, errCount: 0}, snapStart: 4399, snapEnd: 4432, mutation = Docidx <ud>()</ud>, Seqno 4431, Ctime 0, Uuids [5], Commands [8]
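
      When triaging many such entries, it can help to pull the seqno fields out of each DCPT line mechanically. The following is a minimal sketch, assuming the field layout seen in the DCPT entries above; the pattern and the `parse_dcpt_violation` helper are hypothetical, written for this report only, and are not part of any Couchbase tool. The sample string is a shortened copy of the vb 69 entry. ENDP entries use a different layout and are not matched.

      ```python
      import re

      # Assumed layout, taken from the DCPT entries in this report.
      DCPT_PATTERN = re.compile(
          r"vb = (?P<vb>\d+), opcode = DCP_SNAPSHOT, "
          r"orderState = \{snapStart: (?P<prev_start>\d+), snapEnd (?P<prev_end>\d+), "
          r".*?prevSeq: (?P<prev_seq>\d+),"
          r".*?SnapstartSeq (?P<new_start>\d+), SnapendSeq (?P<new_end>\d+)"
      )


      def parse_dcpt_violation(line):
          """Return the seqno fields of a DCPT violation line as ints, or None."""
          match = DCPT_PATTERN.search(line)
          if match is None:
              return None
          return {name: int(value) for name, value in match.groupdict().items()}


      # Shortened copy of the vb 69 entry (prefix and unrelated fields dropped).
      sample = (
          "seq order violation for snapshot message for vb = 69, "
          "opcode = DCP_SNAPSHOT, orderState = {snapStart: 4438, snapEnd 4438, "
          "snapStarted true, prevSeq: 4438, prevSeqValid: true, errCount: 1}, "
          "event = Opcode DCP_SNAPSHOT, Status SUCCESS, Datatype 0, VBucket 69, "
          "Opaque 20, Seqno 4403, SnapstartSeq 4403, SnapendSeq 4441, SnapshotType 5"
      )

      fields = parse_dcpt_violation(sample)
      # How far the new snapshot start lies behind the previous snapshot start.
      rewind = fields["prev_start"] - fields["new_start"]
      print(fields["vb"], fields["prev_start"], fields["new_start"], rewind)
      ```

      For the vb 69 entry this reports a previous start of 4438 and a new start of 4403, so the new snapshot begins 35 seqnos earlier than the one before it.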
      

      This issue is similar to previously fixed ones, such as MB-49453, MB-47753 and MB-46466.

      This seems to be a regression. The GSI component test was last run with build 7.1.0-2079; that run used Plasma storage, whereas this one uses MOI.

            People

              mihir.kamdar Mihir Kamdar (Inactive)