Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-32862

ActiveStream::processItems fails to correctly set CHK flag if last item is checkpoint_start

    XMLWordPrintable

Details

    • Untriaged
    • Unknown

    Description

      As identified during SyncReplication testing, ActiveStream::processItems() can fail to correctly mark a snapshot as a Checkpoint boundary, if a checkpoint_start item is the last one it processes.

      Context

      DCP Snapshot represents an ordered sequence of mutations / deletions. It doesn't necessarily map 1:1: to a checkpoint, because the ActiveStream may fetch items from the CheckpointManager before the Checkpoint has been closed. For example, consider an open Checkpoint which currently contains seqnos 1..2 (but can still have items added); and a DCP client with a cursor point at the beginning of it:

      Start(1), Mutation("keyA", 1), Mutation("keyB", 2), ...
          ^
          |
      DCP cursor
      

      If an ActiveStream is ready to process data, it will call getAllItemsForCursor which will advance the cursor to the current end of the Checkpoint, and return 3 queued_items:

      • checkpoint_start(1)
      • mutation(1)
      • mutation(2)

      These are handled by processItems and result in the following DCP mutations being added to the readyQ:

      • SnapshotMarker(1,2)
      • Mutation(1)
      • Mutation(2)

      At this point the cursor is pointing after the current Checkpoint contents:

      Start(1), Mutation("keyA", 1), Mutation("keyB", 2),
                                                        ^
                                                        |
                                                     DCP cursor
      

      Now, consider some more items are added to the checkpoint manager and a new Checkpoint added:

      Start(1), Mutation("keyA", 1), Mutation("keyB", 2), End(2), Start(3),
                                                        ^
                                                        |
                                                     DCP cursor
      

      ActiveStream wants to process more data, calls getAllItemsForCursor and receives the following queued_items:

      • checkpoint_end(2)
      • checkpoint_start(3)

      These are again handled by processItems; and should result in the next SnapshotMarker having the MARKER_FLAG_CHK flag set - to indicate this is the start of a new Checkpoint. However, it cannot yet add a SnapshotMarker to the readyQ because SnapshotMarkers must define a non-empty seqno range - i.e. we cannot create one until at least one Mutation is also ready to process.

      The bug is in this handling - ActiveStream::processItems fails to account for this "pending" SnapshotMarker and ignores it. When ActiveStream::processItems is next called (with one more more mutations) the created Snapshot marker does not have MARKER_FLAG_CHK flag set.

      In the context of normal mutations (non-SyncWrites) this doesn't really matter - while the replica won't create a Checkpoint when it should, there's no significant consequence to this, other than the replica may end up with larger (and more costly to maintain) checkpoints. However for SyncWrites this breaks the model, as we cannot de-duplicate SyncWrite related items (prepare_sync_write / commit_sync_write).

      Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          People

            drigby Dave Rigby (Inactive)
            drigby Dave Rigby (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty