Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-32862

ActiveStream::processItems fails to correctly set CHK flag if last item is checkpoint_start

    XMLWordPrintable

Details

    • Untriaged
    • Unknown

    Description

      As identified during SyncReplication testing, ActiveStream::processItems() can fail to correctly mark a snapshot as a Checkpoint boundary, if a checkpoint_start item is the last one it processes.

      Context

      DCP Snapshot represents an ordered sequence of mutations / deletions. It doesn't necessarily map 1:1: to a checkpoint, because the ActiveStream may fetch items from the CheckpointManager before the Checkpoint has been closed. For example, consider an open Checkpoint which currently contains seqnos 1..2 (but can still have items added); and a DCP client with a cursor point at the beginning of it:

      Start(1), Mutation("keyA", 1), Mutation("keyB", 2), ...
          ^
          |
      DCP cursor
      

      If an ActiveStream is ready to process data, it will call getAllItemsForCursor which will advance the cursor to the current end of the Checkpoint, and return 3 queued_items:

      • checkpoint_start(1)
      • mutation(1)
      • mutation(2)

      These are handled by processItems and result in the following DCP mutations being added to the readyQ:

      • SnapshotMarker(1,2)
      • Mutation(1)
      • Mutation(2)

      At this point the cursor is pointing after the current Checkpoint contents:

      Start(1), Mutation("keyA", 1), Mutation("keyB", 2),
                                                        ^
                                                        |
                                                     DCP cursor
      

      Now, consider some more items are added to the checkpoint manager and a new Checkpoint added:

      Start(1), Mutation("keyA", 1), Mutation("keyB", 2), End(2), Start(3),
                                                        ^
                                                        |
                                                     DCP cursor
      

      ActiveStream wants to process more data, calls getAllItemsForCursor and receives the following queued_items:

      • checkpoint_end(2)
      • checkpoint_start(3)

      These are again handled by processItems; and should result in the next SnapshotMarker having the MARKER_FLAG_CHK flag set - to indicate this is the start of a new Checkpoint. However, it cannot yet add a SnapshotMarker to the readyQ because SnapshotMarkers must define a non-empty seqno range - i.e. we cannot create one until at least one Mutation is also ready to process.

      The bug is in this handling - ActiveStream::processItems fails to account for this "pending" SnapshotMarker and ignores it. When ActiveStream::processItems is next called (with one more more mutations) the created Snapshot marker does not have MARKER_FLAG_CHK flag set.

      In the context of normal mutations (non-SyncWrites) this doesn't really matter - while the replica won't create a Checkpoint when it should, there's no significant consequence to this, other than the replica may end up with larger (and more costly to maintain) checkpoints. However for SyncWrites this breaks the model, as we cannot de-duplicate SyncWrite related items (prepare_sync_write / commit_sync_write).

      Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          drigby Dave Rigby created issue -
          drigby Dave Rigby made changes -
          Field Original Value New Value
          Status Open [ 1 ] In Progress [ 3 ]
          drigby Dave Rigby made changes -
          Actual Start 2019-01-30 05:31 (issue has been started)
          drigby Dave Rigby made changes -
          Description As identified during SyncReplication testing, {{ActiveStream::processItems()}} can fail to correctly mark a snapshot as a Checkpoint boundary, if a {{checkpoint_start}} item is the last one it processes.

          +Context+

          DCP Snapshot represents an ordered sequence of mutations / deletions. It doesn't necessarily map 1:1: to a checkpoint, because the ActiveStream may fetch items from the CheckpointManager before the Checkpoint has been closed. For example, consider an open Checkpoint which currently contains seqnos 1..2 (but can still have items added); and a DCP client with a cursor point at the beginning of it:

          {code}
          Start(1), Mutation("keyA", 1), Mutation("keyB", 2), ...
              ^
              |
          DCP cursor
          {code}

          If an ActiveStream is ready to process data, it will call {{getAllItemsForCursor}} which will advance the cursor to the _current_ end of the Checkpoint, and return 3 queued_items:

          * checkpoint_start(1)
          * mutation(1)
          * mutation(2)

          These are handled by {{processItems}} and result in the following DCP mutations being added to the readyQ:

          * SnapshotMarker(1,2)
          * Mutation(1)
          * Mutation(2)

          At this point the cursor is pointing after the current Checkpoint contents:

          {code}
          Start(1), Mutation("keyA", 1), Mutation("keyB", 2), End(2),
                                                            ^
                                                            |
                                                        DCP cursor
          {code}

          Now, consider some more items are added to the checkpoint manager and a new Checkpoint added:

          {code}
          Start(1), Mutation("keyA", 1), Mutation("keyB", 2), End(2), Start(3),
                                                            ^
                                                            |
                                                        DCP cursor
          {code}

          ActiveStream wants to process more data, calls {{getAllItemsForCursor}} and receives the following queued_items:

          * checkpoint_end(2)
          * checkpoint_start(3)

          These are again handled by {{processItems}}; and should result in the _next_ SnapshotMarker having the {{MARKER_FLAG_CHK}} flag set - to indicate this is the start of a new Checkpoint. However, it cannot _yet_ add a SnapshotMarker to the readyQ because SnapshotMarkers must define a non-empty seqno range - i.e. we cannot create one until at least one Mutation is also ready to process.

          The bug is in this handling - {{ActiveStream::processItems}} fails to account for this "pending" SnapshotMarker and ignores it. When {{ActiveStream::processItems}} is next called (with one more more mutations) the created Snapshot marker does _not_ have {{MARKER_FLAG_CHK}} flag set.

          In the context of normal mutations (non-SyncWrites) this doesn't really matter - while the replica won't create a Checkpoint when it should, there's no significant consequence to this, other than the replica may end up with larger (and more costly to maintain) checkpoints. However for SyncWrites this breaks the model, as we cannot de-duplicate SyncWrite related items (prepare_sync_write / commit_sync_write).

          As identified during SyncReplication testing, {{ActiveStream::processItems()}} can fail to correctly mark a snapshot as a Checkpoint boundary, if a {{checkpoint_start}} item is the last one it processes.

          +Context+

          DCP Snapshot represents an ordered sequence of mutations / deletions. It doesn't necessarily map 1:1: to a checkpoint, because the ActiveStream may fetch items from the CheckpointManager before the Checkpoint has been closed. For example, consider an open Checkpoint which currently contains seqnos 1..2 (but can still have items added); and a DCP client with a cursor point at the beginning of it:

          {code}
          Start(1), Mutation("keyA", 1), Mutation("keyB", 2), ...
              ^
              |
          DCP cursor
          {code}

          If an ActiveStream is ready to process data, it will call {{getAllItemsForCursor}} which will advance the cursor to the _current_ end of the Checkpoint, and return 3 queued_items:

          * checkpoint_start(1)
          * mutation(1)
          * mutation(2)

          These are handled by {{processItems}} and result in the following DCP mutations being added to the readyQ:

          * SnapshotMarker(1,2)
          * Mutation(1)
          * Mutation(2)

          At this point the cursor is pointing after the current Checkpoint contents:

          {code}
          Start(1), Mutation("keyA", 1), Mutation("keyB", 2), End(2),
                                                                    ^
                                                                    |
                                                               DCP cursor
          {code}

          Now, consider some more items are added to the checkpoint manager and a new Checkpoint added:

          {code}
          Start(1), Mutation("keyA", 1), Mutation("keyB", 2), End(2), Start(3),
                                                            ^
                                                            |
                                                        DCP cursor
          {code}

          ActiveStream wants to process more data, calls {{getAllItemsForCursor}} and receives the following queued_items:

          * checkpoint_end(2)
          * checkpoint_start(3)

          These are again handled by {{processItems}}; and should result in the _next_ SnapshotMarker having the {{MARKER_FLAG_CHK}} flag set - to indicate this is the start of a new Checkpoint. However, it cannot _yet_ add a SnapshotMarker to the readyQ because SnapshotMarkers must define a non-empty seqno range - i.e. we cannot create one until at least one Mutation is also ready to process.

          The bug is in this handling - {{ActiveStream::processItems}} fails to account for this "pending" SnapshotMarker and ignores it. When {{ActiveStream::processItems}} is next called (with one more more mutations) the created Snapshot marker does _not_ have {{MARKER_FLAG_CHK}} flag set.

          In the context of normal mutations (non-SyncWrites) this doesn't really matter - while the replica won't create a Checkpoint when it should, there's no significant consequence to this, other than the replica may end up with larger (and more costly to maintain) checkpoints. However for SyncWrites this breaks the model, as we cannot de-duplicate SyncWrite related items (prepare_sync_write / commit_sync_write).

          drigby Dave Rigby made changes -
          Description As identified during SyncReplication testing, {{ActiveStream::processItems()}} can fail to correctly mark a snapshot as a Checkpoint boundary, if a {{checkpoint_start}} item is the last one it processes.

          +Context+

          DCP Snapshot represents an ordered sequence of mutations / deletions. It doesn't necessarily map 1:1: to a checkpoint, because the ActiveStream may fetch items from the CheckpointManager before the Checkpoint has been closed. For example, consider an open Checkpoint which currently contains seqnos 1..2 (but can still have items added); and a DCP client with a cursor point at the beginning of it:

          {code}
          Start(1), Mutation("keyA", 1), Mutation("keyB", 2), ...
              ^
              |
          DCP cursor
          {code}

          If an ActiveStream is ready to process data, it will call {{getAllItemsForCursor}} which will advance the cursor to the _current_ end of the Checkpoint, and return 3 queued_items:

          * checkpoint_start(1)
          * mutation(1)
          * mutation(2)

          These are handled by {{processItems}} and result in the following DCP mutations being added to the readyQ:

          * SnapshotMarker(1,2)
          * Mutation(1)
          * Mutation(2)

          At this point the cursor is pointing after the current Checkpoint contents:

          {code}
          Start(1), Mutation("keyA", 1), Mutation("keyB", 2), End(2),
                                                                    ^
                                                                    |
                                                               DCP cursor
          {code}

          Now, consider some more items are added to the checkpoint manager and a new Checkpoint added:

          {code}
          Start(1), Mutation("keyA", 1), Mutation("keyB", 2), End(2), Start(3),
                                                            ^
                                                            |
                                                        DCP cursor
          {code}

          ActiveStream wants to process more data, calls {{getAllItemsForCursor}} and receives the following queued_items:

          * checkpoint_end(2)
          * checkpoint_start(3)

          These are again handled by {{processItems}}; and should result in the _next_ SnapshotMarker having the {{MARKER_FLAG_CHK}} flag set - to indicate this is the start of a new Checkpoint. However, it cannot _yet_ add a SnapshotMarker to the readyQ because SnapshotMarkers must define a non-empty seqno range - i.e. we cannot create one until at least one Mutation is also ready to process.

          The bug is in this handling - {{ActiveStream::processItems}} fails to account for this "pending" SnapshotMarker and ignores it. When {{ActiveStream::processItems}} is next called (with one more more mutations) the created Snapshot marker does _not_ have {{MARKER_FLAG_CHK}} flag set.

          In the context of normal mutations (non-SyncWrites) this doesn't really matter - while the replica won't create a Checkpoint when it should, there's no significant consequence to this, other than the replica may end up with larger (and more costly to maintain) checkpoints. However for SyncWrites this breaks the model, as we cannot de-duplicate SyncWrite related items (prepare_sync_write / commit_sync_write).

          As identified during SyncReplication testing, {{ActiveStream::processItems()}} can fail to correctly mark a snapshot as a Checkpoint boundary, if a {{checkpoint_start}} item is the last one it processes.

          +Context+

          DCP Snapshot represents an ordered sequence of mutations / deletions. It doesn't necessarily map 1:1: to a checkpoint, because the ActiveStream may fetch items from the CheckpointManager before the Checkpoint has been closed. For example, consider an open Checkpoint which currently contains seqnos 1..2 (but can still have items added); and a DCP client with a cursor point at the beginning of it:

          {code}
          Start(1), Mutation("keyA", 1), Mutation("keyB", 2), ...
              ^
              |
          DCP cursor
          {code}

          If an ActiveStream is ready to process data, it will call {{getAllItemsForCursor}} which will advance the cursor to the _current_ end of the Checkpoint, and return 3 queued_items:

          * checkpoint_start(1)
          * mutation(1)
          * mutation(2)

          These are handled by {{processItems}} and result in the following DCP mutations being added to the readyQ:

          * SnapshotMarker(1,2)
          * Mutation(1)
          * Mutation(2)

          At this point the cursor is pointing after the current Checkpoint contents:

          {code}
          Start(1), Mutation("keyA", 1), Mutation("keyB", 2), End(2),
                                                                    ^
                                                                    |
                                                               DCP cursor
          {code}

          Now, consider some more items are added to the checkpoint manager and a new Checkpoint added:

          {code}
          Start(1), Mutation("keyA", 1), Mutation("keyB", 2), End(2), Start(3),
                                                                    ^
                                                                    |
                                                               DCP cursor
          {code}

          ActiveStream wants to process more data, calls {{getAllItemsForCursor}} and receives the following queued_items:

          * checkpoint_end(2)
          * checkpoint_start(3)

          These are again handled by {{processItems}}; and should result in the _next_ SnapshotMarker having the {{MARKER_FLAG_CHK}} flag set - to indicate this is the start of a new Checkpoint. However, it cannot _yet_ add a SnapshotMarker to the readyQ because SnapshotMarkers must define a non-empty seqno range - i.e. we cannot create one until at least one Mutation is also ready to process.

          The bug is in this handling - {{ActiveStream::processItems}} fails to account for this "pending" SnapshotMarker and ignores it. When {{ActiveStream::processItems}} is next called (with one more more mutations) the created Snapshot marker does _not_ have {{MARKER_FLAG_CHK}} flag set.

          In the context of normal mutations (non-SyncWrites) this doesn't really matter - while the replica won't create a Checkpoint when it should, there's no significant consequence to this, other than the replica may end up with larger (and more costly to maintain) checkpoints. However for SyncWrites this breaks the model, as we cannot de-duplicate SyncWrite related items (prepare_sync_write / commit_sync_write).

          drigby Dave Rigby made changes -
          Description As identified during SyncReplication testing, {{ActiveStream::processItems()}} can fail to correctly mark a snapshot as a Checkpoint boundary, if a {{checkpoint_start}} item is the last one it processes.

          +Context+

          DCP Snapshot represents an ordered sequence of mutations / deletions. It doesn't necessarily map 1:1: to a checkpoint, because the ActiveStream may fetch items from the CheckpointManager before the Checkpoint has been closed. For example, consider an open Checkpoint which currently contains seqnos 1..2 (but can still have items added); and a DCP client with a cursor point at the beginning of it:

          {code}
          Start(1), Mutation("keyA", 1), Mutation("keyB", 2), ...
              ^
              |
          DCP cursor
          {code}

          If an ActiveStream is ready to process data, it will call {{getAllItemsForCursor}} which will advance the cursor to the _current_ end of the Checkpoint, and return 3 queued_items:

          * checkpoint_start(1)
          * mutation(1)
          * mutation(2)

          These are handled by {{processItems}} and result in the following DCP mutations being added to the readyQ:

          * SnapshotMarker(1,2)
          * Mutation(1)
          * Mutation(2)

          At this point the cursor is pointing after the current Checkpoint contents:

          {code}
          Start(1), Mutation("keyA", 1), Mutation("keyB", 2), End(2),
                                                                    ^
                                                                    |
                                                               DCP cursor
          {code}

          Now, consider some more items are added to the checkpoint manager and a new Checkpoint added:

          {code}
          Start(1), Mutation("keyA", 1), Mutation("keyB", 2), End(2), Start(3),
                                                                    ^
                                                                    |
                                                               DCP cursor
          {code}

          ActiveStream wants to process more data, calls {{getAllItemsForCursor}} and receives the following queued_items:

          * checkpoint_end(2)
          * checkpoint_start(3)

          These are again handled by {{processItems}}; and should result in the _next_ SnapshotMarker having the {{MARKER_FLAG_CHK}} flag set - to indicate this is the start of a new Checkpoint. However, it cannot _yet_ add a SnapshotMarker to the readyQ because SnapshotMarkers must define a non-empty seqno range - i.e. we cannot create one until at least one Mutation is also ready to process.

          The bug is in this handling - {{ActiveStream::processItems}} fails to account for this "pending" SnapshotMarker and ignores it. When {{ActiveStream::processItems}} is next called (with one more more mutations) the created Snapshot marker does _not_ have {{MARKER_FLAG_CHK}} flag set.

          In the context of normal mutations (non-SyncWrites) this doesn't really matter - while the replica won't create a Checkpoint when it should, there's no significant consequence to this, other than the replica may end up with larger (and more costly to maintain) checkpoints. However for SyncWrites this breaks the model, as we cannot de-duplicate SyncWrite related items (prepare_sync_write / commit_sync_write).

          As identified during SyncReplication testing, {{ActiveStream::processItems()}} can fail to correctly mark a snapshot as a Checkpoint boundary, if a {{checkpoint_start}} item is the last one it processes.

          +Context+

          DCP Snapshot represents an ordered sequence of mutations / deletions. It doesn't necessarily map 1:1: to a checkpoint, because the ActiveStream may fetch items from the CheckpointManager before the Checkpoint has been closed. For example, consider an open Checkpoint which currently contains seqnos 1..2 (but can still have items added); and a DCP client with a cursor point at the beginning of it:

          {code}
          Start(1), Mutation("keyA", 1), Mutation("keyB", 2), ...
              ^
              |
          DCP cursor
          {code}

          If an ActiveStream is ready to process data, it will call {{getAllItemsForCursor}} which will advance the cursor to the _current_ end of the Checkpoint, and return 3 queued_items:

          * checkpoint_start(1)
          * mutation(1)
          * mutation(2)

          These are handled by {{processItems}} and result in the following DCP mutations being added to the readyQ:

          * SnapshotMarker(1,2)
          * Mutation(1)
          * Mutation(2)

          At this point the cursor is pointing after the current Checkpoint contents:

          {code}
          Start(1), Mutation("keyA", 1), Mutation("keyB", 2),
                                                            ^
                                                            |
                                                         DCP cursor
          {code}

          Now, consider some more items are added to the checkpoint manager and a new Checkpoint added:

          {code}
          Start(1), Mutation("keyA", 1), Mutation("keyB", 2), End(2), Start(3),
                                                            ^
                                                            |
                                                         DCP cursor
          {code}

          ActiveStream wants to process more data, calls {{getAllItemsForCursor}} and receives the following queued_items:

          * checkpoint_end(2)
          * checkpoint_start(3)

          These are again handled by {{processItems}}; and should result in the _next_ SnapshotMarker having the {{MARKER_FLAG_CHK}} flag set - to indicate this is the start of a new Checkpoint. However, it cannot _yet_ add a SnapshotMarker to the readyQ because SnapshotMarkers must define a non-empty seqno range - i.e. we cannot create one until at least one Mutation is also ready to process.

          The bug is in this handling - {{ActiveStream::processItems}} fails to account for this "pending" SnapshotMarker and ignores it. When {{ActiveStream::processItems}} is next called (with one more more mutations) the created Snapshot marker does _not_ have {{MARKER_FLAG_CHK}} flag set.

          In the context of normal mutations (non-SyncWrites) this doesn't really matter - while the replica won't create a Checkpoint when it should, there's no significant consequence to this, other than the replica may end up with larger (and more costly to maintain) checkpoints. However for SyncWrites this breaks the model, as we cannot de-duplicate SyncWrite related items (prepare_sync_write / commit_sync_write).

          Build couchbase-server-6.5.0-2241 contains kv_engine commit 79dfc55 with commit message:
          MB-32862: Correctly encode DCP SnapMarker if last item processed is chk_start

          build-team Couchbase Build Team added a comment - Build couchbase-server-6.5.0-2241 contains kv_engine commit 79dfc55 with commit message: MB-32862 : Correctly encode DCP SnapMarker if last item processed is chk_start
          drigby Dave Rigby made changes -
          Resolution Fixed [ 1 ]
          Status In Progress [ 3 ] Resolved [ 5 ]
          drigby Dave Rigby made changes -
          Actual End 2019-02-07 10:21 (issue has been resolved)
          ritam.sharma Ritam Sharma made changes -
          Labels request-dev-verify
          paolo.cocchi Paolo Cocchi made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

          People

            drigby Dave Rigby
            drigby Dave Rigby
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty