Couchbase Server / MB-35003

flushVBucket: Only update last_snap_start upon complete snapshot


Details

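    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Affects Version/s: 6.5.0
    • Fix Version/s: 6.5.0
    • Component/s: couchbase-bucket
    • Labels: None
    • Triage: Triaged
    • Is this a Regression?: No
    • Sprint: KV-Engine MH Beta part 2, KV-Engine MH 2nd Beta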

    Description

      Background

      The `vbstate.last_snap_start` field represents the last consistent sequence number up to which the vBucket has persisted. It is used in KVBucket::setVBucketState_UNLOCKED() to determine where to place the failover branch point when a vBucket is promoted from replica -> active.

      However, the correctness of last_snap_start (on the replica) currently relies on there being only one incomplete Checkpoint present, as it is defined as the start seqno of the Checkpoint being flushed.

      For example, consider a replica which received the following (incomplete) snapshots:

          - SNAPSHOT_MARKER(1,2, flags=MEM)
          - 1:SET
          <disconnect, reconnect>
          - SNAPSHOT_MARKER(3,4, flags=MEM)
          - 3:SET
      

      (Note that Snapshot (1,2) is incomplete because 2:SET was not sent before the disconnect, and Snapshot (3,4) is incomplete because 4:SET has not been received yet.)

      In the above example, these snapshot markers are processed by the replica to create the following Checkpoint(s):

          Recv SNAPSHOT_MARKER(1,2) -> create new checkpoint:
              Checkpoint[1,2]
          Recv SNAPSHOT_MARKER(3,4) -> extend current checkpoint:
              Checkpoint[1,4]
      

      As such, when flushing on the replica, last_snap_start is correctly calculated as the start of the current Checkpoint, seqno 1. This is correct because, even though we are now persisting Snapshot (3,4), the previous Snapshot (1,2) was incomplete, so last_snap_start cannot be advanced.
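The checkpoint-extending behaviour and the resulting last_snap_start can be modelled in a few lines. This is a simplified sketch, not KV-Engine code: the `Checkpoint` dataclass and the functions `receive_snapshot_marker()`, `receive_mutation()` and `last_snap_start_for_flush()` are hypothetical names, and the model assumes that the first marker opens a Checkpoint and every later MEM marker extends it.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Hypothetical model of a replica Checkpoint: snapshot range plus received seqnos."""
    snap_start: int
    snap_end: int
    items: list = field(default_factory=list)


def receive_snapshot_marker(checkpoints, start, end):
    """Pre-MB-35001 model: the first marker opens a Checkpoint; later MEM
    markers extend the open Checkpoint's end instead of creating a new one."""
    if not checkpoints:
        checkpoints.append(Checkpoint(start, end))
    else:
        checkpoints[-1].snap_end = end


def receive_mutation(checkpoints, seqno):
    """Queue a mutation into the open (last) Checkpoint."""
    checkpoints[-1].items.append(seqno)


def last_snap_start_for_flush(checkpoint):
    """Current definition: the start seqno of the Checkpoint being flushed."""
    return checkpoint.snap_start


checkpoints = []
receive_snapshot_marker(checkpoints, 1, 2)
receive_mutation(checkpoints, 1)
# <disconnect, reconnect>
receive_snapshot_marker(checkpoints, 3, 4)
receive_mutation(checkpoints, 3)

cp = checkpoints[0]
print(len(checkpoints), (cp.snap_start, cp.snap_end), last_snap_start_for_flush(cp))
# prints: 1 (1, 4) 1
```

Because both snapshots land in the single Checkpoint [1,4], the flusher can only ever report its start (1), which happens to be the safe value here.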

      Problem

      SyncReplication (and specifically MB-35001) requires that no duplicate Prepare or Commit items exist within a Checkpoint. This cannot be guaranteed with the current (pre-MB-35001) Checkpoint-extending behaviour; see MB-35001 for details of that scenario.

      The solution to that MB is to create additional Checkpoints instead of extending the current one. However, if mutations are placed in separate Checkpoints (to avoid duplicate Prepares), last_snap_start could be calculated incorrectly. Given two Checkpoints, (1,3) and (4,5), once the flusher begins to flush (4,5) it would update last_snap_start to 4, even if Checkpoint (1,3) was incomplete (as in the original non-SyncWrite scenario).
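The failure mode can be reproduced with the same kind of toy model. Again this is a hypothetical sketch rather than KV-Engine code: `receive_snapshot_marker()` here opens a new Checkpoint for every marker (the MB-35001 approach), `flush_with_current_rule()` applies the existing "start of the Checkpoint being flushed" definition, and the initial value of 0 is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Hypothetical model of a replica Checkpoint: snapshot range plus received seqnos."""
    snap_start: int
    snap_end: int
    items: list = field(default_factory=list)


def receive_snapshot_marker(checkpoints, start, end):
    """MB-35001 model: every snapshot marker opens a new Checkpoint instead of
    extending the open one, so Prepares/Commits are never duplicated within one."""
    checkpoints.append(Checkpoint(start, end))


def flush_with_current_rule(checkpoints):
    """Current rule: last_snap_start becomes the start seqno of whichever
    Checkpoint is being flushed, even if an earlier one never completed."""
    last_snap_start = 0  # assumed value before anything was flushed
    for cp in checkpoints:
        if cp.items:
            last_snap_start = cp.snap_start
    return last_snap_start


checkpoints = []
receive_snapshot_marker(checkpoints, 1, 3)
checkpoints[-1].items.append(1)  # seqno 3 never arrives, so (1,3) is incomplete
receive_snapshot_marker(checkpoints, 4, 5)
checkpoints[-1].items.append(4)  # flusher starts on (4,5) before 5 arrives

print(flush_with_current_rule(checkpoints))  # prints 4, although (1,3) is incomplete
```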

      Solution?

      Adjust the definition of last_snap_start so that it is advanced only when a consistent (complete) Checkpoint has been flushed.

      However, to achieve this we need to change the state available to the flusher, so that the start of the last completed Checkpoint can be tracked and then updated (written to disk) when the end of that Checkpoint is flushed.
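One way the proposed tracking could look is sketched below; it is an illustration of the idea, not the actual fix. The `Flusher` class and its `flush()` method are hypothetical, the sketch assumes a Checkpoint is complete once the item whose seqno equals the Checkpoint's snapshot end has been flushed, and it assumes nothing had been persisted beforehand (initial value 0). It replays the (1,3), (4,5) scenario from the Problem section.

```python
class Flusher:
    """Hypothetical flusher that only advances last_snap_start once a
    Checkpoint's snapshot end has been persisted."""

    def __init__(self, persisted_snap_start=0):
        # Value last written to disk as vbstate.last_snap_start (assumed 0).
        self.last_snap_start = persisted_snap_start

    def flush(self, snap_start, snap_end, seqnos):
        """Persist a batch of seqnos belonging to Checkpoint (snap_start, snap_end)."""
        for seqno in seqnos:
            # Persisting the item itself is elided in this sketch.
            if seqno == snap_end:
                # Assumption: reaching the snapshot end means the Checkpoint is complete.
                self.last_snap_start = snap_start
        # Value that would be written to disk with this flush batch.
        return self.last_snap_start


flusher = Flusher()
print(flusher.flush(1, 3, [1]))  # prints 0: (1,3) never completed
print(flusher.flush(4, 5, [4]))  # prints 0: (4,5) not yet complete
print(flusher.flush(4, 5, [5]))  # prints 4: end of (4,5) has been flushed
```

Because the value only moves when a snapshot end is persisted, a partially flushed Checkpoint leaves the on-disk last_snap_start at the previous consistent point.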


            People

              Assignee: Dave Rigby (Inactive)
              Reporter: Dave Rigby (Inactive)