Couchbase Server / MB-34873

Prepares from a disk snapshot may dedupe higher durability level prepares (2/3) [ETA 2019/7/12]


Details

    • Triaged
    • No
    • KV-Engine Mad-Hatter Beta

    Description

      Problem:

      1. Replicas are not supposed to ack a seqno at or above that of a received Persist level Prepare until that Prepare is persisted.
      2. Disk snapshots may dedupe Prepares; a later Majority level Prepare may dedupe an earlier Persist level Prepare for the same key.
      3. A replica receiving a disk snapshot does not know whether any Persist level Prepares have been deduped.
      4. Acking a seqno tells the active that all Prepares with seqno <= ackSeqno have met their durability requirements locally on the replica.
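      Points 1, 3 and 4 can be illustrated with a toy check of the ack invariant. This is a minimal sketch, not KV-Engine code: the `Prepare` type, the simplified level names "Majority" and "Persist", and the helpers `locally_satisfied` and `ack_is_valid` are all hypothetical. The replica can only evaluate the invariant over the Prepares it actually received, so a deduped Persist level Prepare is invisible to the check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prepare:
    seqno: int
    level: str  # simplified, hypothetical level names: "Majority" or "Persist"


def locally_satisfied(prepare, persisted_seqno):
    # Majority: being in memory on the replica is enough.
    # Persist: the Prepare must also have been persisted to disk.
    if prepare.level == "Persist":
        return prepare.seqno <= persisted_seqno
    return True


def ack_is_valid(ack_seqno, prepares, persisted_seqno):
    """Point 4: acking ack_seqno asserts that every Prepare with
    seqno <= ack_seqno has met its durability requirement locally."""
    return all(
        locally_satisfied(p, persisted_seqno)
        for p in prepares
        if p.seqno <= ack_seqno
    )


# The full history versus what a replica sees after dedupe (point 3).
full_history = [Prepare(1, "Persist"), Prepare(3, "Majority")]
replica_view = [Prepare(3, "Majority")]
```

      With nothing persisted, `ack_is_valid(3, full_history, 0)` is false, yet `ack_is_valid(3, replica_view, 0)` is true: the replica cannot detect the problem from its own view.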

      If a replica receives a Majority level Prepare from a disk snapshot, it is potentially incorrect to ack that seqno: a previous Persist level Prepare for the same key may have been deduped, and we might not yet have persisted the appropriate value for that key. In other words, we have "jumped" the durability fence.


      To clarify with a scenario:

      If the active receives the following ops (for one key):

      PRE(Persist):1 CMT:2 PRE(Majority):3

      The replica will see instead

      SET:2 PRE(Majority):3 

      (NB: a SET is sent in place of the commit because of MB-34789.)
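      The dedupe in this scenario can be reproduced with a toy model. This is a sketch under stated assumptions: it assumes a disk snapshot keeps only the latest op per key, with Prepares and committed values tracked separately (which reproduces the outcome above), and that a commit is sent as a plain SET per MB-34789. The `disk_snapshot` helper and the tuple layout are hypothetical.

```python
def disk_snapshot(ops):
    """Toy dedupe. ops is a list of (seqno, kind, key, level) tuples in
    seqno order; kind is "PRE" or "CMT", level is None for commits."""
    latest = {}
    for seqno, kind, key, level in ops:
        # Assumption: Prepares and committed values dedupe independently.
        namespace = "prepared" if kind == "PRE" else "committed"
        if kind == "CMT":
            kind = "SET"  # per MB-34789, the committed item is sent as a SET
        # A later op for the same (key, namespace) replaces the earlier one.
        latest[(key, namespace)] = (seqno, kind, key, level)
    return sorted(latest.values(), key=lambda op: op[0])


active_ops = [
    (1, "PRE", "k", "Persist"),
    (2, "CMT", "k", None),
    (3, "PRE", "k", "Majority"),
]
```

      Running `disk_snapshot(active_ops)` yields the SET at seqno 2 and the Majority Prepare at seqno 3; the Persist level Prepare at seqno 1 has disappeared.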

      The replica would ack seqno 3 at the snapshot end, regardless of whether the SET or the PRE had been persisted. A Majority level Prepare is satisfied locally as soon as it is in memory on the replica, so nothing holds the ack back.

      This state is unacceptable: if the active fails over, this replica may be promoted if it has acked a seqno at least as high as any other replica. If it then crashes and restarts as the new active, the correct value for that key is lost, because it had not been persisted to disk when the replica acked. Therefore, if SUCCESS was reported to the client after the prepared value was committed, the durability contract may have been broken.


      Solution:

      By effectively "promoting" every Prepare received during a disk snapshot to Persist level, we ensure that we will not implicitly acknowledge any deduped Persist level Prepare before its value has been persisted: once the later, "promoted" Prepare is persisted, we know the preceding SET has been persisted as well.
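      The effect of the promotion can be sketched as follows. This is a minimal model, not the actual KV-Engine change: `Prepare`, `receive_prepares` and `ackable_seqno` are hypothetical names, levels are simplified to "Majority" and "Persist", and the replica acks the highest Prepare seqno below the first unpersisted Persist level Prepare.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Prepare:
    seqno: int
    level: str  # simplified, hypothetical level names: "Majority" or "Persist"


def receive_prepares(prepares, from_disk_snapshot):
    if not from_disk_snapshot:
        return list(prepares)
    # Promotion: any Prepare from a disk snapshot may stand in for a deduped
    # Persist level Prepare, so treat it as Persist level.
    return [replace(p, level="Persist") for p in prepares]


def ackable_seqno(prepares, persisted_seqno):
    """Highest Prepare seqno the replica may ack without jumping the fence."""
    acked = 0
    for p in sorted(prepares, key=lambda p: p.seqno):
        if p.level == "Persist" and p.seqno > persisted_seqno:
            break  # durability fence: stop at the first unpersisted Persist Prepare
        acked = p.seqno
    return acked


snapshot = [Prepare(3, "Majority")]  # the replica's view from the scenario above
```

      Without promotion, `ackable_seqno(receive_prepares(snapshot, False), 0)` returns 3 even though nothing is persisted, which is the bug. With promotion, the ack stays at 0 until seqno 3 is persisted, and since persistence of seqno 3 implies the SET at seqno 2 is on disk too, the fence holds.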


            People

              james.harrison James Harrison (Inactive)