  1. Couchbase Server
  2. MB-46330

[Consistency] Stale reads under SyncWrites may not be sequentially consistent

Details

    • Bug
    • Resolution: Unresolved
    • Major
    • Morpheus
    • 6.5.1, 6.6.0, 6.6.1, 6.6.2, 6.5.2, 6.5.0, 6.6.3, 7.0.0, 7.0.1, 7.0.2, 7.1.0
    • couchbase-bucket
    • None
    • Untriaged
    • 1
    • Unknown

    Description

      Description:

      The current implementation of sync replication is not sequentially consistent. Under a network partition, the isolated node can still service read operations, so a process can observe arbitrarily stale reads, including a state prior to one it has already seen.

      The following hypothetical experiment illustrates the consistency violation:

      Setup:

      Given a 3-node cluster (nodes A, B and C) with 2 replicas, two processes P1 and P2, and all writes having durability requirements >= replicate to majority.

      Let 'Open' be the Open Connection operation with the seed node being Node A.

      Let 'Close' be the matching disconnect operation.

      Processes:

      Suppose P1 opens a connection and issues a non-decreasing sequence of writes to some key x in vbucket 0.

      P1: Open W1 W2 W3 ...

      Suppose P2 repeatedly runs a cycle of read operations enclosed by an Open and a Close.

      P2: {Open R R R ... Close} Repeat ...

      Network Partition:

      Isolate Node A.

      The majority (B and C) will auto-failover the isolated node A, but will be unable to communicate this to node A.

      Consequently, there will be two copies of 'vbucket 0'.

      Possible history for P2 following the partition:

      Let Wn be the last write that went through, placing the register in state n.

      Given the cycle of open followed by reads followed by a close:

      1. The connection will be opened.
      2. It may be possible for P2 to observe state n while communicating with node A.
      3. Then observe various monotonically increasing states > n.
      4. The connection will be closed and this process will repeat.

      As this cycle repeats within the same process, it is possible for that process to observe a state prior to one that it has already seen.
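
      The cycle above can be sketched as a small, deterministic toy model. This is only an illustration under stated assumptions, not Couchbase code: the class VBucketCopy, the function simulate_p2, the choice n = 5, and the rule that the first read of each cycle is served by the isolated node A while later reads are served by the promoted copy are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VBucketCopy:
    """One copy of 'vbucket 0', holding the current state of register x."""
    state: int


def simulate_p2(cycles: int, n: int = 5, reads_per_cycle: int = 3) -> list[list[int]]:
    """Return the states P2 reads in each Open ... Close cycle after the partition."""
    isolated_a = VBucketCopy(state=n)  # node A: receives no new writes, still serves reads
    promoted = VBucketCopy(state=n)    # copy promoted on the majority side (B or C)
    history = []
    for _ in range(cycles):
        observed = []
        for i in range(reads_per_cycle):
            if i == 0:
                # ASSUMPTION: the first read after Open is served by seed node A.
                observed.append(isolated_a.state)
            else:
                # ASSUMPTION: P1 lands one new write between consecutive reads.
                promoted.state += 1
                observed.append(promoted.state)
        history.append(observed)
    return history


history = simulate_p2(cycles=3)
flat = [s for cycle in history for s in cycle]
# Positions where P2 reads a state older than the one it read just before.
regressions = [i for i in range(1, len(flat)) if flat[i] < flat[i - 1]]
print(history, regressions)
```

      In this model each cycle starts again at state n, so the regressions fall exactly on the first read after every re-open.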

      A more concrete example:

      E.g. Let Wn be W5 in the following hypothetical history for P2:

      P2: Open R5 R6 R7 Close Open R5 R8 R9 Close Open R5 R12 R13 Close.
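
      A minimal sketch of how such a history could be checked mechanically. The helper names parse_reads and first_regression are hypothetical, and the check assumes that Rk means "observed the state left by Wk" on a single register. It only tests that one process never reads backwards, which is a necessary condition for sequential consistency here, not a full checker.

```python
import re


def parse_reads(history: str) -> list[int]:
    """Extract observed state numbers from tokens like 'R5'; Open/Close are ignored."""
    return [int(n) for n in re.findall(r"\bR(\d+)\b", history)]


def first_regression(states):
    """Return (index, highest_seen, observed) for the first backwards read, else None."""
    highest = None
    for i, s in enumerate(states):
        if highest is not None and s < highest:
            return (i, highest, s)
        highest = s if highest is None else max(highest, s)
    return None


P2_HISTORY = "Open R5 R6 R7 Close Open R5 R8 R9 Close Open R5 R12 R13 Close."
states = parse_reads(P2_HISTORY)
violation = first_regression(states)
print(states, violation)
```

      For the history above, the first violation is the second R5: P2 has already seen state 7 and then observes state 5.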

      What's the problem?

      The following requirement from sequential consistency will be violated:

      "However, once a process A has observed some operation from process B, it can never observe a state prior to B."

      (https://jepsen.io/consistency/models/sequential)

      Revisiting CAP:

      Sequential consistency is "Not available during some types of network failures. Some or all nodes must pause operations in order to ensure safety".

      (https://jepsen.io/consistency)

      Furthermore, "Sequential consistency cannot be totally or sticky available; in the event of a network partition, some or all nodes will be unable to make progress".

      (https://jepsen.io/consistency/models/sequential)

      What may be the fix?

      Currently, writes cannot be made on the isolated node, but reads can.

      My understanding is that read operations on the isolated node may also need to be 'paused' to "ensure safety".

      Why?

      My guess is that, in order to be a CP system, we must give up on Availability, as hinted by the graph on https://jepsen.io/consistency.

      Is there a workaround?

      Restrict each 'process' to at most 1 pair of open and close operations.

      Consequently it will observe state n initially and then observe various monotonically increasing states > n.
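
      To illustrate why this restriction helps, the sketch below treats each Open ... Close pair in P2's example history as its own 'process' and checks each one separately. The helpers split_sessions and is_non_decreasing are hypothetical names introduced for illustration only.

```python
def split_sessions(history: str) -> list[list[int]]:
    """Group the observed states by the Open ... Close pair they occurred in."""
    sessions = []
    current = None
    for token in history.replace(".", " ").split():
        if token == "Open":
            current = []
        elif token == "Close":
            if current is not None:
                sessions.append(current)
            current = None
        elif token.startswith("R") and token[1:].isdigit() and current is not None:
            current.append(int(token[1:]))
    return sessions


def is_non_decreasing(states: list[int]) -> bool:
    """True if no state is older than the one observed immediately before it."""
    return all(a <= b for a, b in zip(states, states[1:]))


P2_HISTORY = "Open R5 R6 R7 Close Open R5 R8 R9 Close Open R5 R12 R13 Close"
sessions = split_sessions(P2_HISTORY)
per_session_ok = [is_non_decreasing(s) for s in sessions]
print(sessions, per_session_ok)
```

      Each pair on its own observes state n followed by increasing states, so no single 'process' in this sense sees a state go backwards.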

      Design document:

      There is a hidden assumption in the sync writes design document which suggests that processes are not logical processes, but are instead the operations between a pair of open and close operations.

      A note on consistency as a contract:

      "The data consistency model specifies a contract between programmer and system, wherein the system guarantees that if the programmer follows the rules, memory will be consistent and the results of reading, writing, or updating memory will be predictable."

      (https://en.wikipedia.org/wiki/Consistency_model)

      This Wikipedia article suggests that a programmer has to "follow the rules" for the results of the various operations to be predictable.

      A temporary alternative could be to specify what those rules are.

      References

      https://jepsen.io/consistency/models/sequential

      https://jepsen.io/consistency

      https://en.wikipedia.org/wiki/Consistency_model

      Sync writes design document

          People

            daschl Michael Nitschinger
            asad.zaidi Asad Zaidi (Inactive)