Details
Type: Bug
Resolution: Fixed
Priority: Critical
Version: 4.6.0
Triage: Untriaged
Is this a Regression?: Yes
Description
Whilst reading a single DCP stream (a vbucket), the CAS can be observed to go backwards. This is a problem for LWW XDCR configurations in an active/passive deployment.
Suppose a new mutation (with a higher seqno than the previous version) reaches the passive (backup) cluster with a CAS that has gone backwards. LWW conflict resolution rejects it, and the backup is now inconsistent with the active cluster. Reading the document on each cluster may then return a different version. The rev-id conflict resolution mode, however, is protected against this, hence this MB is marked as a regression.
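A minimal sketch of why the two conflict resolution modes diverge in this scenario, assuming simplified rules: LWW compares only the CAS, while rev-id compares the revision counter first and falls back to the CAS on a tie. The names `DocMeta`, `lww_accepts` and `revid_accepts`, and the metadata values, are hypothetical illustrations rather than XDCR's actual code or data; real conflict resolution has further tie-breakers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocMeta:
    """Hypothetical, simplified metadata carried by a replicated mutation."""
    seqno: int      # position in the source vbucket's DCP stream
    rev_seqno: int  # revision counter, incremented on every update
    cas: int        # HLC timestamp assigned on the source node


def lww_accepts(incoming: DocMeta, existing: DocMeta) -> bool:
    # Simplified last-write-wins: the higher CAS wins (tie-breakers omitted).
    return incoming.cas > existing.cas


def revid_accepts(incoming: DocMeta, existing: DocMeta) -> bool:
    # Simplified rev-id mode: the higher revision wins; CAS only breaks ties.
    if incoming.rev_seqno != existing.rev_seqno:
        return incoming.rev_seqno > existing.rev_seqno
    return incoming.cas > existing.cas


# On the active cluster, version B is newer (higher seqno and rev_seqno),
# but its CAS went backwards relative to version A.
version_a = DocMeta(seqno=10, rev_seqno=1, cas=200)
version_b = DocMeta(seqno=11, rev_seqno=2, cas=150)

# The passive cluster already holds version A and now receives version B.
print("LWW applies B:", lww_accepts(version_b, version_a))       # False: passive keeps A
print("rev-id applies B:", revid_accepts(version_b, version_a))  # True: passive matches active
```

Under LWW the passive cluster silently keeps version A while the active cluster serves version B, which is the inconsistency described above.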
There may be other effects of the CAS going backwards, but this active/passive issue is the easiest to understand.
This bug occurs because ep-engine assigns each mutation an HLC timestamp (its CAS) before the mutation is queued into the checkpoint, which is the point at which it is assigned its sequence number. Thus, when n threads update the same document concurrently, their mutations can be added to the checkpoint manager with CAS values that are out of order relative to their seqnos.
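The ordering problem can be modelled in a few lines. This is a sketch under stated assumptions, not ep-engine code: `HybridLogicalClock`, `Checkpoint`, `ConsistentQueue` and the helper functions are hypothetical, the clock is a plain counter rather than a real HLC, and the racy case replays the bad interleaving deterministically instead of relying on thread timing. The fix shown (taking the CAS and the seqno under one lock) is one way to make them consistent, in the spirit of the linked change title; it is not the actual patch.

```python
import threading
from dataclasses import dataclass, field
from typing import List, Tuple


class HybridLogicalClock:
    """Hypothetical HLC stand-in: hands out strictly increasing CAS values."""

    def __init__(self, start: int = 1000) -> None:
        self._last = start
        self._lock = threading.Lock()

    def next_cas(self) -> int:
        with self._lock:
            self._last += 1
            return self._last


@dataclass
class Checkpoint:
    """Hypothetical checkpoint: assigns seqnos in the order items are queued."""
    next_seqno: int = 1
    items: List[Tuple[int, str, int]] = field(default_factory=list)  # (seqno, key, cas)

    def queue(self, key: str, cas: int) -> int:
        seqno = self.next_seqno
        self.next_seqno += 1
        self.items.append((seqno, key, cas))
        return seqno


def cas_in_seqno_order(items: List[Tuple[int, str, int]]) -> List[int]:
    # The order a DCP client would observe: sorted by seqno.
    return [cas for _, _, cas in sorted(items)]


def racy_interleaving() -> List[Tuple[int, str, int]]:
    # Deterministic replay of the bug: CAS is taken first, seqno later.
    hlc, ckpt = HybridLogicalClock(), Checkpoint()
    cas_t1 = hlc.next_cas()    # thread 1 takes its CAS
    cas_t2 = hlc.next_cas()    # thread 2 takes a later CAS
    ckpt.queue("doc", cas_t2)  # thread 2 reaches the checkpoint first
    ckpt.queue("doc", cas_t1)  # thread 1 gets a higher seqno but a lower CAS
    return ckpt.items


class ConsistentQueue:
    """Fix sketch: generate the CAS and the seqno under the same lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.hlc, self.ckpt = HybridLogicalClock(), Checkpoint()

    def queue(self, key: str) -> int:
        with self._lock:
            return self.ckpt.queue(key, self.hlc.next_cas())


def consistent_with_threads(n_threads: int = 8, writes: int = 100) -> List[Tuple[int, str, int]]:
    q = ConsistentQueue()

    def writer() -> None:
        for _ in range(writes):
            q.queue("doc")

    threads = [threading.Thread(target=writer) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return q.ckpt.items


print(cas_in_seqno_order(racy_interleaving()))  # [1002, 1001]: CAS goes backwards
fixed = cas_in_seqno_order(consistent_with_threads())
print(fixed == sorted(fixed))                   # True
```

Holding one lock across both steps serialises writers, which is the cost of guaranteeing that seqno order implies CAS order.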
This issue has been reproduced using a modified ep-engine perfsuite test: multiple threads write to the same document while a single DCP client thread monitors the CAS.
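The check performed by the monitoring DCP client can be sketched as follows, assuming it sees each mutation's (seqno, cas) pair in stream order for one vbucket. `find_cas_regressions` and the sample values are hypothetical; they are not taken from the perfsuite test.

```python
from typing import Iterable, List, Optional, Tuple

Mutation = Tuple[int, int]  # (seqno, cas) as seen on one vbucket's DCP stream


def find_cas_regressions(stream: Iterable[Mutation]) -> List[Tuple[Mutation, Mutation]]:
    """Return every adjacent pair of mutations where the CAS went backwards.

    Mutations are assumed to arrive in seqno order, as on a single DCP stream.
    """
    regressions = []
    previous: Optional[Mutation] = None
    for mutation in stream:
        if previous is not None and mutation[1] < previous[1]:
            regressions.append((previous, mutation))
        previous = mutation
    return regressions


observed = [(1, 1001), (2, 1003), (3, 1002), (4, 1004)]
print(find_cas_regressions(observed))  # [((2, 1003), (3, 1002))]
```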
Gerrit Dashboard: MB-20798

| # | Subject | Branch | Project | Status | Code-Review | Verified |
|---|---|---|---|---|---|---|
| 67582,16 | MB-20798: Update queueDirty options in prep for a new option | watson | ep-engine | MERGED | +2 | +1 |
| 67669,1 | MB-20798: Update queueDirty options in prep for a new option | watson | ep-engine | ABANDONED | 0 | 0 |
| 67670,16 | MB-20798: Allow CAS and seqno to be generated consistently | watson | ep-engine | MERGED | +2 | +1 |
| 68750,2 | Merge remote-tracking branch 'couchbase/watson' | master | ep-engine | MERGED | +2 | +1 |