Details
-
Bug
-
Resolution: Fixed
-
Critical
-
4.6.4, 5.0.1, 5.5.0
-
None
-
Triaged
-
Unknown
Description
When the rev-seqno is stored to disk, it is actually stored as a 48-bit value. Under normal operation this doesn't pose a problem; however, if the upper 16 bits of the in-memory rev-seqno become 'dirty', problems can occur. One problem observed is an indefinite XDCR loop.
In the observed problem, some event at some point corrupted the rev-seqno: presumably a set-with-meta or del-with-meta arrived with a corrupt 'extras' field containing a rev-seqno such as 0x0080.xxxx.yyyy.zzzz.
The node receiving such a value will persist this rev-seqno, as the full 64-bit value, as the max-deleted seqno in the vbucket's _local document. When the mutation is stored to couchstore, however, the rev-seqno is truncated to 48 bits: the 0x0080 prefix is dropped and only xxxx.yyyy.zzzz is stored.
Now, in bi-directional XDCR, the mutation will go through DCP and possibly back to the source node, but DCP will send the 64-bit, 0x0080-prefixed value.
When that value lands on the other side and the key has been ejected, we have to fetch the key from disk, which brings the 48-bit value into memory ready for conflict resolution:
- 0x0080.xxxx.yyyy.zzzz > 0x0000.xxxx.yyyy.zzzz
- The incoming mutation wins and gets stored again
The node will now store the truncated rev-seqno and tell the local XDCR of the new mutation via DCP with the full 64-bit rev-seqno, which will be sent back to the other cluster.
If the other cluster has also evicted the key, it has to fetch from disk as well, the same comparison repeats, and a loop occurs.
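The loop described above can be sketched in a few lines of C++. This is an illustrative model only, not kv_engine code: `kRevSeqnoMask`, `persistAndFetch`, `incomingWins` and `bounceCount` are hypothetical names, and conflict resolution is simplified to a plain rev-seqno comparison (the real logic considers other metadata as well). The sketch assumes the key is ejected on every hop, so the local copy is always the 48-bit value read back from disk.

```cpp
#include <cstdint>

// Hypothetical mask: only the low 48 bits of the rev-seqno survive persistence.
constexpr std::uint64_t kRevSeqnoMask = 0x0000FFFFFFFFFFFFULL;

// Models "store to disk, eject, fetch again": the upper 16 bits are lost.
constexpr std::uint64_t persistAndFetch(std::uint64_t revSeqno) {
    return revSeqno & kRevSeqnoMask;
}

// Simplified conflict resolution: the incoming mutation wins only if its
// rev-seqno is strictly greater than the local one.
constexpr bool incomingWins(std::uint64_t incoming, std::uint64_t local) {
    return incoming > local;
}

// Counts how many times the same mutation is accepted as it bounces between
// two clusters, giving up after maxHops. DCP always carries the full 64-bit
// value; maskOnReceive models masking the value at the receiving node.
int bounceCount(std::uint64_t dcpRevSeqno, int maxHops, bool maskOnReceive) {
    int accepted = 0;
    // The receiving node already holds the truncated copy on disk.
    std::uint64_t onDisk = persistAndFetch(dcpRevSeqno);
    for (int hop = 0; hop < maxHops; ++hop) {
        const std::uint64_t incoming =
                maskOnReceive ? (dcpRevSeqno & kRevSeqnoMask) : dcpRevSeqno;
        if (!incomingWins(incoming, onDisk)) {
            break; // mutation rejected, replication settles
        }
        ++accepted;
        onDisk = persistAndFetch(incoming); // stored truncated again
    }
    return accepted;
}
```

With a corrupt value such as 0x0080123456789ABC and no masking, `bounceCount` accepts the mutation on every hop until `maxHops` is reached; with masking enabled it is rejected on the first comparison.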
In memory, EP-engine must treat the rev-seqno as 48 bits, masking off the upper 16 bits of any rev-seqno received from set/del-with-meta and of the max-deleted seqno read from the _local document.
As a bonus, we can save 2 bytes of metadata per in-memory Item/StoredValue.
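A minimal sketch of what a 48-bit in-memory counter could look like follows. It is not the N-byte integer class from the linked platform changes; `Uint48` and its members are hypothetical names chosen for illustration. Storing six raw bytes means the upper 16 bits of any 64-bit input are discarded by construction, and the type occupies 6 bytes rather than 8 (whether the containing object actually shrinks by 2 bytes depends on its layout and padding).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical 48-bit unsigned value backed by six bytes (little-endian).
class Uint48 {
public:
    Uint48() = default;
    explicit Uint48(std::uint64_t value) {
        set(value);
    }

    // Keeps only the low 48 bits; anything in the upper 16 bits is dropped.
    void set(std::uint64_t value) {
        for (std::size_t i = 0; i < bytes.size(); ++i) {
            bytes[i] = static_cast<std::uint8_t>(value >> (8 * i));
        }
    }

    // Widens back to 64 bits with the upper 16 bits always zero.
    std::uint64_t get() const {
        std::uint64_t value = 0;
        for (std::size_t i = 0; i < bytes.size(); ++i) {
            value |= static_cast<std::uint64_t>(bytes[i]) << (8 * i);
        }
        return value;
    }

private:
    std::array<std::uint8_t, 6> bytes{};
};

// Holds on common implementations, where std::array adds no padding.
static_assert(sizeof(Uint48) == 6, "Uint48 is expected to occupy 6 bytes");
```

Because the value in memory can never carry more than 48 bits, a rev-seqno compared against one fetched from disk is always compared like for like.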
Issue Links
- relates to
-
MB-29531 [Backport MB-29119] - RevSeqno is really 48-bits, but is stored in memory as 64-bits
- Closed
For Gerrit Dashboard: MB-29119

# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
92188,7 | MB-29119: N-byte integer class | master | platform | Status: ABANDONED | -2 | +1 |
92190,4 | MB-29119: Replace revSeqno with a 48-bit counter | master | kv_engine | Status: ABANDONED | +1 | -1 |
92408,2 | MB-29119: Replace revSeqno with a 48-bit counter | spock | kv_engine | Status: ABANDONED | 0 | -1 |
92482,1 | MB-29119: N-byte integer class | spock | platform | Status: ABANDONED | 0 | -1 |
92483,1 | MB-29119: Replace revSeqno with a 48-bit counter | spock | kv_engine | Status: ABANDONED | 0 | 0 |
92484,6 | MB-29119: N-byte integer class | spock | platform | Status: MERGED | +2 | +1 |
92485,7 | MB-29119: Replace revSeqno with a 48-bit counter | spock | kv_engine | Status: MERGED | +2 | +1 |
92554,3 | Merge couchbase/spock into couchbase/master | master | platform | Status: MERGED | +2 | +1 |
92556,1 | Merge couchbase/spock to couchbase/master | master | kv_engine | Status: ABANDONED | 0 | -1 |
92617,1 | Merge couchbase/spock into couchbase/master | master | kv_engine | Status: MERGED | +2 | +1 |
93642,2 | CBQE-4614 add test for issue MB-29119 | master | testrunner | Status: MERGED | +2 | +1 |
93699,2 | CBQE-4614 add test for issue MB-29119 | spock | testrunner | Status: MERGED | +2 | +1 |
94311,3 | CBQE-4614 add test for issue MB-29119 | watson | testrunner | Status: MERGED | +2 | +1 |
111175,1 | Reverse this file so it could test issue in MB-29119 | master | testrunner | Status: ABANDONED | 0 | +1 |
111190,2 | revert setWithMeta and doMetaCmd back so it could test MB-29119 | master | testrunner | Status: MERGED | +2 | +1 |