Details
- Bug
- Resolution: Fixed
- Critical
- 4.6.4, 5.0.1, 5.5.0
- None
- Triaged
- Unknown
Description
When the rev-seqno is stored to disk, it is actually stored as a 48-bit value. Under normal operation this isn't a problem; however, if the upper 16 bits of the in-memory rev-seqno become 'dirty', problems can occur. One problem observed is an indefinite XDCR loop.
In the observed problem, at some point an event (presumably a set- or del-with-meta) arrived with corrupt 'extras' containing a rev-seqno such as 0x0080.xxxx.yyyy.zzzz.
The node receiving such a value persists the full 64-bit rev-seqno as the max-deleted seqno in the vbucket's _local document. When the mutation itself is stored to couchstore, however, the rev-seqno is truncated to the 48-bit value xxxx.yyyy.zzzz (the 0x0080 prefix is dropped).
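The truncation described above can be sketched as follows. This is a minimal illustration, not ep-engine or couchstore code: `kDiskRevSeqnoMask`, `truncateForDisk` and the sample value are hypothetical names and numbers chosen for the example.

```cpp
#include <cstdint>

// Hypothetical mask: the on-disk format keeps only the low 48 bits.
constexpr std::uint64_t kDiskRevSeqnoMask = 0x0000FFFFFFFFFFFFULL;

// Models what survives a write to disk: the upper 16 bits are dropped.
constexpr std::uint64_t truncateForDisk(std::uint64_t revSeqno) {
    return revSeqno & kDiskRevSeqnoMask;
}

// Illustrative corrupt value with 0x0080 in the upper 16 bits.
constexpr std::uint64_t kCorruptRevSeqno = 0x0080000000000005ULL;

// The in-memory value and the persisted value now disagree.
static_assert(truncateForDisk(kCorruptRevSeqno) != kCorruptRevSeqno,
              "a dirty upper 16 bits is lost on persistence");
```

A clean rev-seqno (upper 16 bits zero) passes through `truncateForDisk` unchanged, which is why the mismatch never shows up in normal operation.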
Now, in bi-directional XDCR, the mutation goes through DCP and possibly back to the source node, but DCP sends the full 64-bit, 0x0080-prefixed value.
When that value lands on the other side and the key has been ejected, we have to fetch the key from disk, which brings the 48-bit value into memory ready for conflict resolution.
- 0x0080.xxxx.yyyy.zzzz > 0x0000.xxxx.yyyy.zzzz
- The incoming mutation wins and gets stored again
The node will now store the truncated rev-seqno and tell the local XDCR of the new mutation via DCP with the full 64-bit rev-seqno, which will be sent back to the other cluster.
If the other cluster has evicted the key we have to fetch... a loop occurs.
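The loop can be modelled in a few lines. This is a simplified sketch under stated assumptions: conflict resolution is reduced to a bare rev-seqno comparison, the key is assumed to be evicted on every hop, and all names (`fetchedFromDisk`, `incomingWins`, `acceptedHops`) are hypothetical rather than ep-engine functions.

```cpp
#include <cstdint>

constexpr std::uint64_t kLow48Bits = 0x0000FFFFFFFFFFFFULL;

// What a node brings back into memory after eviction: the 48-bit disk value.
constexpr std::uint64_t fetchedFromDisk(std::uint64_t stored) {
    return stored & kLow48Bits;
}

// Simplified conflict resolution: the higher rev-seqno wins.
constexpr bool incomingWins(std::uint64_t incoming, std::uint64_t local) {
    return incoming > local;
}

// Counts how many consecutive XDCR hops accept the mutation, up to maxHops.
// Each hop compares the full 64-bit value sent over DCP against the
// truncated value the receiving node fetched from disk.
constexpr int acceptedHops(std::uint64_t inFlight, int maxHops) {
    return (maxHops > 0 && incomingWins(inFlight, fetchedFromDisk(inFlight)))
               ? 1 + acceptedHops(inFlight, maxHops - 1)
               : 0;
}
```

With a corrupt value such as `0x0080000000000005`, `acceptedHops` reaches whatever hop limit it is given, since every comparison is won by the incoming mutation. With a clean value the first comparison is a tie and the mutation is not accepted again.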
EP-engine must treat the in-memory rev-seqno as a 48-bit value, masking off the upper 16 bits of any rev-seqno received from set/del-with-meta and of the max-deleted seqno read from the _local document.
As a bonus, this saves 2 bytes of metadata per in-memory Item/StoredValue.
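A minimal sketch of the proposed masking, assuming it is applied at the points where a rev-seqno enters memory. The names `sanitizeRevSeqno` and `wouldReplace` are hypothetical and the comparison is again simplified to rev-seqno only; the actual fix may be structured differently.

```cpp
#include <cstdint>

// Low 48 bits set: the only bits that survive persistence.
constexpr std::uint64_t kRevSeqnoBits48 = (std::uint64_t{1} << 48) - 1;

// Applied to any rev-seqno arriving via set/del-with-meta or read from
// the _local document, so memory and disk always agree.
constexpr std::uint64_t sanitizeRevSeqno(std::uint64_t fromOutside) {
    return fromOutside & kRevSeqnoBits48;
}

// Simplified conflict check on sanitized values only.
constexpr bool wouldReplace(std::uint64_t incoming, std::uint64_t local) {
    return sanitizeRevSeqno(incoming) > sanitizeRevSeqno(local);
}
```

Once both sides of the comparison are masked, a 0x0080-prefixed incoming value ties with its own truncated copy instead of beating it, so the mutation is no longer re-stored and re-sent.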
Issue Links
- relates to: MB-29119 RevSeqno is really 48-bits, but is stored in memory as 64-bits (Closed)