The code path in Item::removeUserXattrs() (added in 6.6.0 for MB-37374) may corrupt the item value in memory. That can lead to a number of issues, including crashes, DCP disconnections, corrupted data being written to disk in the payload of deletions, and corrupted data being streamed via DCP in the payload of deletions.
The most likely scenario for hitting the issue is when we have tombstones with both SysXattrs (eg, SyncGateway's) and UserXattrs in the payload. That usually ends in a crash.
Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
From 6.6, memcached allows UserXattrs in deletions. That is to support "Transparent Transactions" (TXNJ-125). MCBP now exposes a new CreateAsDeleted API. The SDK uses it to create "staged" docs for TX as tombstones; the "staged" payload is stored in the "txn" UserXattr of a tombstone.
A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
If memcached stores a deletion with UserXattrs in the payload, then one of two things happens when it is streamed over DCP:
1. IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is.
2. IncludeDeletedUserXattrs is disabled. Before streaming, we need to check whether the payload contains any UserXattrs and remove them.
Item::removeUserXattrs() is where case (2) is handled.
To support multiple DCP connections with different IncludeDeletedUserXattrs settings, we need to operate on a copy of the payload when making any change to it. Otherwise, any direct change to the payload is reflected on unrelated DCP connections (and possibly on disk), which is exactly what happens here.
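A minimal sketch of that copy-on-write requirement (hypothetical types, not the kv_engine ones): when two connections hold the same refcounted blob, an in-place edit made for one connection leaks into the other, while editing a private copy does not:

```cpp
#include <cstdint>
#include <memory>
#include <vector>

using Blob = std::vector<uint8_t>;

// BUGGY: edits the bytes that every other holder of this blob
// (other DCP streams, the persistence queue) still references.
void stripInPlace(std::shared_ptr<Blob>& value) {
    value->resize(value->size() / 2); // stand-in for "remove UserXattrs"
}

// FIXED: edit a private copy and repoint only this connection's
// reference; all other holders keep the original, untouched bytes.
void stripOnCopy(std::shared_ptr<Blob>& value) {
    auto copy = std::make_shared<Blob>(*value);
    copy->resize(copy->size() / 2);
    value = std::move(copy);
}
```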
If we consider only the SDK and memcached during a transaction, then we are fine:
- The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached.
- memcached processes the SyncDelete to completion -> first the SyncDelete and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so we never even execute Item::removeUserXattrs().
But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXattrs. When that happens, we modify the shared payload stored in memory, which is the payload pointed to by the persistence and replication queue (CheckpointManager).
At that point, any component that processes the same deletion again from the CheckpointManager will likely hit some issue. Eg, a crash at replication when we validate the corrupted payload, persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..
A concrete sequence that hits the crash:
1. TX completes and creates a committed doc with a given Body.
2. SG updates the doc by adding the "_sync" SysXattr -> the doc is alive with Body+SysXattr.
3. The user deletes the doc via CMD_DEL -> now we have a tombstone with the SysXattr only.
4. TX runs again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDeleted and we end up with a (committed) tombstone carrying the pre-existing SysXattr + the new "txn" UserXattr.
5. We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item.
6. A second DCP connection triggers Item::removeUserXattrs() -> at the previous step we overwrote the size of the Xattr-block in the payload, which is used for computing the size of the Body chunk -> Item::removeUserXattrs() now finds that the Body size for the deletion being processed is > 0 -> invalid, so we throw an exception and crash.
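The last two steps can be reproduced in isolation. In this toy model (hypothetical helpers, not the kv_engine code), the first pass shrinks the xattr-length header inside the shared buffer without shrinking the buffer itself, so the second pass computes a positive body size for a deletion and throws:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

static uint32_t readU32(const uint8_t* p) {
    return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
           (uint32_t(p[2]) << 8) | uint32_t(p[3]);
}
static void writeU32(uint8_t* p, uint32_t v) {
    p[0] = uint8_t(v >> 24); p[1] = uint8_t(v >> 16);
    p[2] = uint8_t(v >> 8);  p[3] = uint8_t(v);
}

// First (buggy) pass: rewrites the xattr-block length in the SHARED
// buffer after dropping a UserXattr, but the buffer that the other
// connections reference keeps its original total length.
void buggyShrinkXattrHeader(std::vector<uint8_t>& sharedValue,
                            uint32_t newXattrLen) {
    writeU32(sharedValue.data(), newXattrLen);
}

// Second pass: a deletion must have no body after the xattr block;
// with the header shrunk, bodySize becomes > 0 and we throw.
void validateDeletion(const std::vector<uint8_t>& value) {
    const size_t bodySize = value.size() - 4 - readU32(value.data());
    if (bodySize > 0) {
        throw std::logic_error("Deletion with body size > 0");
    }
}
```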
Is SyncGateway itself the culprit? No. SyncGateway introduces the "_sync" SysXattr (also in deletions). That, together with TX, leads memcached to corrupt data in a way that causes a crash. The point here is having a SysXattr + a UserXattr in the payload of a deletion; it's not SyncGateway itself.
I tried to crash the node by removing SyncGateway and adding just a GSI index that covers my data. I didn't manage to crash the node, nor did I observe any corruption in-memory/on-disk.
To avoid the issue in 6.6 we should avoid UserXattrs in deletions.
Given that TX-1.1 seems to be the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on a 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations; I am aware that TX-1.1 also contains some fixes (apart from the usage of the new CreateAsDeleted). The SDK team is best suited to provide insight on this.
|For Gerrit Dashboard: MB-41944|
|Change|Subject|Branch|Project|Status|CR|V|
|137568,7|MB-41944: Improve logging for body-size>0 in Item::removeUserXattrs|mad-hatter|kv_engine|MERGED|+2|+1|
|137837,8|MB-41944: Item::removeUserXattrs() operates on a copy|mad-hatter|kv_engine|MERGED|+2|+1|
|138529,15|Adding functional test for MB-41944|master|TAF|MERGED|+2|+1|
|138990,2|Adding a test for MB-41944|master|transactions-fit-performer|MERGED|+2|+1|
|139034,2|Adding functional test for MB-41944|mad-hatter|TAF|MERGED|+2|+1|
|139884,1|Merge branch mad-hatter into master|master|kv_engine|MERGED|+2|+1|
|139922,3|Merge branch 'couchbase/mad-hatter' into master|master|kv_engine|MERGED|+2|+1|