Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-41944

Item::removeUserXattrs doesn't take a copy of the blob

    XMLWordPrintable

Details

    • Triaged
    • 1
    • Yes

    Description

      Summary

      The code path in Item::removeUserXattrs() (added in 6.6.0 for MB-37374) may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

      The most likely scenario for hitting the issue seems when we have tombstones with SysXattrs (eg, SyncGateway) and UserXattrs in the payload. That usually ends up in a crash.

      Details

      Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
      From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

      A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
      For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
      If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:

      1. IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
      2. IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

      Item::removeUserXattrs() is when (2) happens.
      For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

      What do we need for hitting the issue?

      If we consider only the SDK and memcached during a transaction then we are fine.

      1. The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
      2. memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

      But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
      At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload, persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

       

      Why exactly we may see a crash?

      1. TX completes and creates a committed doc with a given Body
      2. SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
      3. The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
      4. TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
      5. We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
      6. A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash

       

      Do we strictly need SyncGateway for hitting a crash?

      Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point here is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

       

      If we remove SysXattrs from the described scenario, can we still hit any issue?

      I have tried to crash the node by removing SyncGateway and adding just a GSI index that covers my data. I didn't manage to crash the node and I didn't manage to get any corruption in-memory/on-disk either.

       

      Do we have a workaround?

      For avoiding the issue in 6.6 we should avoid UserXattrs in deletions.
      Given that TX-1.1 seems the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations, I am aware that TX-1.1 contains also some fixes (apart from the usage of the newest CreateAsDeleted). The SDK team is best suited for providing an insight on this.

      Attachments

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            paolo.cocchi Paolo Cocchi created issue -
            paolo.cocchi Paolo Cocchi made changes -
            Field Original Value New Value
            Link This issue is triggered by CBSE-8993 [ CBSE-8993 ]
            owend Daniel Owen made changes -
            Is this a Regression? Unknown [ 10452 ] No [ 10451 ]
            drigby Dave Rigby made changes -
            Is this a Regression? No [ 10451 ] Yes [ 10450 ]
            paolo.cocchi Paolo Cocchi made changes -
            Attachment sg_config_file.json [ 111088 ]
            paolo.cocchi Paolo Cocchi made changes -
            Description +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issue including crashes, DCP disconnection, corrupet data being written to disk, corrupted data being streamed via DCP.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (tnj 125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload (which is what happens in CBSE-8993), persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +Do we strictly need SyncGateway for hitting the issue?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +What is the relation with SyncReplication and Transactions?+

            As seen above, the SDK uses SyncDelete for staging docs. In theory a DCP connection (different from the replica connection) that enables SyncReplication and sets IncludeDeletedUserXattrs::No may lead to removing the "txn" UserXattrs in the SyncDelete before it is streamed for replication and persisted. To date only KV replica connections are supposed to enable SyncReplication, so we are not probably hitting any issue here.

             

            +Why exactly we see a crash in CBSE-8993 ?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash
            paolo.cocchi Paolo Cocchi made changes -
            Description +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issue including crashes, DCP disconnection, corrupet data being written to disk, corrupted data being streamed via DCP.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (tnj 125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload (which is what happens in CBSE-8993), persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +Do we strictly need SyncGateway for hitting the issue?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +What is the relation with SyncReplication and Transactions?+

            As seen above, the SDK uses SyncDelete for staging docs. In theory a DCP connection (different from the replica connection) that enables SyncReplication and sets IncludeDeletedUserXattrs::No may lead to removing the "txn" UserXattrs in the SyncDelete before it is streamed for replication and persisted. To date only KV replica connections are supposed to enable SyncReplication, so we are not probably hitting any issue here.

             

            +Why exactly we see a crash in CBSE-8993 ?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash
            +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issue including crashes, DCP disconnection, corrupet data being written to disk, corrupted data being streamed via DCP.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (tnj 125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload (which is what happens in CBSE-8993), persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +Do we strictly need SyncGateway for hitting the issue?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +What is the relation with SyncReplication and Transactions?+

            As seen above, the SDK uses SyncDelete for staging docs. In theory a DCP connection (different from the replica connection) that enables SyncReplication and sets IncludeDeletedUserXattrs::No may lead to removing the "txn" UserXattrs in the SyncDelete before it is streamed for replication and persistence. To date, only KV replica connections are supposed to enable SyncReplication, so we are not probably hitting any issue here.

             

            +Why exactly we see a crash in CBSE-8993 ?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash
            paolo.cocchi Paolo Cocchi made changes -
            Description +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issue including crashes, DCP disconnection, corrupet data being written to disk, corrupted data being streamed via DCP.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (tnj 125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload (which is what happens in CBSE-8993), persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +Do we strictly need SyncGateway for hitting the issue?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +What is the relation with SyncReplication and Transactions?+

            As seen above, the SDK uses SyncDelete for staging docs. In theory a DCP connection (different from the replica connection) that enables SyncReplication and sets IncludeDeletedUserXattrs::No may lead to removing the "txn" UserXattrs in the SyncDelete before it is streamed for replication and persistence. To date, only KV replica connections are supposed to enable SyncReplication, so we are not probably hitting any issue here.

             

            +Why exactly we see a crash in CBSE-8993 ?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash
            +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (tnj 125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload (which is what happens in CBSE-8993), persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +Do we strictly need SyncGateway for hitting the issue?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +What is the relation with SyncReplication and Transactions?+

            As seen above, the SDK uses SyncDelete for staging docs. In theory a DCP connection (different from the replica connection) that enables SyncReplication and sets IncludeDeletedUserXattrs::No may lead to removing the "txn" UserXattrs in the SyncDelete before it is streamed for replication and persistence. To date, only KV replica connections are supposed to enable SyncReplication, so we are not probably hitting any issue here.

             

            +Why exactly we see a crash in CBSE-8993 ?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash
            paolo.cocchi Paolo Cocchi made changes -
            Description +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (tnj 125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload (which is what happens in CBSE-8993), persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +Do we strictly need SyncGateway for hitting the issue?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +What is the relation with SyncReplication and Transactions?+

            As seen above, the SDK uses SyncDelete for staging docs. In theory a DCP connection (different from the replica connection) that enables SyncReplication and sets IncludeDeletedUserXattrs::No may lead to removing the "txn" UserXattrs in the SyncDelete before it is streamed for replication and persistence. To date, only KV replica connections are supposed to enable SyncReplication, so we are not probably hitting any issue here.

             

            +Why exactly we see a crash in CBSE-8993 ?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash
            +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload (which is what happens in CBSE-8993), persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +Do we strictly need SyncGateway for hitting the issue?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +What is the relation with SyncReplication and Transactions?+

            As seen above, the SDK uses SyncDelete for staging docs. In theory a DCP connection (different from the replica connection) that enables SyncReplication and sets IncludeDeletedUserXattrs::No may lead to removing the "txn" UserXattrs in the SyncDelete before it is streamed for replication and persistence. To date, only KV replica connections are supposed to enable SyncReplication, so we are not probably hitting any issue here.

             

            +Why exactly we see a crash in CBSE-8993 ?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash
            paolo.cocchi Paolo Cocchi made changes -
            Description +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload (which is what happens in CBSE-8993), persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +Do we strictly need SyncGateway for hitting the issue?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +What is the relation with SyncReplication and Transactions?+

            As seen above, the SDK uses SyncDelete for staging docs. In theory a DCP connection (different from the replica connection) that enables SyncReplication and sets IncludeDeletedUserXattrs::No may lead to removing the "txn" UserXattrs in the SyncDelete before it is streamed for replication and persistence. To date, only KV replica connections are supposed to enable SyncReplication, so we are not probably hitting any issue here.

             

            +Why exactly we see a crash in CBSE-8993 ?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash
            +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload (which is what happens in CBSE-8993), persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +Do we strictly need SyncGateway for hitting the issue?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +What is the relation with SyncReplication and Transactions?+

            As seen above, the SDK uses SyncDelete for staging docs. In theory a DCP connection (different from the replica connection) that enables SyncReplication and sets IncludeDeletedUserXattrs::No may lead to removing the "txn" UserXattrs in the SyncDelete before it is streamed for replication and persistence. To date, only KV replica connections are supposed to enable SyncReplication, so we are not probably hitting any issue here.

             

            +Why exactly we see a crash in CBSE-8993 ?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash

             

            +Do we have a workaround?+

            For avoiding the issue in 6.6 we should avoid UserXattrs in deletions.
             Given that TX-1.1 seems the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations, I am aware that TX-1.1 contains also some fixes (apart from the usage of the newest CreateAsDeleted). The SDK team is best suited for providing an insight on this.
            paolo.cocchi Paolo Cocchi made changes -
            Description +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload (which is what happens in CBSE-8993), persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +Do we strictly need SyncGateway for hitting the issue?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +What is the relation with SyncReplication and Transactions?+

            As seen above, the SDK uses SyncDelete for staging docs. In theory a DCP connection (different from the replica connection) that enables SyncReplication and sets IncludeDeletedUserXattrs::No may lead to removing the "txn" UserXattrs in the SyncDelete before it is streamed for replication and persistence. To date, only KV replica connections are supposed to enable SyncReplication, so we are not probably hitting any issue here.

             

            +Why exactly we see a crash in CBSE-8993 ?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash

             

            +Do we have a workaround?+

            For avoiding the issue in 6.6 we should avoid UserXattrs in deletions.
             Given that TX-1.1 seems the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations, I am aware that TX-1.1 contains also some fixes (apart from the usage of the newest CreateAsDeleted). The SDK team is best suited for providing an insight on this.
            +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload (which is what happens in CBSE-8993), persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +What is the relation with SyncReplication and Transactions?+

            As seen above, the SDK uses SyncDelete for staging docs. In theory a DCP connection (different from the replica connection) that enables SyncReplication and sets IncludeDeletedUserXattrs::No may lead to removing the "txn" UserXattrs in the SyncDelete before it is streamed for replication and persistence. To date, only KV replica connections are supposed to enable SyncReplication, so we are not probably hitting any issue here.

             

            +Why exactly we see a crash in CBSE-8993 ?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash

             

            +Do we strictly need SyncGateway for hitting a crash?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point here is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +Do we have a workaround?+

            For avoiding the issue in 6.6 we should avoid UserXattrs in deletions.
             Given that TX-1.1 seems the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations, I am aware that TX-1.1 contains also some fixes (apart from the usage of the newest CreateAsDeleted). The SDK team is best suited for providing an insight on this.
            paolo.cocchi Paolo Cocchi made changes -
            Description +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload (which is what happens in CBSE-8993), persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +What is the relation with SyncReplication and Transactions?+

            As seen above, the SDK uses SyncDelete for staging docs. In theory a DCP connection (different from the replica connection) that enables SyncReplication and sets IncludeDeletedUserXattrs::No may lead to removing the "txn" UserXattrs in the SyncDelete before it is streamed for replication and persistence. To date, only KV replica connections are supposed to enable SyncReplication, so we are not probably hitting any issue here.

             

            +Why exactly we see a crash in CBSE-8993 ?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash

             

            +Do we strictly need SyncGateway for hitting a crash?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point here is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +Do we have a workaround?+

            For avoiding the issue in 6.6 we should avoid UserXattrs in deletions.
             Given that TX-1.1 seems the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations, I am aware that TX-1.1 contains also some fixes (apart from the usage of the newest CreateAsDeleted). The SDK team is best suited for providing an insight on this.
            +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload (which is what happens in CBSE-8993), persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +What is the relation with SyncReplication and Transactions?+

            As seen above, the SDK uses SyncDelete for staging docs. In theory a DCP connection (different from the replica connection) that enables SyncReplication and sets IncludeDeletedUserXattrs::No may lead to removing the "txn" UserXattrs in the SyncDelete before it is streamed for replication and persistence. To date, only KV replica connections are supposed to enable SyncReplication, so we are not probably hitting any issue here.

             

            +Why exactly we see a crash in CBSE-8993 ?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash

             

            +Do we strictly need SyncGateway for hitting a crash?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point here (and in CBSE-8993) is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +If we remove SysXattrs from the described scenario, can we still hit any issue?+
             Under assessment.

             

            +Do we have a workaround?+

            For avoiding the issue in 6.6 we should avoid UserXattrs in deletions.
             Given that TX-1.1 seems the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations, I am aware that TX-1.1 contains also some fixes (apart from the usage of the newest CreateAsDeleted). The SDK team is best suited for providing an insight on this.
            paolo.cocchi Paolo Cocchi made changes -
            Description +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload (which is what happens in CBSE-8993), persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +What is the relation with SyncReplication and Transactions?+

            As seen above, the SDK uses SyncDelete for staging docs. In theory a DCP connection (different from the replica connection) that enables SyncReplication and sets IncludeDeletedUserXattrs::No may lead to removing the "txn" UserXattrs in the SyncDelete before it is streamed for replication and persistence. To date, only KV replica connections are supposed to enable SyncReplication, so we are not probably hitting any issue here.

             

            +Why exactly we see a crash in CBSE-8993 ?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash

             

            +Do we strictly need SyncGateway for hitting a crash?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point here (and in CBSE-8993) is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +If we remove SysXattrs from the described scenario, can we still hit any issue?+
             Under assessment.

             

            +Do we have a workaround?+

            For avoiding the issue in 6.6 we should avoid UserXattrs in deletions.
             Given that TX-1.1 seems the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations, I am aware that TX-1.1 contains also some fixes (apart from the usage of the newest CreateAsDeleted). The SDK team is best suited for providing an insight on this.
            +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload (which is what happens in CBSE-8993), persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +What is the relation with SyncReplication and Transactions?+

            As seen above, the SDK uses SyncDelete for staging docs. In theory a DCP connection (different from the replica connection) that enables SyncReplication and sets IncludeDeletedUserXattrs::No may lead to removing the "txn" UserXattrs in the SyncDelete before it is streamed for replication and persistence. To date, only KV replica connections are supposed to enable SyncReplication, so we are not probably hitting any issue here.

             

            +Why exactly we see a crash in CBSE-8993 ?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash

             

            +Do we strictly need SyncGateway for hitting a crash?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point here (and in CBSE-8993) is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +If we remove SysXattrs from the described scenario, can we still hit any issue?+
             I have tried to crash the node by removing SyncGateway from the scenario described for CBSE-8993 and adding just a GSI index that covers my data. I didn't manage to crash the node.
             I need to repeat the same test and debug for a full assessment of how our code behaves in that case. For now I didn't manage to get any corruption on disk either.
             *Still under assessment.*

             

            +Do we have a workaround?+

            For avoiding the issue in 6.6 we should avoid UserXattrs in deletions.
             Given that TX-1.1 seems the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations, I am aware that TX-1.1 contains also some fixes (apart from the usage of the newest CreateAsDeleted). The SDK team is best suited for providing an insight on this.
            paolo.cocchi Paolo Cocchi made changes -
            Description +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload (which is what happens in CBSE-8993), persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +What is the relation with SyncReplication and Transactions?+

            As seen above, the SDK uses SyncDelete for staging docs. In theory a DCP connection (different from the replica connection) that enables SyncReplication and sets IncludeDeletedUserXattrs::No may lead to removing the "txn" UserXattrs in the SyncDelete before it is streamed for replication and persistence. To date, only KV replica connections are supposed to enable SyncReplication, so we are not probably hitting any issue here.

             

            +Why exactly we see a crash in CBSE-8993 ?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash

             

            +Do we strictly need SyncGateway for hitting a crash?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point here (and in CBSE-8993) is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +If we remove SysXattrs from the described scenario, can we still hit any issue?+
             I have tried to crash the node by removing SyncGateway from the scenario described for CBSE-8993 and adding just a GSI index that covers my data. I didn't manage to crash the node.
             I need to repeat the same test and debug for a full assessment of how our code behaves in that case. For now I didn't manage to get any corruption on disk either.
             *Still under assessment.*

             

            +Do we have a workaround?+

            For avoiding the issue in 6.6 we should avoid UserXattrs in deletions.
             Given that TX-1.1 seems the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations, I am aware that TX-1.1 contains also some fixes (apart from the usage of the newest CreateAsDeleted). The SDK team is best suited for providing an insight on this.
            +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload (which is what happens in CBSE-8993), persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +What is the relation with SyncReplication and Transactions?+

            As seen above, the SDK uses SyncDelete for staging docs. In theory a DCP connection (different from the replica connection) that enables SyncReplication and sets IncludeDeletedUserXattrs::No may lead to removing the "txn" UserXattrs in the SyncDelete before it is streamed for replication and persistence. To date, only KV replica connections are supposed to enable SyncReplication, so we are not probably hitting any issue here.

             

            +Why exactly we see a crash in CBSE-8993 ?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash

             

            +Do we strictly need SyncGateway for hitting a crash?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point here (and in CBSE-8993) is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +If we remove SysXattrs from the described scenario, can we still hit any issue?+

            I have tried to crash the node by removing SyncGateway from the scenario described for CBSE-8993 and adding just a GSI index that covers my data. I didn't manage to crash the node and I didn't manage to get any corruption in-memory/on-disk either.

            I have carried on the investigation for assessing another concern.
             A staged doc for TX is stored in the system as a Deletion with the "txn" UserXattrs. TX issues a SyncDelete. KV streams first the Prepare and then the Commit. Is it possible that we lose the "txn" UserXattrs, which is not persisted/replicated at that point?
             Let's consider the two phases of the SyncDelete, Pending and Committed. Both are potentially vulnerable if any connection triggers Item::removeUserXattrs on the payload of those items.

            _Pending SyncDelete_
             For corrupting the payload of a SyncDelete we would need a DCP connection from an external component that enables SyncReplication and sets IncludeDeletedUserXattrs::No. Only replica connections enable SyncReplication, and in 6.6 they always set IncludeDeletedUserXattrs::Yes. I think that we are fine here.

            _Committed SyncDelete_
             A Commit doesn't carry any value. That is streamed as a DCP_COMMIT message, so there is no way we can touch the original SyncDelete payload by processing its Commit.
             There is one exception to that: a Commit in a Disk snapshot is streamed as Deletion. While it doens't seem very likely, the following scenario seems possible:
             # TX issues SyncDelete (txn, level:majority by default)
             # KV replicates the Prepare to the majority of nodes (not persisted anywere)
             # KV commits the Prepare -> Commit persisted on the Active
             # Active crash+restart -> replicas reconnect and trigger a backfill from the Active
             # We have another DCP connection around that triggers Item::removeUserXattrs -> that is executed before we replicate the Committed Deletion to replicas
             # We replicate an empty staged doc (as the "txn" UserXattr has gone at step 5)

            As discussed with [~graham.pople], losing the staged payload may break TX at failover.

            Note: It is much more likely that we are having issues with CreateAsDeleted in normal Delete rather than SyncDelete. That is because a component doesn't need to enable SyncRepl for triggering Item::removeUserXattrs. But, TX uses normal Delete only if the user runs TX with --durability=none, which I believe should not really be used by the final user.

            _Final answer to this point._
             I didn't manage to replicare any issue on local tests. Also, the other issues described seem possible but not very likely.

             

            +Do we have a workaround?+

            For avoiding the issue in 6.6 we should avoid UserXattrs in deletions.
             Given that TX-1.1 seems the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations, I am aware that TX-1.1 contains also some fixes (apart from the usage of the newest CreateAsDeleted). The SDK team is best suited for providing an insight on this.
            paolo.cocchi Paolo Cocchi made changes -
            Description +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload (which is what happens in CBSE-8993), persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +What is the relation with SyncReplication and Transactions?+

            As seen above, the SDK uses SyncDelete for staging docs. In theory a DCP connection (different from the replica connection) that enables SyncReplication and sets IncludeDeletedUserXattrs::No may lead to removing the "txn" UserXattrs in the SyncDelete before it is streamed for replication and persistence. To date, only KV replica connections are supposed to enable SyncReplication, so we are not probably hitting any issue here.

             

            +Why exactly we see a crash in CBSE-8993 ?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash

             

            +Do we strictly need SyncGateway for hitting a crash?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point here (and in CBSE-8993) is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +If we remove SysXattrs from the described scenario, can we still hit any issue?+

            I have tried to crash the node by removing SyncGateway from the scenario described for CBSE-8993 and adding just a GSI index that covers my data. I didn't manage to crash the node and I didn't manage to get any corruption in-memory/on-disk either.

            I have carried on the investigation for assessing another concern.
             A staged doc for TX is stored in the system as a Deletion with the "txn" UserXattrs. TX issues a SyncDelete. KV streams first the Prepare and then the Commit. Is it possible that we lose the "txn" UserXattrs, which is not persisted/replicated at that point?
             Let's consider the two phases of the SyncDelete, Pending and Committed. Both are potentially vulnerable if any connection triggers Item::removeUserXattrs on the payload of those items.

            _Pending SyncDelete_
             For corrupting the payload of a SyncDelete we would need a DCP connection from an external component that enables SyncReplication and sets IncludeDeletedUserXattrs::No. Only replica connections enable SyncReplication, and in 6.6 they always set IncludeDeletedUserXattrs::Yes. I think that we are fine here.

            _Committed SyncDelete_
             A Commit doesn't carry any value. That is streamed as a DCP_COMMIT message, so there is no way we can touch the original SyncDelete payload by processing its Commit.
             There is one exception to that: a Commit in a Disk snapshot is streamed as Deletion. While it doens't seem very likely, the following scenario seems possible:
             # TX issues SyncDelete (txn, level:majority by default)
             # KV replicates the Prepare to the majority of nodes (not persisted anywere)
             # KV commits the Prepare -> Commit persisted on the Active
             # Active crash+restart -> replicas reconnect and trigger a backfill from the Active
             # We have another DCP connection around that triggers Item::removeUserXattrs -> that is executed before we replicate the Committed Deletion to replicas
             # We replicate an empty staged doc (as the "txn" UserXattr has gone at step 5)

            As discussed with [~graham.pople], losing the staged payload may break TX at failover.

            Note: It is much more likely that we are having issues with CreateAsDeleted in normal Delete rather than SyncDelete. That is because a component doesn't need to enable SyncRepl for triggering Item::removeUserXattrs. But, TX uses normal Delete only if the user runs TX with --durability=none, which I believe should not really be used by the final user.

            _Final answer to this point._
             I didn't manage to replicare any issue on local tests. Also, the other issues described seem possible but not very likely.

             

            +Do we have a workaround?+

            For avoiding the issue in 6.6 we should avoid UserXattrs in deletions.
             Given that TX-1.1 seems the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations, I am aware that TX-1.1 contains also some fixes (apart from the usage of the newest CreateAsDeleted). The SDK team is best suited for providing an insight on this.
            +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload (which is what happens in CBSE-8993), persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +What is the relation with SyncReplication and Transactions?+

            As seen above, the SDK uses SyncDelete for staging docs. In theory a DCP connection (different from the replica connection) that enables SyncReplication and sets IncludeDeletedUserXattrs::No may lead to removing the "txn" UserXattrs in the SyncDelete before it is streamed for replication and persistence. To date, only KV replica connections are supposed to enable SyncReplication, so we are not probably hitting any issue here.

             

            +Why exactly we see a crash in CBSE-8993 ?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash

             

            +Do we strictly need SyncGateway for hitting a crash?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point here (and in CBSE-8993) is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +If we remove SysXattrs from the described scenario, can we still hit any issue?+

            I have tried to crash the node by removing SyncGateway from the scenario described for CBSE-8993 and adding just a GSI index that covers my data. I didn't manage to crash the node and I didn't manage to get any corruption in-memory/on-disk either.

            I have carried on the investigation for assessing another concern.
             A staged doc for TX is stored in the system as a Deletion with the "txn" UserXattrs. TX issues a SyncDelete. KV streams first the Prepare and then the Commit. Is it possible that we lose the "txn" UserXattrs, which is not persisted/replicated at that point?
             Let's consider the two phases of the SyncDelete, Pending and Committed. Both are potentially vulnerable if any connection triggers Item::removeUserXattrs on the payload of those items.

            _Pending SyncDelete_
             For corrupting the payload of a SyncDelete we would need a DCP connection from an external component that enables SyncReplication and sets IncludeDeletedUserXattrs::No. Only replica connections enable SyncReplication, and in 6.6 they always set IncludeDeletedUserXattrs::Yes. I think that we are fine here.

            _Committed SyncDelete_
             A Commit doesn't carry any value. That is streamed as a DCP_COMMIT message, so there is no way we can touch the original SyncDelete payload by processing its Commit.
             There is one exception to that: a Commit in a Disk snapshot is streamed as Deletion. While it doens't seem very likely, the following scenario seems possible:
             # TX issues SyncDelete (txn, level:majority by default)
             # KV replicates the Prepare to the majority of nodes (not persisted anywere)
             # KV commits the Prepare -> Commit persisted on the Active
             # Active crash+restart -> replicas reconnect and trigger a backfill from the Active
             # We have another DCP connection around that triggers Item::removeUserXattrs -> that is executed before we replicate the Committed Deletion to replicas
             # We replicate an empty staged doc (as the "txn" UserXattr has gone at step 5)

            As discussed with [~graham.pople], losing the staged payload may break TX at failover.

            Note: It is much more likely that we are having issues with CreateAsDeleted in normal Delete rather than SyncDelete. That is because a component doesn't need to enable SyncRepl for triggering Item::removeUserXattrs. But, TX uses normal Delete only if the user runs TX with --durability=none, which is not supposed to be used by the final user.

            _Final answer to this point._
             I didn't manage to reproduce any issue on local tests. Also, the other issues described seem possible but not very likely.

             

            +Do we have a workaround?+

            For avoiding the issue in 6.6 we should avoid UserXattrs in deletions.
             Given that TX-1.1 seems the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations, I am aware that TX-1.1 contains also some fixes (apart from the usage of the newest CreateAsDeleted). The SDK team is best suited for providing an insight on this.
            paolo.cocchi Paolo Cocchi made changes -
            Description +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload (which is what happens in CBSE-8993), persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +What is the relation with SyncReplication and Transactions?+

            As seen above, the SDK uses SyncDelete for staging docs. In theory a DCP connection (different from the replica connection) that enables SyncReplication and sets IncludeDeletedUserXattrs::No may lead to removing the "txn" UserXattrs in the SyncDelete before it is streamed for replication and persistence. To date, only KV replica connections are supposed to enable SyncReplication, so we are not probably hitting any issue here.

             

            +Why exactly we see a crash in CBSE-8993 ?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash

             

            +Do we strictly need SyncGateway for hitting a crash?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point here (and in CBSE-8993) is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +If we remove SysXattrs from the described scenario, can we still hit any issue?+

            I have tried to crash the node by removing SyncGateway from the scenario described for CBSE-8993 and adding just a GSI index that covers my data. I didn't manage to crash the node and I didn't manage to get any corruption in-memory/on-disk either.

            I have carried on the investigation for assessing another concern.
             A staged doc for TX is stored in the system as a Deletion with the "txn" UserXattrs. TX issues a SyncDelete. KV streams first the Prepare and then the Commit. Is it possible that we lose the "txn" UserXattrs, which is not persisted/replicated at that point?
             Let's consider the two phases of the SyncDelete, Pending and Committed. Both are potentially vulnerable if any connection triggers Item::removeUserXattrs on the payload of those items.

            _Pending SyncDelete_
             For corrupting the payload of a SyncDelete we would need a DCP connection from an external component that enables SyncReplication and sets IncludeDeletedUserXattrs::No. Only replica connections enable SyncReplication, and in 6.6 they always set IncludeDeletedUserXattrs::Yes. I think that we are fine here.

            _Committed SyncDelete_
             A Commit doesn't carry any value. That is streamed as a DCP_COMMIT message, so there is no way we can touch the original SyncDelete payload by processing its Commit.
             There is one exception to that: a Commit in a Disk snapshot is streamed as Deletion. While it doens't seem very likely, the following scenario seems possible:
             # TX issues SyncDelete (txn, level:majority by default)
             # KV replicates the Prepare to the majority of nodes (not persisted anywere)
             # KV commits the Prepare -> Commit persisted on the Active
             # Active crash+restart -> replicas reconnect and trigger a backfill from the Active
             # We have another DCP connection around that triggers Item::removeUserXattrs -> that is executed before we replicate the Committed Deletion to replicas
             # We replicate an empty staged doc (as the "txn" UserXattr has gone at step 5)

            As discussed with [~graham.pople], losing the staged payload may break TX at failover.

            Note: It is much more likely that we are having issues with CreateAsDeleted in normal Delete rather than SyncDelete. That is because a component doesn't need to enable SyncRepl for triggering Item::removeUserXattrs. But, TX uses normal Delete only if the user runs TX with --durability=none, which is not supposed to be used by the final user.

            _Final answer to this point._
             I didn't manage to reproduce any issue on local tests. Also, the other issues described seem possible but not very likely.

             

            +Do we have a workaround?+

            For avoiding the issue in 6.6 we should avoid UserXattrs in deletions.
             Given that TX-1.1 seems the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations, I am aware that TX-1.1 contains also some fixes (apart from the usage of the newest CreateAsDeleted). The SDK team is best suited for providing an insight on this.
            +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

            The most likely scenario for hitting the issue seems when we have tombstones with SysXattrs (eg, SyncGateway) and UserXattrs in the system. That usually ends up in a crash. See the CBSE-8993 section below for details.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload (which is what happens in CBSE-8993), persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +What is the relation with SyncReplication and Transactions?+

            As seen above, the SDK uses SyncDelete for staging docs. In theory a DCP connection (different from the replica connection) that enables SyncReplication and sets IncludeDeletedUserXattrs::No may lead to removing the "txn" UserXattrs in the SyncDelete before it is streamed for replication and persistence. To date, only KV replica connections are supposed to enable SyncReplication, so we are not probably hitting any issue here.

             

            +Why exactly we see a crash in CBSE-8993 ?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash

             

            +Do we strictly need SyncGateway for hitting a crash?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point here (and in CBSE-8993) is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +If we remove SysXattrs from the described scenario, can we still hit any issue?+

            I have tried to crash the node by removing SyncGateway from the scenario described for CBSE-8993 and adding just a GSI index that covers my data. I didn't manage to crash the node and I didn't manage to get any corruption in-memory/on-disk either.

            I have carried on the investigation for assessing another concern.
             A staged doc for TX is stored in the system as a Deletion with the "txn" UserXattrs. TX issues a SyncDelete. KV streams first the Prepare and then the Commit. Is it possible that we lose the "txn" UserXattrs, which is not persisted/replicated at that point?
             Let's consider the two phases of the SyncDelete, Pending and Committed. Both are potentially vulnerable if any connection triggers Item::removeUserXattrs on the payload of those items.

            _Pending SyncDelete_
             For corrupting the payload of a SyncDelete we would need a DCP connection from an external component that enables SyncReplication and sets IncludeDeletedUserXattrs::No. Only replica connections enable SyncReplication, and in 6.6 they always set IncludeDeletedUserXattrs::Yes. I think that we are fine here.

            _Committed SyncDelete_
             A Commit doesn't carry any value. That is streamed as a DCP_COMMIT message, so there is no way we can touch the original SyncDelete payload by processing its Commit.
             There is one exception to that: a Commit in a Disk snapshot is streamed as Deletion. While it doens't seem very likely, the following scenario seems possible:
             # TX issues SyncDelete (txn, level:majority by default)
             # KV replicates the Prepare to the majority of nodes (not persisted anywere)
             # KV commits the Prepare -> Commit persisted on the Active
             # Active crash+restart -> replicas reconnect and trigger a backfill from the Active
             # We have another DCP connection around that triggers Item::removeUserXattrs -> that is executed before we replicate the Committed Deletion to replicas
             # We replicate an empty staged doc (as the "txn" UserXattr has gone at step 5)

            As discussed with [~graham.pople], losing the staged payload may break TX at failover.

            Note: It is much more likely that we are having issues with CreateAsDeleted in normal Delete rather than SyncDelete. That is because a component doesn't need to enable SyncRepl for triggering Item::removeUserXattrs. But, TX uses normal Delete only if the user runs TX with --durability=none, which is not supposed to be used by the final user.

            _Final answer to this point._
             I didn't manage to reproduce any issue on local tests. Also, the other issues described seem possible but not very likely.

             

            +Do we have a workaround?+

            For avoiding the issue in 6.6 we should avoid UserXattrs in deletions.
             Given that TX-1.1 seems the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations, I am aware that TX-1.1 contains also some fixes (apart from the usage of the newest CreateAsDeleted). The SDK team is best suited for providing an insight on this.
            paolo.cocchi Paolo Cocchi made changes -
            Description +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

            The most likely scenario for hitting the issue seems when we have tombstones with SysXattrs (eg, SyncGateway) and UserXattrs in the system. That usually ends up in a crash. See the CBSE-8993 section below for details.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload (which is what happens in CBSE-8993), persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +What is the relation with SyncReplication and Transactions?+

            As seen above, the SDK uses SyncDelete for staging docs. In theory a DCP connection (different from the replica connection) that enables SyncReplication and sets IncludeDeletedUserXattrs::No may lead to removing the "txn" UserXattrs in the SyncDelete before it is streamed for replication and persistence. To date, only KV replica connections are supposed to enable SyncReplication, so we are not probably hitting any issue here.

             

            +Why exactly we see a crash in CBSE-8993 ?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash

             

            +Do we strictly need SyncGateway for hitting a crash?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point here (and in CBSE-8993) is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +If we remove SysXattrs from the described scenario, can we still hit any issue?+

            I have tried to crash the node by removing SyncGateway from the scenario described for CBSE-8993 and adding just a GSI index that covers my data. I didn't manage to crash the node and I didn't manage to get any corruption in-memory/on-disk either.

            I have carried on the investigation for assessing another concern.
             A staged doc for TX is stored in the system as a Deletion with the "txn" UserXattrs. TX issues a SyncDelete. KV streams first the Prepare and then the Commit. Is it possible that we lose the "txn" UserXattrs, which is not persisted/replicated at that point?
             Let's consider the two phases of the SyncDelete, Pending and Committed. Both are potentially vulnerable if any connection triggers Item::removeUserXattrs on the payload of those items.

            _Pending SyncDelete_
             For corrupting the payload of a SyncDelete we would need a DCP connection from an external component that enables SyncReplication and sets IncludeDeletedUserXattrs::No. Only replica connections enable SyncReplication, and in 6.6 they always set IncludeDeletedUserXattrs::Yes. I think that we are fine here.

            _Committed SyncDelete_
             A Commit doesn't carry any value. That is streamed as a DCP_COMMIT message, so there is no way we can touch the original SyncDelete payload by processing its Commit.
             There is one exception to that: a Commit in a Disk snapshot is streamed as Deletion. While it doens't seem very likely, the following scenario seems possible:
             # TX issues SyncDelete (txn, level:majority by default)
             # KV replicates the Prepare to the majority of nodes (not persisted anywere)
             # KV commits the Prepare -> Commit persisted on the Active
             # Active crash+restart -> replicas reconnect and trigger a backfill from the Active
             # We have another DCP connection around that triggers Item::removeUserXattrs -> that is executed before we replicate the Committed Deletion to replicas
             # We replicate an empty staged doc (as the "txn" UserXattr has gone at step 5)

            As discussed with [~graham.pople], losing the staged payload may break TX at failover.

            Note: It is much more likely that we are having issues with CreateAsDeleted in normal Delete rather than SyncDelete. That is because a component doesn't need to enable SyncRepl for triggering Item::removeUserXattrs. But, TX uses normal Delete only if the user runs TX with --durability=none, which is not supposed to be used by the final user.

            _Final answer to this point._
             I didn't manage to reproduce any issue on local tests. Also, the other issues described seem possible but not very likely.

             

            +Do we have a workaround?+

            For avoiding the issue in 6.6 we should avoid UserXattrs in deletions.
             Given that TX-1.1 seems the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations, I am aware that TX-1.1 contains also some fixes (apart from the usage of the newest CreateAsDeleted). The SDK team is best suited for providing an insight on this.
            +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

            The most likely scenario for hitting the issue seems when we have tombstones with SysXattrs (eg, SyncGateway) and UserXattrs in the payload. That usually ends up in a crash. See the CBSE-8993 section below for details.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload (which is what happens in CBSE-8993), persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +What is the relation with SyncReplication and Transactions?+

            As seen above, the SDK uses SyncDelete for staging docs. In theory a DCP connection (different from the replica connection) that enables SyncReplication and sets IncludeDeletedUserXattrs::No may lead to removing the "txn" UserXattrs in the SyncDelete before it is streamed for replication and persistence. To date, only KV replica connections are supposed to enable SyncReplication, so we are not probably hitting any issue here.

             

            +Why exactly we see a crash in CBSE-8993 ?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash

             

            +Do we strictly need SyncGateway for hitting a crash?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point here (and in CBSE-8993) is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +If we remove SysXattrs from the described scenario, can we still hit any issue?+

            I have tried to crash the node by removing SyncGateway from the scenario described for CBSE-8993 and adding just a GSI index that covers my data. I didn't manage to crash the node and I didn't manage to get any corruption in-memory/on-disk either.

            I have carried on the investigation for assessing another concern.
             A staged doc for TX is stored in the system as a Deletion with the "txn" UserXattrs. TX issues a SyncDelete. KV streams first the Prepare and then the Commit. Is it possible that we lose the "txn" UserXattrs, which is not persisted/replicated at that point?
             Let's consider the two phases of the SyncDelete, Pending and Committed. Both are potentially vulnerable if any connection triggers Item::removeUserXattrs on the payload of those items.

            _Pending SyncDelete_
             For corrupting the payload of a SyncDelete we would need a DCP connection from an external component that enables SyncReplication and sets IncludeDeletedUserXattrs::No. Only replica connections enable SyncReplication, and in 6.6 they always set IncludeDeletedUserXattrs::Yes. I think that we are fine here.

            _Committed SyncDelete_
             A Commit doesn't carry any value. That is streamed as a DCP_COMMIT message, so there is no way we can touch the original SyncDelete payload by processing its Commit.
             There is one exception to that: a Commit in a Disk snapshot is streamed as Deletion. While it doens't seem very likely, the following scenario seems possible:
             # TX issues SyncDelete (txn, level:majority by default)
             # KV replicates the Prepare to the majority of nodes (not persisted anywere)
             # KV commits the Prepare -> Commit persisted on the Active
             # Active crash+restart -> replicas reconnect and trigger a backfill from the Active
             # We have another DCP connection around that triggers Item::removeUserXattrs -> that is executed before we replicate the Committed Deletion to replicas
             # We replicate an empty staged doc (as the "txn" UserXattr has gone at step 5)

            As discussed with [~graham.pople], losing the staged payload may break TX at failover.

            Note: It is much more likely that we are having issues with CreateAsDeleted in normal Delete rather than SyncDelete. That is because a component doesn't need to enable SyncRepl for triggering Item::removeUserXattrs. But, TX uses normal Delete only if the user runs TX with --durability=none, which is not supposed to be used by the final user.

            _Final answer to this point._
             I didn't manage to reproduce any issue on local tests. Also, the other issues described seem possible but not very likely.

             

            +Do we have a workaround?+

            For avoiding the issue in 6.6 we should avoid UserXattrs in deletions.
             Given that TX-1.1 seems the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations, I am aware that TX-1.1 contains also some fixes (apart from the usage of the newest CreateAsDeleted). The SDK team is best suited for providing an insight on this.
            paolo.cocchi Paolo Cocchi made changes -
            Description +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

            The most likely scenario for hitting the issue seems when we have tombstones with SysXattrs (eg, SyncGateway) and UserXattrs in the payload. That usually ends up in a crash. See the CBSE-8993 section below for details.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload (which is what happens in CBSE-8993), persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +What is the relation with SyncReplication and Transactions?+

            As seen above, the SDK uses SyncDelete for staging docs. In theory a DCP connection (different from the replica connection) that enables SyncReplication and sets IncludeDeletedUserXattrs::No may lead to removing the "txn" UserXattrs in the SyncDelete before it is streamed for replication and persistence. To date, only KV replica connections are supposed to enable SyncReplication, so we are not probably hitting any issue here.

             

            +Why exactly we see a crash in CBSE-8993 ?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash

             

            +Do we strictly need SyncGateway for hitting a crash?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point here (and in CBSE-8993) is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +If we remove SysXattrs from the described scenario, can we still hit any issue?+

            I have tried to crash the node by removing SyncGateway from the scenario described for CBSE-8993 and adding just a GSI index that covers my data. I didn't manage to crash the node and I didn't manage to get any corruption in-memory/on-disk either.

            I have carried on the investigation for assessing another concern.
             A staged doc for TX is stored in the system as a Deletion with the "txn" UserXattrs. TX issues a SyncDelete. KV streams first the Prepare and then the Commit. Is it possible that we lose the "txn" UserXattrs, which is not persisted/replicated at that point?
             Let's consider the two phases of the SyncDelete, Pending and Committed. Both are potentially vulnerable if any connection triggers Item::removeUserXattrs on the payload of those items.

            _Pending SyncDelete_
             For corrupting the payload of a SyncDelete we would need a DCP connection from an external component that enables SyncReplication and sets IncludeDeletedUserXattrs::No. Only replica connections enable SyncReplication, and in 6.6 they always set IncludeDeletedUserXattrs::Yes. I think that we are fine here.

            _Committed SyncDelete_
             A Commit doesn't carry any value. That is streamed as a DCP_COMMIT message, so there is no way we can touch the original SyncDelete payload by processing its Commit.
             There is one exception to that: a Commit in a Disk snapshot is streamed as Deletion. While it doens't seem very likely, the following scenario seems possible:
             # TX issues SyncDelete (txn, level:majority by default)
             # KV replicates the Prepare to the majority of nodes (not persisted anywere)
             # KV commits the Prepare -> Commit persisted on the Active
             # Active crash+restart -> replicas reconnect and trigger a backfill from the Active
             # We have another DCP connection around that triggers Item::removeUserXattrs -> that is executed before we replicate the Committed Deletion to replicas
             # We replicate an empty staged doc (as the "txn" UserXattr has gone at step 5)

            As discussed with [~graham.pople], losing the staged payload may break TX at failover.

            Note: It is much more likely that we are having issues with CreateAsDeleted in normal Delete rather than SyncDelete. That is because a component doesn't need to enable SyncRepl for triggering Item::removeUserXattrs. But, TX uses normal Delete only if the user runs TX with --durability=none, which is not supposed to be used by the final user.

            _Final answer to this point._
             I didn't manage to reproduce any issue on local tests. Also, the other issues described seem possible but not very likely.

             

            +Do we have a workaround?+

            For avoiding the issue in 6.6 we should avoid UserXattrs in deletions.
             Given that TX-1.1 seems the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations, I am aware that TX-1.1 contains also some fixes (apart from the usage of the newest CreateAsDeleted). The SDK team is best suited for providing an insight on this.
            +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

            The most likely scenario for hitting the issue seems when we have tombstones with SysXattrs (eg, SyncGateway) and UserXattrs in the payload. That usually ends up in a crash. See the CBSE-8993 section below for details.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload (which is what happens in CBSE-8993), persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +Why exactly we see a crash in CBSE-8993 ?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash

             

            +Do we strictly need SyncGateway for hitting a crash?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point here (and in CBSE-8993) is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +If we remove SysXattrs from the described scenario, can we still hit any issue?+

            I have tried to crash the node by removing SyncGateway from the scenario described for CBSE-8993 and adding just a GSI index that covers my data. I didn't manage to crash the node and I didn't manage to get any corruption in-memory/on-disk either.

            I have carried on the investigation for assessing another concern.
             A staged doc for TX is stored in the system as a Deletion with the "txn" UserXattrs. TX issues a SyncDelete. KV streams first the Prepare and then the Commit. Is it possible that we lose the "txn" UserXattrs, which is not persisted/replicated at that point?
             Let's consider the two phases of the SyncDelete, Pending and Committed. Both are potentially vulnerable if any connection triggers Item::removeUserXattrs on the payload of those items.

            _Pending SyncDelete_
             For corrupting the payload of a SyncDelete we would need a DCP connection from an external component that enables SyncReplication and sets IncludeDeletedUserXattrs::No. Only replica connections enable SyncReplication, and in 6.6 they always set IncludeDeletedUserXattrs::Yes. I think that we are fine here.

            _Committed SyncDelete_
             A Commit doesn't carry any value. That is streamed as a DCP_COMMIT message, so there is no way we can touch the original SyncDelete payload by processing its Commit.
             There is one exception to that: a Commit in a Disk snapshot is streamed as Deletion. While it doens't seem very likely, the following scenario seems possible:
             # TX issues SyncDelete (txn, level:majority by default)
             # KV replicates the Prepare to the majority of nodes (not persisted anywere)
             # KV commits the Prepare -> Commit persisted on the Active
             # Active crash+restart -> replicas reconnect and trigger a backfill from the Active
             # We have another DCP connection around that triggers Item::removeUserXattrs -> that is executed before we replicate the Committed Deletion to replicas
             # We replicate an empty staged doc (as the "txn" UserXattr has gone at step 5)

            As discussed with [~graham.pople], losing the staged payload may break TX at failover.

            Note: It is much more likely that we are having issues with CreateAsDeleted in normal Delete rather than SyncDelete. That is because a component doesn't need to enable SyncRepl for triggering Item::removeUserXattrs. But, TX uses normal Delete only if the user runs TX with --durability=none, which is not supposed to be used by the final user.

            _Final answer to this point._
             I didn't manage to reproduce any issue on local tests. Also, the other issues described seem possible but not very likely.

             

            +Do we have a workaround?+

            For avoiding the issue in 6.6 we should avoid UserXattrs in deletions.
             Given that TX-1.1 seems the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations, I am aware that TX-1.1 contains also some fixes (apart from the usage of the newest CreateAsDeleted). The SDK team is best suited for providing an insight on this.
            paolo.cocchi Paolo Cocchi made changes -
            Description +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

            The most likely scenario for hitting the issue seems when we have tombstones with SysXattrs (eg, SyncGateway) and UserXattrs in the payload. That usually ends up in a crash. See the CBSE-8993 section below for details.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload (which is what happens in CBSE-8993), persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +Why exactly we see a crash in CBSE-8993 ?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash

             

            +Do we strictly need SyncGateway for hitting a crash?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point here (and in CBSE-8993) is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +If we remove SysXattrs from the described scenario, can we still hit any issue?+

            I have tried to crash the node by removing SyncGateway from the scenario described for CBSE-8993 and adding just a GSI index that covers my data. I didn't manage to crash the node and I didn't manage to get any corruption in-memory/on-disk either.

            I have carried on the investigation for assessing another concern.
             A staged doc for TX is stored in the system as a Deletion with the "txn" UserXattrs. TX issues a SyncDelete. KV streams first the Prepare and then the Commit. Is it possible that we lose the "txn" UserXattrs, which is not persisted/replicated at that point?
             Let's consider the two phases of the SyncDelete, Pending and Committed. Both are potentially vulnerable if any connection triggers Item::removeUserXattrs on the payload of those items.

            _Pending SyncDelete_
             For corrupting the payload of a SyncDelete we would need a DCP connection from an external component that enables SyncReplication and sets IncludeDeletedUserXattrs::No. Only replica connections enable SyncReplication, and in 6.6 they always set IncludeDeletedUserXattrs::Yes. I think that we are fine here.

            _Committed SyncDelete_
             A Commit doesn't carry any value. That is streamed as a DCP_COMMIT message, so there is no way we can touch the original SyncDelete payload by processing its Commit.
             There is one exception to that: a Commit in a Disk snapshot is streamed as Deletion. While it doens't seem very likely, the following scenario seems possible:
             # TX issues SyncDelete (txn, level:majority by default)
             # KV replicates the Prepare to the majority of nodes (not persisted anywere)
             # KV commits the Prepare -> Commit persisted on the Active
             # Active crash+restart -> replicas reconnect and trigger a backfill from the Active
             # We have another DCP connection around that triggers Item::removeUserXattrs -> that is executed before we replicate the Committed Deletion to replicas
             # We replicate an empty staged doc (as the "txn" UserXattr has gone at step 5)

            As discussed with [~graham.pople], losing the staged payload may break TX at failover.

            Note: It is much more likely that we are having issues with CreateAsDeleted in normal Delete rather than SyncDelete. That is because a component doesn't need to enable SyncRepl for triggering Item::removeUserXattrs. But, TX uses normal Delete only if the user runs TX with --durability=none, which is not supposed to be used by the final user.

            _Final answer to this point._
             I didn't manage to reproduce any issue on local tests. Also, the other issues described seem possible but not very likely.

             

            +Do we have a workaround?+

            For avoiding the issue in 6.6 we should avoid UserXattrs in deletions.
             Given that TX-1.1 seems the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations, I am aware that TX-1.1 contains also some fixes (apart from the usage of the newest CreateAsDeleted). The SDK team is best suited for providing an insight on this.
            +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

            The most likely scenario for hitting the issue seems when we have tombstones with SysXattrs (eg, SyncGateway) and UserXattrs in the payload. That usually ends up in a crash. See the CBSE-8993 section below for details.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload (which is what happens in CBSE-8993), persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +Why exactly we see a crash in CBSE-8993 ?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash

             

            +Do we strictly need SyncGateway for hitting a crash?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point here (and in CBSE-8993) is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +If we remove SysXattrs from the described scenario, can we still hit any issue?+

            I have tried to crash the node by removing SyncGateway from the scenario described for CBSE-8993 and adding just a GSI index that covers my data. I didn't manage to crash the node and I didn't manage to get any corruption in-memory/on-disk either.

            I have carried on the investigation for assessing another concern.
             A staged doc for TX is stored in the system as a Deletion with the "txn" UserXattrs. TX issues a SyncDelete. KV streams first the Prepare and then the Commit. Is it possible that we lose the "txn" UserXattrs, which is not persisted/replicated at that point?
             Let's consider the two phases of the SyncDelete, Pending and Committed. Both are potentially vulnerable if any connection triggers Item::removeUserXattrs on the payload of those items.

            _Pending SyncDelete_
             For corrupting the payload of a SyncDelete we would need a DCP connection from an external component that enables SyncReplication and sets IncludeDeletedUserXattrs::No. Only replica connections enable SyncReplication, and in 6.6 they always set IncludeDeletedUserXattrs::Yes. I think that we are fine here.

            _Committed SyncDelete_
             A Commit doesn't carry any value. That is streamed as a DCP_COMMIT message, so there is no way we can touch the original SyncDelete payload by processing its Commit.
             There is one exception to that: a Commit in a Disk snapshot is streamed as Deletion. While it doens't seem very likely, the following scenario seems possible:
             # TX issues SyncDelete (txn, level:majority by default)
             # KV replicates the Prepare to the majority of nodes (not persisted anywere)
             # KV commits the Prepare -> Commit persisted on the Active
             # Active crash+restart -> replicas reconnect and trigger a backfill from the Active
             # We have another DCP connection around that triggers Item::removeUserXattrs -> that is executed before we replicate the Committed Deletion to replicas
             # We replicate an empty staged doc (as the "txn" UserXattr has gone at step 5)

            As discussed with [~graham.pople], losing the staged payload may break TX at failover.

            *UPDATE: The scenario is not actually possible because at backfill each KVStore::scan is independent from another. That means that different DCP connections are fed with data from different scans. Value blobs are allocated at KVStore::scan and are not shared, so any modification to blobs in a backfill cannot interfere with a backfill running in parallel on a different connection. Essentially, the result of step (5) above will not interfere with step (6).*

             

            Note: It is much more likely that we are having issues with CreateAsDeleted in normal Delete rather than SyncDelete. That is because a component doesn't need to enable SyncRepl for triggering Item::removeUserXattrs. But, TX uses normal Delete only if the user runs TX with --durability=none, which is not supposed to be used by the final user.

            _Final answer to this point._
             I didn't manage to reproduce any issue on local tests. Also, the other issues described seem possible but not very likely.

             

            +Do we have a workaround?+

            For avoiding the issue in 6.6 we should avoid UserXattrs in deletions.
             Given that TX-1.1 seems the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations, I am aware that TX-1.1 contains also some fixes (apart from the usage of the newest CreateAsDeleted). The SDK team is best suited for providing an insight on this.
            paolo.cocchi Paolo Cocchi made changes -
            Description +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

            The most likely scenario for hitting the issue seems when we have tombstones with SysXattrs (eg, SyncGateway) and UserXattrs in the payload. That usually ends up in a crash. See the CBSE-8993 section below for details.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload (which is what happens in CBSE-8993), persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +Why exactly we see a crash in CBSE-8993 ?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash

             

            +Do we strictly need SyncGateway for hitting a crash?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point here (and in CBSE-8993) is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +If we remove SysXattrs from the described scenario, can we still hit any issue?+

            I have tried to crash the node by removing SyncGateway from the scenario described for CBSE-8993 and adding just a GSI index that covers my data. I didn't manage to crash the node and I didn't manage to get any corruption in-memory/on-disk either.

            I have carried on the investigation for assessing another concern.
             A staged doc for TX is stored in the system as a Deletion with the "txn" UserXattrs. TX issues a SyncDelete. KV streams first the Prepare and then the Commit. Is it possible that we lose the "txn" UserXattrs, which is not persisted/replicated at that point?
             Let's consider the two phases of the SyncDelete, Pending and Committed. Both are potentially vulnerable if any connection triggers Item::removeUserXattrs on the payload of those items.

            _Pending SyncDelete_
             For corrupting the payload of a SyncDelete we would need a DCP connection from an external component that enables SyncReplication and sets IncludeDeletedUserXattrs::No. Only replica connections enable SyncReplication, and in 6.6 they always set IncludeDeletedUserXattrs::Yes. I think that we are fine here.

            _Committed SyncDelete_
             A Commit doesn't carry any value. That is streamed as a DCP_COMMIT message, so there is no way we can touch the original SyncDelete payload by processing its Commit.
             There is one exception to that: a Commit in a Disk snapshot is streamed as Deletion. While it doens't seem very likely, the following scenario seems possible:
             # TX issues SyncDelete (txn, level:majority by default)
             # KV replicates the Prepare to the majority of nodes (not persisted anywere)
             # KV commits the Prepare -> Commit persisted on the Active
             # Active crash+restart -> replicas reconnect and trigger a backfill from the Active
             # We have another DCP connection around that triggers Item::removeUserXattrs -> that is executed before we replicate the Committed Deletion to replicas
             # We replicate an empty staged doc (as the "txn" UserXattr has gone at step 5)

            As discussed with [~graham.pople], losing the staged payload may break TX at failover.

            *UPDATE: The scenario is not actually possible because at backfill each KVStore::scan is independent from another. That means that different DCP connections are fed with data from different scans. Value blobs are allocated at KVStore::scan and are not shared, so any modification to blobs in a backfill cannot interfere with a backfill running in parallel on a different connection. Essentially, the result of step (5) above will not interfere with step (6).*

             

            Note: It is much more likely that we are having issues with CreateAsDeleted in normal Delete rather than SyncDelete. That is because a component doesn't need to enable SyncRepl for triggering Item::removeUserXattrs. But, TX uses normal Delete only if the user runs TX with --durability=none, which is not supposed to be used by the final user.

            _Final answer to this point._
             I didn't manage to reproduce any issue on local tests. Also, the other issues described seem possible but not very likely.

             

            +Do we have a workaround?+

            For avoiding the issue in 6.6 we should avoid UserXattrs in deletions.
             Given that TX-1.1 seems the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations, I am aware that TX-1.1 contains also some fixes (apart from the usage of the newest CreateAsDeleted). The SDK team is best suited for providing an insight on this.
            +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

            The most likely scenario for hitting the issue seems when we have tombstones with SysXattrs (eg, SyncGateway) and UserXattrs in the payload. That usually ends up in a crash. See the CBSE-8993 section below for details.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload (which is what happens in CBSE-8993), persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +Why exactly we see a crash in CBSE-8993 ?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash

             

            +Do we strictly need SyncGateway for hitting a crash?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point here (and in CBSE-8993) is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +If we remove SysXattrs from the described scenario, can we still hit any issue?+

            I have tried to crash the node by removing SyncGateway from the scenario described for CBSE-8993 and adding just a GSI index that covers my data. I didn't manage to crash the node and I didn't manage to get any corruption in-memory/on-disk either.

            I have carried on the investigation for assessing another concern.
             A staged doc for TX is stored in the system as a Deletion with the "txn" UserXattrs. TX issues a SyncDelete. KV streams first the Prepare and then the Commit. Is it possible that we lose the "txn" UserXattrs, which is not persisted/replicated at that point?
             Let's consider the two phases of the SyncDelete, Pending and Committed. Both are potentially vulnerable if any connection triggers Item::removeUserXattrs on the payload of those items.

            _Pending SyncDelete_
             For corrupting the payload of a SyncDelete we would need a DCP connection from an external component that enables SyncReplication and sets IncludeDeletedUserXattrs::No. Only replica connections enable SyncReplication, and in 6.6 they always set IncludeDeletedUserXattrs::Yes. I think that we are fine here.

            _Committed SyncDelete_
             A Commit doesn't carry any value. That is streamed as a DCP_COMMIT message, so there is no way we can touch the original SyncDelete payload by processing its Commit.
             There is one exception to that: a Commit in a Disk snapshot is streamed as Deletion. While it doens't seem very likely, the following scenario seems possible:
             # TX issues SyncDelete (txn, level:majority by default)
             # KV replicates the Prepare to the majority of nodes (not persisted anywere)
             # KV commits the Prepare -> Commit persisted on the Active
             # Active crash+restart -> replicas reconnect and trigger a backfill from the Active
             # We have another DCP connection around that triggers Item::removeUserXattrs -> that is executed before we replicate the Committed Deletion to replicas
             # We replicate an empty staged doc (as the "txn" UserXattr has gone at step 5)

            As discussed with [~graham.pople], losing the staged payload may break TX at failover.

            *UPDATE: The scenario for Committed SyncDelete is not actually possible because at backfill each KVStore::scan is independent from another. That means that different DCP connections are fed with data from different scans. Value blobs are allocated at KVStore::scan and are not shared, so any modification to blobs in a backfill cannot interfere with a backfill running in parallel on a different connection. Essentially, the result of step (5) above will not interfere with step (6).*

             

            Note: It is much more likely that we are having issues with CreateAsDeleted in normal Delete rather than SyncDelete. That is because a component doesn't need to enable SyncRepl for triggering Item::removeUserXattrs. But, TX uses normal Delete only if the user runs TX with --durability=none, which is not supposed to be used by the final user.

            _Final answer to this point._
             I didn't manage to reproduce any issue on local tests. Also, the other issues described seem possible but not very likely.

             

            +Do we have a workaround?+

            For avoiding the issue in 6.6 we should avoid UserXattrs in deletions.
             Given that TX-1.1 seems the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations, I am aware that TX-1.1 contains also some fixes (apart from the usage of the newest CreateAsDeleted). The SDK team is best suited for providing an insight on this.
            owend Daniel Owen made changes -
            Link This issue relates to MB-41989 [ MB-41989 ]
            paolo.cocchi Paolo Cocchi made changes -
            Description +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

            The most likely scenario for hitting the issue seems when we have tombstones with SysXattrs (eg, SyncGateway) and UserXattrs in the payload. That usually ends up in a crash. See the CBSE-8993 section below for details.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload (which is what happens in CBSE-8993), persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +Why exactly we see a crash in CBSE-8993 ?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash

             

            +Do we strictly need SyncGateway for hitting a crash?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point here (and in CBSE-8993) is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +If we remove SysXattrs from the described scenario, can we still hit any issue?+

            I have tried to crash the node by removing SyncGateway from the scenario described for CBSE-8993 and adding just a GSI index that covers my data. I didn't manage to crash the node and I didn't manage to get any corruption in-memory/on-disk either.

            I have carried on the investigation for assessing another concern.
             A staged doc for TX is stored in the system as a Deletion with the "txn" UserXattrs. TX issues a SyncDelete. KV streams first the Prepare and then the Commit. Is it possible that we lose the "txn" UserXattrs, which is not persisted/replicated at that point?
             Let's consider the two phases of the SyncDelete, Pending and Committed. Both are potentially vulnerable if any connection triggers Item::removeUserXattrs on the payload of those items.

            _Pending SyncDelete_
             For corrupting the payload of a SyncDelete we would need a DCP connection from an external component that enables SyncReplication and sets IncludeDeletedUserXattrs::No. Only replica connections enable SyncReplication, and in 6.6 they always set IncludeDeletedUserXattrs::Yes. I think that we are fine here.

            _Committed SyncDelete_
             A Commit doesn't carry any value. That is streamed as a DCP_COMMIT message, so there is no way we can touch the original SyncDelete payload by processing its Commit.
             There is one exception to that: a Commit in a Disk snapshot is streamed as Deletion. While it doens't seem very likely, the following scenario seems possible:
             # TX issues SyncDelete (txn, level:majority by default)
             # KV replicates the Prepare to the majority of nodes (not persisted anywere)
             # KV commits the Prepare -> Commit persisted on the Active
             # Active crash+restart -> replicas reconnect and trigger a backfill from the Active
             # We have another DCP connection around that triggers Item::removeUserXattrs -> that is executed before we replicate the Committed Deletion to replicas
             # We replicate an empty staged doc (as the "txn" UserXattr has gone at step 5)

            As discussed with [~graham.pople], losing the staged payload may break TX at failover.

            *UPDATE: The scenario for Committed SyncDelete is not actually possible because at backfill each KVStore::scan is independent from another. That means that different DCP connections are fed with data from different scans. Value blobs are allocated at KVStore::scan and are not shared, so any modification to blobs in a backfill cannot interfere with a backfill running in parallel on a different connection. Essentially, the result of step (5) above will not interfere with step (6).*

             

            Note: It is much more likely that we are having issues with CreateAsDeleted in normal Delete rather than SyncDelete. That is because a component doesn't need to enable SyncRepl for triggering Item::removeUserXattrs. But, TX uses normal Delete only if the user runs TX with --durability=none, which is not supposed to be used by the final user.

            _Final answer to this point._
             I didn't manage to reproduce any issue on local tests. Also, the other issues described seem possible but not very likely.

             

            +Do we have a workaround?+

            For avoiding the issue in 6.6 we should avoid UserXattrs in deletions.
             Given that TX-1.1 seems the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations, I am aware that TX-1.1 contains also some fixes (apart from the usage of the newest CreateAsDeleted). The SDK team is best suited for providing an insight on this.
            +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

            The most likely scenario for hitting the issue seems when we have tombstones with SysXattrs (eg, SyncGateway) and UserXattrs in the payload. That usually ends up in a crash. See the CBSE-8993 section below for details.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload (which is what happens in CBSE-8993), persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +Why exactly we see a crash in CBSE-8993 ?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash

             

            +Do we strictly need SyncGateway for hitting a crash?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point here (and in CBSE-8993) is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +If we remove SysXattrs from the described scenario, can we still hit any issue?+

            I have tried to crash the node by removing SyncGateway from the scenario described for CBSE-8993 and adding just a GSI index that covers my data. I didn't manage to crash the node and I didn't manage to get any corruption in-memory/on-disk either.

             

             

            +Do we have a workaround?+

            For avoiding the issue in 6.6 we should avoid UserXattrs in deletions.
             Given that TX-1.1 seems the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations, I am aware that TX-1.1 contains also some fixes (apart from the usage of the newest CreateAsDeleted). The SDK team is best suited for providing an insight on this.
            paolo.cocchi Paolo Cocchi made changes -
            Description +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

            The most likely scenario for hitting the issue seems when we have tombstones with SysXattrs (eg, SyncGateway) and UserXattrs in the payload. That usually ends up in a crash. See the CBSE-8993 section below for details.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload (which is what happens in CBSE-8993), persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +Why exactly we see a crash in CBSE-8993 ?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash

             

            +Do we strictly need SyncGateway for hitting a crash?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point here (and in CBSE-8993) is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +If we remove SysXattrs from the described scenario, can we still hit any issue?+

            I have tried to crash the node by removing SyncGateway from the scenario described for CBSE-8993 and adding just a GSI index that covers my data. I didn't manage to crash the node and I didn't manage to get any corruption in-memory/on-disk either.

             

             

            +Do we have a workaround?+

            For avoiding the issue in 6.6 we should avoid UserXattrs in deletions.
             Given that TX-1.1 seems the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations, I am aware that TX-1.1 contains also some fixes (apart from the usage of the newest CreateAsDeleted). The SDK team is best suited for providing an insight on this.
            +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

            The most likely scenario for hitting the issue seems when we have tombstones with SysXattrs (eg, SyncGateway) and UserXattrs in the payload. That usually ends up in a crash. See the CBSE-8993 section below for details.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload (which is what happens in CBSE-8993), persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +Why exactly we see a crash in CBSE-8993 ?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash

             

            +Do we strictly need SyncGateway for hitting a crash?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point here (and in CBSE-8993) is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +If we remove SysXattrs from the described scenario, can we still hit any issue?+

            I have tried to crash the node by removing SyncGateway from the scenario described for CBSE-8993 and adding just a GSI index that covers my data. I didn't manage to crash the node and I didn't manage to get any corruption in-memory/on-disk either.

             

            +Do we have a workaround?+

            For avoiding the issue in 6.6 we should avoid UserXattrs in deletions.
             Given that TX-1.1 seems the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations, I am aware that TX-1.1 contains also some fixes (apart from the usage of the newest CreateAsDeleted). The SDK team is best suited for providing an insight on this.
            wayne Wayne Siu made changes -
            Link This issue blocks MB-40528 [ MB-40528 ]
            wayne Wayne Siu made changes -
            Labels approved-for-6.6.1
            paolo.cocchi Paolo Cocchi made changes -
            Resolution Fixed [ 1 ]
            Status Open [ 1 ] Resolved [ 5 ]
            owend Daniel Owen made changes -
            Assignee Paolo Cocchi [ paolo.cocchi ] Ritam Sharma [ ritam.sharma ]

            Build couchbase-server-6.6.1-9132 contains kv_engine commit 897cd88 with commit message:
            MB-41944: Item::removeUserXattrs() operates on a copy

            build-team Couchbase Build Team added a comment - Build couchbase-server-6.6.1-9132 contains kv_engine commit 897cd88 with commit message: MB-41944 : Item::removeUserXattrs() operates on a copy
            ashwin.govindarajulu Ashwin Govindarajulu made changes -
            Assignee Ritam Sharma [ ritam.sharma ] Ashwin Govindarajulu [ ashwin.govindarajulu ]

            Able to reproduce the issue on 6.6.0-7897 and 6.6.1-9131 build using the steps provided above.

            Verified the fix on 6.6.1-9132 and not seeing this crash.

            ashwin.govindarajulu Ashwin Govindarajulu added a comment - Able to reproduce the issue on 6.6.0-7897 and 6.6.1-9131 build using the steps provided above. Verified the fix on 6.6.1-9132 and not seeing this crash.
            ashwin.govindarajulu Ashwin Govindarajulu made changes -
            Attachment trans_failure.pcapng [ 112591 ]
            paolo.cocchi Paolo Cocchi added a comment - - edited

            Graham Pople spotted this in memcached logs:

            2020-10-21T03:20:56.825733-07:00 WARNING (default) EventuallyPersistentEngine::storeIfInner: attempting to store a deleted document with non-zero value size which is 474
            

            I'll to have closer look to that for ensuring that Ashwin Govindarajulu hit a variation of the same (original) issue (ie, not a different bug).

            Ashwin Govindarajulu Could you confirm that you see this latest error only before the fix? Thanks

            paolo.cocchi Paolo Cocchi added a comment - - edited Graham Pople  spotted this in memcached logs: 2020-10-21T03:20:56.825733-07:00 WARNING (default) EventuallyPersistentEngine::storeIfInner: attempting to store a deleted document with non-zero value size which is 474 I'll to have closer look to that for ensuring that Ashwin Govindarajulu hit a variation of the same (original) issue (ie, not a different bug). Ashwin Govindarajulu Could you confirm that you see this latest error only before the fix? Thanks

            Could you confirm that you see this latest error only before the fix?

            Paolo Cocchi I ran the same test on your fix and not seen any transaction failures (I have run 4-5 times till now)

            ashwin.govindarajulu Ashwin Govindarajulu added a comment - Could you confirm that you see this latest error only before the fix? Paolo Cocchi I ran the same test on your fix and not seen any transaction failures (I have run 4-5 times till now)
            ashwin.govindarajulu Ashwin Govindarajulu made changes -
            Assignee Ashwin Govindarajulu [ ashwin.govindarajulu ] Paolo Cocchi [ paolo.cocchi ]
            paolo.cocchi Paolo Cocchi added a comment -

            The TAF test at http://review.couchbase.org/c/TAF/+/138529 always passes on local run, so not able to reproduce the INVALID_ARGUMENTS error.

            paolo.cocchi Paolo Cocchi added a comment - The TAF test at http://review.couchbase.org/c/TAF/+/138529 always passes on local run, so not able to reproduce the INVALID_ARGUMENTS error.
            paolo.cocchi Paolo Cocchi added a comment - - edited

            I've looked at the pcap and the payload in input (that comes from the SDK) is well-formed. Hypothesis here is that the new Ashwin Govindarajulu's test is sporadically hitting a different symptom of the same root issue addressed in this MB.

            paolo.cocchi Paolo Cocchi added a comment - - edited I've looked at the pcap and the payload in input (that comes from the SDK) is well-formed. Hypothesis here is that the new Ashwin Govindarajulu 's test is sporadically hitting a different symptom of the same root issue addressed in this MB.
            graham.pople Graham Pople added a comment -

            Thanks for confirming Paolo Cocchi.  I also couldn't see any issue with the request in the pcap and haven't been able to replicate either.

            graham.pople Graham Pople added a comment - Thanks for confirming Paolo Cocchi .  I also couldn't see any issue with the request in the pcap and haven't been able to replicate either.
            paolo.cocchi Paolo Cocchi made changes -
            Assignee Paolo Cocchi [ paolo.cocchi ] Ashwin Govindarajulu [ ashwin.govindarajulu ]
            paolo.cocchi Paolo Cocchi made changes -
            Assignee Ashwin Govindarajulu [ ashwin.govindarajulu ] Paolo Cocchi [ paolo.cocchi ]
            paolo.cocchi Paolo Cocchi added a comment -

            Hey Ashwin Govindarajulu, opinion in KV is that we can close this as the issue is fixed and verified by QE, plus your latest test is not failing at fix. Assigning back to you, thanks.

            paolo.cocchi Paolo Cocchi added a comment - Hey Ashwin Govindarajulu , opinion in KV is that we can close this as the issue is fixed and verified by QE, plus your latest test is not failing at fix. Assigning back to you, thanks.
            paolo.cocchi Paolo Cocchi made changes -
            Assignee Paolo Cocchi [ paolo.cocchi ] Ashwin Govindarajulu [ ashwin.govindarajulu ]

            Thanks Paolo Cocchi and Graham Pople.

            Closing this ticket since we are not seeing this issue on 6.6.1-9132.

            ashwin.govindarajulu Ashwin Govindarajulu added a comment - Thanks Paolo Cocchi and Graham Pople . Closing this ticket since we are not seeing this issue on 6.6.1-9132.
            ashwin.govindarajulu Ashwin Govindarajulu made changes -
            Status Resolved [ 5 ] Closed [ 6 ]

            Build couchbase-server-6.6.1-9152 contains kv_engine commit 725c864 with commit message:
            MB-41944: Improve logging for body-size>0 in Item::removeUserXattrs

            build-team Couchbase Build Team added a comment - Build couchbase-server-6.6.1-9152 contains kv_engine commit 725c864 with commit message: MB-41944 : Improve logging for body-size>0 in Item::removeUserXattrs
            pvarley Patrick Varley made changes -
            Link This issue blocks MB-42352 [ MB-42352 ]
            pvarley Patrick Varley made changes -
            Link This issue blocks MB-42352 [ MB-42352 ]
            paolo.cocchi Paolo Cocchi made changes -
            Description +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

            The most likely scenario for hitting the issue seems when we have tombstones with SysXattrs (eg, SyncGateway) and UserXattrs in the payload. That usually ends up in a crash. See the CBSE-8993 section below for details.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload (which is what happens in CBSE-8993), persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +Why exactly we see a crash in CBSE-8993 ?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash

             

            +Do we strictly need SyncGateway for hitting a crash?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point here (and in CBSE-8993) is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +If we remove SysXattrs from the described scenario, can we still hit any issue?+

            I have tried to crash the node by removing SyncGateway from the scenario described for CBSE-8993 and adding just a GSI index that covers my data. I didn't manage to crash the node and I didn't manage to get any corruption in-memory/on-disk either.

             

            +Do we have a workaround?+

            For avoiding the issue in 6.6 we should avoid UserXattrs in deletions.
             Given that TX-1.1 seems the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations, I am aware that TX-1.1 contains also some fixes (apart from the usage of the newest CreateAsDeleted). The SDK team is best suited for providing an insight on this.
            +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

            The most likely scenario for hitting the issue seems when we have tombstones with SysXattrs (eg, SyncGateway) and UserXattrs in the payload. That usually ends up in a crash.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload, persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +Why exactly we may see a crash?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash

             

            +Do we strictly need SyncGateway for hitting a crash?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point here is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +If we remove SysXattrs from the described scenario, can we still hit any issue?+

            I have tried to crash the node by removing SyncGateway and adding just a GSI index that covers my data. I didn't manage to crash the node and I didn't manage to get any corruption in-memory/on-disk either.

             

            +Do we have a workaround?+

            For avoiding the issue in 6.6 we should avoid UserXattrs in deletions.
             Given that TX-1.1 seems the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations, I am aware that TX-1.1 contains also some fixes (apart from the usage of the newest CreateAsDeleted). The SDK team is best suited for providing an insight on this.
            paolo.cocchi Paolo Cocchi made changes -
            Link This issue is triggered by CBSE-8993 [ CBSE-8993 ]
            paolo.cocchi Paolo Cocchi made changes -
            Link This issue is triggered by CBSE-8993 [ CBSE-8993 ]

            Build couchbase-server-7.0.0-3683 contains kv_engine commit 897cd88 with commit message:
            MB-41944: Item::removeUserXattrs() operates on a copy

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.0.0-3683 contains kv_engine commit 897cd88 with commit message: MB-41944 : Item::removeUserXattrs() operates on a copy

            Build couchbase-server-7.0.0-3700 contains kv_engine commit 725c864 with commit message:
            MB-41944: Improve logging for body-size>0 in Item::removeUserXattrs

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.0.0-3700 contains kv_engine commit 725c864 with commit message: MB-41944 : Improve logging for body-size>0 in Item::removeUserXattrs
            paolo.cocchi Paolo Cocchi made changes -
            Link This issue is triggering MB-42086 [ MB-42086 ]
            anusha.mathur Anusha Mathur made changes -
            Link This issue relates to CBSE-9301 [ CBSE-9301 ]
            anusha.mathur Anusha Mathur made changes -
            Assignee Ashwin Govindarajulu [ ashwin.govindarajulu ] Anusha Mathur [ anusha.mathur ]
            anusha.mathur Anusha Mathur made changes -
            Assignee Anusha Mathur [ anusha.mathur ] Dave Rigby [ drigby ]
            drigby Dave Rigby made changes -
            Description +*Summary*+

            The code path in Item::removeUserXattrs() may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

            The most likely scenario for hitting the issue seems when we have tombstones with SysXattrs (eg, SyncGateway) and UserXattrs in the payload. That usually ends up in a crash.

             

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

             

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload, persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +Why exactly we may see a crash?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash

             

            +Do we strictly need SyncGateway for hitting a crash?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point here is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +If we remove SysXattrs from the described scenario, can we still hit any issue?+

            I have tried to crash the node by removing SyncGateway and adding just a GSI index that covers my data. I didn't manage to crash the node and I didn't manage to get any corruption in-memory/on-disk either.

             

            +Do we have a workaround?+

            For avoiding the issue in 6.6 we should avoid UserXattrs in deletions.
             Given that TX-1.1 seems the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations, I am aware that TX-1.1 contains also some fixes (apart from the usage of the newest CreateAsDeleted). The SDK team is best suited for providing an insight on this.
            +*Summary*+

            The code path in Item::removeUserXattrs() added in 6.6.0 may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

            The most likely scenario for hitting the issue seems when we have tombstones with SysXattrs (eg, SyncGateway) and UserXattrs in the payload. That usually ends up in a crash.

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload, persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +Why exactly we may see a crash?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash

             

            +Do we strictly need SyncGateway for hitting a crash?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point here is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +If we remove SysXattrs from the described scenario, can we still hit any issue?+

            I have tried to crash the node by removing SyncGateway and adding just a GSI index that covers my data. I didn't manage to crash the node and I didn't manage to get any corruption in-memory/on-disk either.

             

            +Do we have a workaround?+

            For avoiding the issue in 6.6 we should avoid UserXattrs in deletions.
             Given that TX-1.1 seems the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations, I am aware that TX-1.1 contains also some fixes (apart from the usage of the newest CreateAsDeleted). The SDK team is best suited for providing an insight on this.
            drigby Dave Rigby made changes -
            Description +*Summary*+

            The code path in Item::removeUserXattrs() added in 6.6.0 may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

            The most likely scenario for hitting the issue seems when we have tombstones with SysXattrs (eg, SyncGateway) and UserXattrs in the payload. That usually ends up in a crash.

            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.

            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload, persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +Why exactly we may see a crash?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash

             

            +Do we strictly need SyncGateway for hitting a crash?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point here is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +If we remove SysXattrs from the described scenario, can we still hit any issue?+

            I have tried to crash the node by removing SyncGateway and adding just a GSI index that covers my data. I didn't manage to crash the node and I didn't manage to get any corruption in-memory/on-disk either.

             

            +Do we have a workaround?+

            For avoiding the issue in 6.6 we should avoid UserXattrs in deletions.
             Given that TX-1.1 seems the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations, I am aware that TX-1.1 contains also some fixes (apart from the usage of the newest CreateAsDeleted). The SDK team is best suited for providing an insight on this.
            +*Summary*+

            The code path in Item::removeUserXattrs() added in 6.6.0 may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

            The most likely scenario for hitting the issue seems when we have tombstones with SysXattrs (eg, SyncGateway) and UserXattrs in the payload. That usually ends up in a crash.


            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.


            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload, persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +Why exactly we may see a crash?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash

             

            +Do we strictly need SyncGateway for hitting a crash?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point here is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +If we remove SysXattrs from the described scenario, can we still hit any issue?+

            I have tried to crash the node by removing SyncGateway and adding just a GSI index that covers my data. I didn't manage to crash the node and I didn't manage to get any corruption in-memory/on-disk either.

             

            +Do we have a workaround?+

            For avoiding the issue in 6.6 we should avoid UserXattrs in deletions.
             Given that TX-1.1 seems the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations, I am aware that TX-1.1 contains also some fixes (apart from the usage of the newest CreateAsDeleted). The SDK team is best suited for providing an insight on this.
            drigby Dave Rigby made changes -
            Description +*Summary*+

            The code path in Item::removeUserXattrs() added in 6.6.0 may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

            The most likely scenario for hitting the issue seems when we have tombstones with SysXattrs (eg, SyncGateway) and UserXattrs in the payload. That usually ends up in a crash.


            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.


            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload, persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +Why exactly we may see a crash?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash

             

            +Do we strictly need SyncGateway for hitting a crash?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point here is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +If we remove SysXattrs from the described scenario, can we still hit any issue?+

            I have tried to crash the node by removing SyncGateway and adding just a GSI index that covers my data. I didn't manage to crash the node and I didn't manage to get any corruption in-memory/on-disk either.

             

            +Do we have a workaround?+

            For avoiding the issue in 6.6 we should avoid UserXattrs in deletions.
             Given that TX-1.1 seems the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations, I am aware that TX-1.1 contains also some fixes (apart from the usage of the newest CreateAsDeleted). The SDK team is best suited for providing an insight on this.
            +*Summary*+

            The code path in Item::removeUserXattrs() (added in 6.6.0 for MB-37374) may corrupt the item value in memory. That may lead to a number of issues including crashes, DCP disconnection, corrupted data being written to disk in the payload of deletions, corrupted data being streamed via DCP in the payload of deletions.

            The most likely scenario for hitting the issue seems when we have tombstones with SysXattrs (eg, SyncGateway) and UserXattrs in the payload. That usually ends up in a crash.


            +*Details*+

            Before 6.6, a deletion in memcached could never contain UserXattrs in the payload. Only SysXattrs were allowed (eg, Mobile Convergence - the "_sync" SysXattr stored by SyncGateway).
             From 6.6 memcached allows UserXattrs in deletions. That is for supporting "Transparent Transactions" (TXNJ-125). The MCBP now exposes a new CreateAsDeleted API. The SDK uses it for creating "staged" docs for TX as tombstones. The "staged" payload is stored in the "txn" UserXattr in a tombstone.

            A Consumer connection to a 6.6 memcached node must enable the IncludeDeletedUserXattrs feature for getting UserXattrs in deletions over that connection.
             For backward compatibility, and for supporting components that don't understand UserXattrs in deletions, the IncludeDeletedUserXattrs feature is disabled by default on DCP Producers.
             If memcached stores a deletions with UserXattrs in the payload, then 2 things may happen at stream over DCP:
             # IncludeDeletedUserXattrs is enabled. Nothing to do here, we just stream the deletion as it is
             # IncludeDeletedUserXattrs is disabled. Before streaming, we need to check if the payload contains any UserXattrs and we remove it before streaming.

            Item::removeUserXattrs() is when (2) happens.
             For supporting multiple DCP connections with different IncludeDeletedUserXattrs setting, we need to operate on a copy of the payload when making any change to it. Any direct change to the payload would reflect on unrelated DCP connections (and possibly on disk) otherwise, which is what happens here.


            +What do we need for hitting the issue?+

            If we consider only the SDK and memcached during a transaction then we are fine.
             # The SDK stages a doc via a Sync Subdoc(CreateAsDeleted) -> a SyncDelete with a "txn" UserXattr is stored and processed in memcached
             # memcached processes the SyncDelete to completion -> the SyncDelete first and then the committed Delete are streamed over a DCP replica connection that supports IncludeDeletedUserXattrs, so don't even execute Item::removeUserXattrs().

            But if we introduce a component (eg, SyncGateway, GSI, etc..) that streams from memcached via DCP and doesn't enable IncludeDeletedUserXattrs, then we execute Item::removeUserXattrs() on that connection for any deletion that contains UserXAttrs. If that happens, then we modify the shared payload stored in memory, which is the payload pointed by the persistence and replication queue (CheckpointManager).
             At that point, any component that processes the same deletion again from the CheckpointManger will likely hit some issue. Eg, crash at replication when we validate the corrupted payload, persistence of corrupted data (if the flusher persists the deletion after Item::removeUserXattrs() has executed for that deletion), etc..

             

            +Why exactly we may see a crash?+
             # TX completes and creates a committed doc with a given Body
             # SG updates the doc by adding the "_sync" SysXattrs -> doc is alive with Body+SysXattr
             # The user deletes the doc via CMD_DEL -> now we have a tombstone with SysXattr only
             # TX again on the same doc (ie, repeat step (1)) -> TX issues a CreateAsDelete and we end up with a (committed) tombstone with the pre-existing SysXattr + the new "txn" UserXattr
             # We stream the committed tombstone over a DCP connection that triggers Item::removeUserXattrs() -> the result is that we remove the UserXattrs and we update the Xattr-block size in the shared blob for that item
             # A second DCP connection triggers Item::removeUserXattrs() -> At the previous step we have overwritten the size of the Xattr-block in the payload, which is used for computing the size of the Body-chunk -> In Item::removeUserXattrs() now we find that the Body size for the deletion being processed is > 0 -> Invalid, we trigger an exception and crash

             

            +Do we strictly need SyncGateway for hitting a crash?+

            Answer is no. SyncGateway introduces the "_sync" SysXAttr (also in deletions). That together with TX leads memcached to corrupting data in a way that leads to a crash. The point here is on having a SysXAttr + a UserXattr in the payload of a deletion, it's not SyncGateway itself.

             

            +If we remove SysXattrs from the described scenario, can we still hit any issue?+

            I have tried to crash the node by removing SyncGateway and adding just a GSI index that covers my data. I didn't manage to crash the node and I didn't manage to get any corruption in-memory/on-disk either.

             

            +Do we have a workaround?+

            For avoiding the issue in 6.6 we should avoid UserXattrs in deletions.
             Given that TX-1.1 seems the only component that creates tombstones with UserXattrs, using TX-1.0 in place of TX-1.1 on 6.6 cluster is a possibility. Note that here I'm just sharing technical considerations, I am aware that TX-1.1 contains also some fixes (apart from the usage of the newest CreateAsDeleted). The SDK team is best suited for providing an insight on this.
            drigby Dave Rigby made changes -
            Link This issue is caused by MB-37374 [ MB-37374 ]
            drigby Dave Rigby made changes -
            Link This issue causes CBSE-8993 [ CBSE-8993 ]
            drigby Dave Rigby made changes -
            Link This issue is triggered by CBSE-8993 [ CBSE-8993 ]
            drigby Dave Rigby made changes -
            Security Private [ 10010 ]

            People

              drigby Dave Rigby
              paolo.cocchi Paolo Cocchi
              Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty