Details
-
Bug
-
Resolution: Fixed
-
Critical
-
6.6.3, 6.6.4, 6.6.5, 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4
-
Untriaged
-
1
-
Unknown
Description
TL;DR: If, in a value eviction bucket, subdoc is used to remove a deleted item's xattrs (e.g., by SyncGateway), a replica may persist an item with xattr datatype still set, but an empty body. This can cause memcached crashes later down the line when that item is read from disk e.g., for a DCP backfill.
So far, this seems most likely to affect clusters with SyncGateway and one or more replicas. Actors other than SG may be able to cause it, and with tight timing it could also be triggered during a vbucket move as part of a rebalance with no replicas.
Setup
This issue becomes possible if the HashTable on a replica holds a deleted item with xattrs, with a non-resident body.
There are a handful of routes to this point. The easiest is:
- VBucket starts as active.
- Has an item with system xattrs (e.g., _sync) deleted. This removes it from the HT too.
- Has that item bgfetched back into the HT (e.g., to serve a request to read the _sync xattr).
- Has the value for the delete evicted through the standard value eviction under memory pressure. This delete's metadata can now persist in the HT indefinitely.
- Later transitions to replica. At this stage the HashTable looks like:
HashTable[0x119276020] with numItems:1 numInMemory:1 numDeleted:1 numNonResident:0 numTemp:0 numSystemItems:0 numPreparedSW:0 values:
SV @0x11857ea80 X.. .D..Cm temp: seq:1 rev:1 cas:1234 key:"cid:0x0:key, size:4" exp:1657279495 age:0 nru:2 fc:0 vallen:0
Reading the flags in the SV line:
- "X": Datatype::XATTR
- "D": Deleted
- "." immediately after "D": Not Resident (bit clear)
- vallen:0: In-memory value length is zero as not resident
As noted in MB-50423, the deleted item's metadata can remain in the HashTable for an arbitrarily long time - until overwritten, memcached restarts, or the vbucket is deleted.
The vbucket is currently "correct", but now primed for the actual issue to occur...
Issue
5. Subdoc is used to remove the _sync system xattr from the discussed document. If that was the only xattr, the item's datatype should now transition to RAW_BYTES, as it has no xattrs at all.
The active vbucket will manage this correctly, persisting a deleted item, RAW_BYTES, no value.
However, when this change is replicated over DCP, deleteWithMeta will be used, and will encounter the primed deleted item with a non-resident value. The deleteWithMeta code path will skip several checks which are only relevant to active vbs, and will attempt to delete the stored value, reaching:
bool StoredValue::deleteImpl(DeleteSource delSource) {
    if (isDeleted() && !getValue()) {
        // SV is already marked as deleted and has no value - no further
        // deletion possible.
        return false;
    }

    resetValue();
    setDatatype(PROTOCOL_BINARY_RAW_BYTES);
    setPendingSeqno();

    setDeletedPriv(true);
    setDeletionSource(delSource);
    markDirty();

    return true;
}
Unfortunately, the StoredValue is both deleted, and does not have a value in memory. This leads to an early exit, and skips updating several attributes, including the datatype.
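To make the early exit concrete, here is a minimal toy model of the state involved. ToyStoredValue, ToyDatatype, makePrimedItem and both delete functions are hypothetical names for illustration, not kv_engine code. deleteAsQuoted follows the control flow of the snippet above; deleteResettingDatatype is only an assumed shape for a fix in the spirit of "ensure the delete updates the datatype", and may differ from the merged patch.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Toy model only: these names are hypothetical and are not the kv_engine types.
enum class ToyDatatype : uint8_t { RawBytes, Xattr };

struct ToyStoredValue {
    bool deleted = false;
    bool dirty = false;
    ToyDatatype datatype = ToyDatatype::RawBytes;
    std::optional<std::string> value; // std::nullopt models a non-resident value

    // Same control flow as the quoted deleteImpl: an already-deleted item
    // with no in-memory value returns early, leaving the datatype untouched.
    bool deleteAsQuoted() {
        if (deleted && !value) {
            return false;
        }
        markDeleted();
        return true;
    }

    // Assumed variant (not the merged patch): only return early when the
    // datatype is already RawBytes, so a stale Xattr datatype is still reset.
    bool deleteResettingDatatype() {
        if (deleted && !value && datatype == ToyDatatype::RawBytes) {
            return false; // genuinely nothing left to change
        }
        markDeleted();
        return true;
    }

    // Shared tail of both paths: drop the value and normalise the metadata.
    void markDeleted() {
        value.reset();
        datatype = ToyDatatype::RawBytes;
        deleted = true;
        dirty = true;
    }
};

// Builds the "primed" replica state: deleted, Xattr datatype, value not resident.
ToyStoredValue makePrimedItem() {
    ToyStoredValue sv;
    sv.deleted = true;
    sv.datatype = ToyDatatype::Xattr;
    return sv;
}
```

Calling deleteAsQuoted on the primed item returns false and leaves the datatype as Xattr, which is the state that later gets queued for persistence. The assumed variant resets it to RawBytes.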
Next, deleteWithMeta calls queueDirty from EPVBucket::softDeleteStoredValue. This queues the updated item state, taken from the SV, for persistence. The replica therefore persists a deleted item with datatype xattr and no value. This is invalid: items with datatype xattr are expected to have actual xattrs in the value, and various codepaths trust this assumption.
For example, the state of the HashTable after the second delete (removing the system xattr) would be:
HashTable[0x116a0f020] with numItems:1 numInMemory:1 numDeleted:1 numNonResident:0 numTemp:0 numSystemItems:0 numPreparedSW:0 values:
SV @0x115d16a80 X.. WD..Cm temp: seq:2 rev:1 cas:5678 key:"cid:0x0:key, size:4" exp:0 age:0 nru:2 fc:0 vallen:0
Reading the flags in the SV line:
- "X": Datatype::XATTR
- "W": Written (dirty)
- "D": Deleted
- "." immediately after "D": Not Resident (bit clear)
- vallen:0: In-memory value length is zero
The issue here is that we have a dirty (written) item, which by definition must have its value present. Instead it has a zero-length value, and the datatype is still XATTR rather than RAW_BYTES.
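The broken invariant can be written down as a small check. This is a sketch under stated assumptions: ItemSnapshot and both functions are hypothetical, and the XATTR bit value 0x04 is assumed here for illustration. It is not how kv_engine validates or corrects items.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed bit value for the XATTR datatype flag; illustrative only.
constexpr uint8_t kDatatypeXattr = 0x04;

// Hypothetical snapshot of the fields that get queued for persistence.
struct ItemSnapshot {
    bool deleted;
    uint8_t datatype;
    std::size_t valueLength;
};

// True when the snapshot claims to carry xattrs but has no bytes to hold them.
bool hasXattrDatatypeWithoutValue(const ItemSnapshot& item) {
    return (item.datatype & kDatatypeXattr) != 0 && item.valueLength == 0;
}

// Hypothetical correction: clear the XATTR bit when there is no value at all.
ItemSnapshot withConsistentDatatype(ItemSnapshot item) {
    if (hasXattrDatatypeWithoutValue(item)) {
        item.datatype = static_cast<uint8_t>(item.datatype & ~kDatatypeXattr);
    }
    return item;
}
```

The state shown in the dump (deleted, XATTR, vallen:0) is exactly the case the predicate flags. A deleted item with RAW_BYTES and no value, or one with XATTR and a non-empty value, passes.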
Fallout
The bad state on disk can go unnoticed for some time. If nothing else occurs within the purge interval, the deleted item may be cleaned up without ever causing visible symptoms. However, if the vbucket becomes active, any subsequent use of that item is quite likely to cause issues.
E.g., streaming all items from disk over DCP may crash memcached with:
memcached<0.17628.0>: 2022-06-16T10:43:19.365642+02:00 CRITICAL Caught unhandled std::exception-derived exception. what(): GSL: Precondition failure at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/xattr/utils.cc: 133
memcached<0.17628.0>: terminate called after throwing an instance of 'gsl::fail_fast'
memcached<0.17628.0>: what(): GSL: Precondition failure at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/xattr/utils.cc: 133
utils.cc
132 uint32_t get_body_offset(const cb::const_char_buffer& payload) {
133     Expects(payload.size() > 0);
134     const uint32_t* lenptr = reinterpret_cast<const uint32_t*>(payload.buf);
135     auto len = ntohl(*lenptr);
136     check_len(len, payload.size());
137     return len + sizeof(uint32_t);
138 }
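For comparison, a caller that cannot trust the datatype could validate the payload before reading the length prefix. The sketch below is a standalone illustration, not a proposed change to utils.cc. tryGetBodyOffset is a hypothetical name, and the bounds check is an assumption about what a sane length looks like, not a copy of check_len. It decodes the big-endian length by hand to avoid depending on ntohl.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Returns the body offset of an xattr-style payload (4-byte big-endian length
// prefix followed by that many bytes of xattrs), or std::nullopt when the
// payload is too short to contain a consistent prefix.
std::optional<uint32_t> tryGetBodyOffset(const std::vector<uint8_t>& payload) {
    constexpr std::size_t kLenField = sizeof(uint32_t);

    // An empty (or truncated) payload is the case that trips the precondition.
    if (payload.size() < kLenField) {
        return std::nullopt;
    }

    // Decode the network-order (big-endian) length prefix byte by byte.
    const uint32_t len = (static_cast<uint32_t>(payload[0]) << 24) |
                         (static_cast<uint32_t>(payload[1]) << 16) |
                         (static_cast<uint32_t>(payload[2]) << 8) |
                         static_cast<uint32_t>(payload[3]);

    // Assumed sanity check: the xattr section must fit inside the payload.
    if (len > payload.size() - kLenField) {
        return std::nullopt;
    }
    return len + static_cast<uint32_t>(kLenField);
}
```

With this shape, an item persisted with datatype xattr and an empty value yields std::nullopt instead of a fail_fast, leaving the caller to decide how to handle the inconsistent item.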
Summary
What seems to be quite routine SyncGateway behaviour (removing the _sync xattr) can lead to bad state on disk on a replica. This may be occurring quite frequently, only becoming a visible issue if that vbucket becomes active, or streams items over DCP while remaining a replica. As far as I'm aware, only views stream from replicas.
Issue Links
For Gerrit Dashboard: MB-52793
# | Subject | Branch | Project | Status | CR | V
---|---|---|---|---|---|---
176868,1 | MB-52793: Ensure StoredValue::del updates datatype | neo | kv_engine | ABANDONED | -1 | -1
177197,4 | MB-52793: Ensure StoredValue::del updates datatype | mad-hatter | kv_engine | MERGED | +2 | +1
177217,4 | MB-51373: Inspect and correct Item objects created by KVStore | mad-hatter | kv_engine | MERGED | +2 | +1
177290,7 | Adding functional test for MB-52793 | neo | TAF | MERGED | +2 | +1
177411,1 | Adding functional test for MB-52793 | master | TAF | ABANDONED | 0 | +1
177548,5 | MB-51373: Inspect and correct Item objects created by KVStore | neo | kv_engine | MERGED | +2 | +1
178202,2 | MB-52793: Merge branch 'mad-hatter' into cheshire-cat | cheshire-cat | kv_engine | MERGED | +2 | +1
178483,3 | Merge branch 'cheshire-cat' into neo | neo | kv_engine | MERGED | +2 | +1
178895,1 | Merge commit 'couchbase/neo~7' into trunk | master | kv_engine | MERGED | +2 | +1