Details
Type: Bug
Resolution: Fixed
Priority: Critical
Affects Version/s: 6.5.0, 6.5.1, 6.5.2, 6.6.0, 6.6.1, 6.6.2, 6.6.3, 6.6.4, 6.6.5, 7.0.0-Beta1, 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.1.0, 7.1.1
Untriaged
1
Unknown
Description
One example of this issue occurs with durable writes (which come in via KVBucket::set).
For the issue to occur, the hash table must be in a particular state on entry to the set path: roughly, both an existing prepare and a committed item exist for the key. With the HT in that state, the following sequence triggers the issue.
- A new prepare for k1 executes on the normal set path; VBucket::set looks up the existing key.
- After this call, the returned htRes object holds non-owning pointers (StoredValue*) to both 'versions' of k1: the pending and the committed StoredValue.
- Execution then proceeds to processSet.
- Somewhere inside processSet, one version of k1 is determined to be stale. The exact steps/path are not yet known, but KV certainly reaches something like the following:
  - updateStoredValue https://src.couchbase.org/source/xref/6.6.3/kv_engine/engines/ep/src/ephemeral_vb.cc?r=40b0f7c2#383
  - which eventually calls markItemStale https://src.couchbase.org/source/xref/6.6.3/kv_engine/engines/ep/src/ephemeral_vb.cc?r=40b0f7c2#460
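The pointer relationship in the steps above can be modelled in isolation. The following is a simplified, hypothetical sketch: `StoredValue`, `SeqList`, `FindResult`, `findForWrite`, `markItemStale`, `makeItem` and `pendingIsStaleAfterSet` are stand-ins invented for illustration (loosely named after the kv_engine concepts), not the real kv_engine types or signatures. An owning list plays the role of the ephemeral sequence list, and the lookup result holds only raw, non-owning pointers, so marking a version stale does not update or invalidate the pointer the result still holds.

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <string>

// Hypothetical stand-in for StoredValue: only the fields this sketch needs.
struct StoredValue {
    std::string key;
    bool pending = false; // true for a prepare, false for a committed item
    bool stale = false;   // set once a newer version supersedes this one
};

// Hypothetical stand-in for the ephemeral sequence list, which OWNS the items.
using SeqList = std::list<std::unique_ptr<StoredValue>>;

// Hypothetical stand-in for htRes: both pointers are non-owning views into
// objects owned by the SeqList.
struct FindResult {
    StoredValue* pending = nullptr;
    StoredValue* committed = nullptr;
};

// Looks up both live (non-stale) versions of a key.
FindResult findForWrite(const SeqList& list, const std::string& key) {
    FindResult res;
    for (const auto& sv : list) {
        if (sv->key != key || sv->stale) {
            continue;
        }
        if (sv->pending) {
            res.pending = sv.get();
        } else {
            res.committed = sv.get();
        }
    }
    return res;
}

// Marking an item stale only flips a flag; holders of raw pointers are not told.
void markItemStale(StoredValue& sv) {
    sv.stale = true;
}

std::unique_ptr<StoredValue> makeItem(const std::string& key, bool pending) {
    auto sv = std::make_unique<StoredValue>();
    sv->key = key;
    sv->pending = pending;
    return sv;
}

// Replays the sequence above and reports whether htRes.pending now refers
// to an item that has been marked stale.
bool pendingIsStaleAfterSet() {
    SeqList list;
    list.push_back(makeItem("k1", false)); // existing committed k1
    list.push_back(makeItem("k1", true));  // existing prepare for k1

    // Equivalent of VBucket::set looking up both versions of k1.
    FindResult htRes = findForWrite(list, "k1");
    assert(htRes.pending != nullptr && htRes.committed != nullptr);

    // Equivalent of processSet: the old prepare becomes stale and a new
    // version is appended to the list.
    markItemStale(*htRes.pending);
    list.push_back(makeItem("k1", true));

    // htRes still points at the stale object; nothing has freed it yet.
    return htRes.pending->stale;
}
```

In this model `pendingIsStaleAfterSet()` returns `true`: the result object is still holding a pointer to an item that a background task is now entitled to free.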
At this point a StoredValue has been marked stale while htRes still references that same object.
Problems will now occur if the EphTombstoneStaleItemDeleter runs during the lifetime of htRes.
For example:
- Whilst the set of k1 is still executing (i.e. htRes still holds its two non-owning pointers), the EphTombstoneStaleItemDeleter wakes up and runs.
- The EphTombstoneStaleItemDeleter walks the ephemeral linked list and deletes every object that is marked stale.
- When htRes is later destroyed, it accesses the deleted object (a use-after-free), causing a number of issues.
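One general way to close the window described above is to stop the deleter from freeing an item while a request still holds a pointer to it. The sketch below is a hypothetical illustration of that idea using a per-item pin count and an RAII guard; the names (`StoredValue`, `PinnedRef`, `pins`, `purgeStale`, `simulateRace`) are invented here, and this is not necessarily how the issue was actually fixed in kv_engine. It is single-threaded for clarity: a real concurrent version would need an atomic pin count checked under the same lock the deleter uses.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <string>
#include <utility>

// Hypothetical simplified item: a stale flag plus a count of in-flight
// references that the deleter must respect.
struct StoredValue {
    std::string key;
    bool stale = false;
    int pins = 0; // single-threaded sketch; a concurrent version needs atomics
};

using SeqList = std::list<std::unique_ptr<StoredValue>>;

// RAII guard: while it is alive, the referenced item must not be freed.
class PinnedRef {
public:
    explicit PinnedRef(StoredValue* sv) : sv_(sv) {
        if (sv_) {
            ++sv_->pins;
        }
    }
    ~PinnedRef() {
        if (sv_) {
            --sv_->pins;
        }
    }
    PinnedRef(const PinnedRef&) = delete;
    PinnedRef& operator=(const PinnedRef&) = delete;
    StoredValue* get() const { return sv_; }

private:
    StoredValue* sv_;
};

// Stand-in for one pass of a stale-item deleter: frees stale items that no one
// has pinned and returns how many were freed. Pinned stale items are left for
// a later pass.
std::size_t purgeStale(SeqList& list) {
    std::size_t freed = 0;
    for (auto it = list.begin(); it != list.end();) {
        if ((*it)->stale && (*it)->pins == 0) {
            it = list.erase(it);
            ++freed;
        } else {
            ++it;
        }
    }
    return freed;
}

// Replays the race sequentially: the deleter runs while the "set" still holds
// its reference, then again after the reference is released.
// Returns {freed while pinned, freed after release}.
std::pair<std::size_t, std::size_t> simulateRace() {
    SeqList list;
    list.push_back(std::make_unique<StoredValue>());
    list.back()->key = "k1";

    std::size_t freedWhilePinned = 0;
    {
        PinnedRef htResPending(list.back().get()); // the set path's reference
        htResPending.get()->stale = true;          // marked stale mid-set
        freedWhilePinned = purgeStale(list);       // deleter wakes up
        assert(htResPending.get()->key == "k1");   // object is still valid
    } // guard destroyed here: the pin is released while the item still exists

    std::size_t freedAfterRelease = purgeStale(list);
    return {freedWhilePinned, freedAfterRelease};
}
```

With the guard in place, the first deleter pass frees nothing and the second frees the one stale item, so the equivalent of the htRes destruction path never touches freed memory.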
Note that this bug quickly becomes evident if code such as the following is added after the call to processSet:

    // note: isStale1 is a temporary function that exposes the stale bit of StoredValue
    if (htRes.pending && htRes.pending->isStale1()) {
        std::stringstream ss;
        ss << *htRes.pending;
        LOG_CRITICAL("StoredValue after processSet is stale {}", ss.str());
    }

When running a 2-node cluster with a 100% durable-write workload, this message is printed often.
Note that this bug has been seen to crash memcached, because the following exception is thrown:
CRITICAL Caught unhandled std::exception-derived exception. what(): CollectionID: invalid value:2
This is because the htRes destruction path tried to read the collection prefix of the deleted StoredValue's key and found that it was no longer a valid value.