6.5.1, 6.6.0, 6.6.1, 6.6.2, 6.5.2, 6.5.0, 6.6.3, 6.6.4, 6.6.5, 7.0.0-Beta1, 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.1.0, 7.1.1
One example of this issue occurs with durable writes (which come in via KVBucket::set).
For the issue to occur, the hash table must be in a particular state on entry to the set path: roughly, both an existing prepare and a committed item exist for the key. With the HT in that state, the following sequence triggers the issue.
- A new prepare occurs for k1 and executes on the normal set path; VBucket::set looks up the existing key.
- After this call, the returned htRes object holds non-owning pointers (StoredValue*) to both 'versions' of k1: the pending and the committed StoredValue.
- Next, execution proceeds to processSet.
- Inside processSet, one version of k1 is determined to be stale. The exact steps/path are not yet known, but KV certainly reaches something like the following:
- updateStoredValue https://src.couchbase.org/source/xref/6.6.3/kv_engine/engines/ep/src/ephemeral_vb.cc?r=40b0f7c2#383
- Which eventually calls markItemStale https://src.couchbase.org/source/xref/6.6.3/kv_engine/engines/ep/src/ephemeral_vb.cc?r=40b0f7c2#460
At this point, the problem is that a StoredValue has been marked as stale while htRes still references the stale object.
Problems will now occur if EphTombstoneStaleItemDeleter runs during the lifetime of htRes.
- While the set of k1 is still executing, i.e. htRes still holds its two non-owning pointers, the EphTombstoneStaleItemDeleter wakes up and runs.
- The EphTombstoneStaleItemDeleter walks the ephemeral linked list looking for objects that are marked stale, and deletes any it finds.
- When htRes is then destructed, it uses the deleted object, causing a number of issues.
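The sequence above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not kv_engine code: `ToyStoredValue`, `ToySeqList`, `ToyFindResult` and `committedDanglesAfterPurge` are hypothetical names invented for illustration, and the choice of the committed version as the one marked stale is arbitrary, since the exact path is not yet known. Integer ids stand in for the non-owning `StoredValue*` pointers so that the dangling reference can be detected safely instead of triggering undefined behaviour.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <string>
#include <utility>

// Hypothetical, heavily simplified stand-in for a StoredValue.
struct ToyStoredValue {
    ToyStoredValue(std::uint64_t id, std::string key, bool pending)
        : id(id), key(std::move(key)), pending(pending) {}
    std::uint64_t id;
    std::string key;
    bool pending;
    bool stale = false;
};

// Hypothetical owner of all values, standing in for the ephemeral linked list.
class ToySeqList {
public:
    std::uint64_t add(const std::string& key, bool pending) {
        items.emplace_back(nextId, key, pending);
        return nextId++;
    }

    // Returns nullptr once the value has been purged.
    ToyStoredValue* find(std::uint64_t id) {
        for (auto& sv : items) {
            if (sv.id == id) {
                return &sv;
            }
        }
        return nullptr;
    }

    void markStale(std::uint64_t id) {
        if (auto* sv = find(id)) {
            sv->stale = true;
        }
    }

    // Models the stale-item deleter: walk the list, delete anything stale.
    std::size_t purgeStale() {
        const std::size_t before = items.size();
        items.remove_if([](const ToyStoredValue& sv) { return sv.stale; });
        return before - items.size();
    }

private:
    std::list<ToyStoredValue> items;
    std::uint64_t nextId = 1;
};

// Models htRes: non-owning references to both versions of one key.
struct ToyFindResult {
    std::uint64_t pendingId;
    std::uint64_t committedId;
};

// Replays the sequence. Returns true if the committed reference held by the
// find result no longer resolves once the deleter has run.
bool committedDanglesAfterPurge() {
    ToySeqList list;
    // Entry state: a prepare and a committed item both exist for k1.
    ToyFindResult res{list.add("k1", true), list.add("k1", false)};
    list.markStale(res.committedId); // set path marks one version stale (assumed: committed)
    list.purgeStale();               // deleter runs while res is still alive
    return list.find(res.committedId) == nullptr;
}
```

In this model `committedDanglesAfterPurge()` returns `true`: the find result outlives the object it refers to. With real raw pointers there is no equivalent of the failed lookup, so any later access through the result (for example from a destructor) touches freed memory.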
Note this bug is quickly evident if we put something like the following code in after the call to processSet
When running a 2-node cluster with a 100% durable write workload, this message is printed often.
Note this bug has been seen to cause a crash of memcached because the following exception gets thrown:
This is because the htRes destructor path tried to get the prefix of the deleted stored-value and found that the value is no longer valid.
|For Gerrit Dashboard: MB-53922|
|180825,6||MB-53922: Remove ~StoredValueProxy||mad-hatter||kv_engine||Status: MERGED||+2||+1|
|181514,1||Merge branch 'couchbase/mad-hatter' into 'couchbase/cheshire-cat'||cheshire-cat||kv_engine||Status: MERGED||+2||+1|
|182377,1||MB-53922: Merge commit 'a6922c068' into 'couchbase/neo'||neo||kv_engine||Status: ABANDONED||0||0|
|182452,1||MB-54295: Merge commit 'couchbase/cheshire-cat' into neo||neo||kv_engine||Status: MERGED||+2||-1|
|182505,4||MB-53922, MB-54295: Remove ~StoredValueProxy||7.1.3||kv_engine||Status: MERGED||+2||+1|
|182822,3||Merge commit 'a6922c068' into 'couchbase/master'||master||kv_engine||Status: ABANDONED||-1||-1|
|183063,1||Merge commit 'a6922c068' into 'couchbase/master'||master||kv_engine||Status: ABANDONED||-1||-1|
|183067,3||Merge commit 'c253ed69a' into 'couchbase/master'||master||kv_engine||Status: MERGED||+2||+1|
|185457,5||Merge branch cheshire-cat into 7.1.4||7.1.4||kv_engine||Status: MERGED||+2||+1|
|185839,2||MB-53829: Merge commit 'ed5fe2e' into 'couchbase/neo'||neo||kv_engine||Status: ABANDONED||0||-1|
|188532,1||Merge neo/e245726d3 into master||master||kv_engine||Status: MERGED||+2||+1|