Fixed
Details
Assignee: Jim Walker
Reporter: Jim Walker
Is this a Regression?: Unknown
Triage: Untriaged
Story Points: 1
Priority: Critical
Created September 30, 2022 at 1:16 PM
Updated October 11, 2024 at 7:45 AM
Resolved October 27, 2022 at 9:18 AM
One example of this issue occurs with durable writes (which come in via KVBucket::set). For the issue to occur, the hash table (HT) must be in a particular state on entry to the set path: roughly, both an existing prepare and a committed item exist for the key. With the HT in that state, the following sequence triggers the issue.
A new prepare occurs for k1, which executes on the normal set path. VBucket::set will look up the existing key: https://src.couchbase.org/source/xref/6.6.3/kv_engine/engines/ep/src/vbucket.cc?r=028f229b#1548

After this call, the returned htRes object has non-owning pointers (StoredValue*) to both 'versions' of k1: the pending and the committed StoredValue.

Next, execution proceeds to processSet.

Inside processSet it is determined that one version of k1 is now stale. The exact steps/path are so far unclear, but KV certainly reaches something like updateStoredValue: https://src.couchbase.org/source/xref/6.6.3/kv_engine/engines/ep/src/ephemeral_vb.cc?r=40b0f7c2#383

This eventually calls markItemStale: https://src.couchbase.org/source/xref/6.6.3/kv_engine/engines/ep/src/ephemeral_vb.cc?r=40b0f7c2#460
At this point the problem is that a StoredValue has been marked as stale while, at the same time, htRes still references the stale object. Problems will now occur if EphTombstoneStaleItemDeleter runs during the lifetime of htRes.

For example:

Whilst the set of k1 is still executing, i.e. htRes has two non-owning pointers, the EphTombstoneStaleItemDeleter wakes up and runs.

The EphTombstoneStaleItemDeleter walks the ephemeral linked-list looking for objects that are marked stale, and deletes any it finds.

Now when htRes destructs it will use the deleted object, causing a number of issues.

Note this bug is quickly evident if we put something like the following code in after the call to processSet:

    // note isStale1 is a temp function that exposes the stale bit to StoredValue
    if (htRes.pending && htRes.pending->isStale1()) {
        std::stringstream ss;
        ss << *htRes.pending;
        LOG_CRITICAL("StoredValue after processSet is stale {}", ss.str());
    }
When running a 2-node cluster with a 100% durable write workload, this message is printed often.
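The lifetime hazard described above can be modelled in a small, self-contained sketch. This is not kv_engine code: ToyStoredValue, ToySeqList, ToyFindResult, findBoth, purgeStale and simulateSetRacingWithDeleter are all hypothetical names invented for illustration. The sketch runs the steps in sequence on one thread, whereas the real deleter is a separate task, and it assumes the pending version is the one marked stale (as the diagnostic check suggests). It also uses std::weak_ptr where the real htRes holds raw StoredValue*, purely so the dangling reference can be observed without undefined behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <string>

// Toy stand-in for a StoredValue: a key, a committed/pending flag, a stale bit.
struct ToyStoredValue {
    std::string key;
    bool committed;
    bool stale;
};

// Toy stand-in for the ephemeral sequence list, which owns the values.
using ToySeqList = std::list<std::shared_ptr<ToyStoredValue>>;

// Toy stand-in for htRes. weak_ptr replaces the raw non-owning pointers so
// that the sketch can detect a deleted target safely (an assumption made
// for the model only).
struct ToyFindResult {
    std::weak_ptr<ToyStoredValue> pending;
    std::weak_ptr<ToyStoredValue> committed;
};

// Look up the non-stale pending and committed versions of a key.
ToyFindResult findBoth(const ToySeqList& list, const std::string& key) {
    ToyFindResult res;
    for (const auto& sv : list) {
        if (sv->key != key || sv->stale) {
            continue;
        }
        if (sv->committed) {
            res.committed = sv;
        } else {
            res.pending = sv;
        }
    }
    return res;
}

// Stand-in for the stale item deleter: walk the list, delete anything stale.
std::size_t purgeStale(ToySeqList& list) {
    std::size_t purged = 0;
    for (auto it = list.begin(); it != list.end();) {
        if ((*it)->stale) {
            it = list.erase(it); // the list was the only owner: object is freed
            ++purged;
        } else {
            ++it;
        }
    }
    return purged;
}

struct Outcome {
    bool pendingDangling;
    bool committedDangling;
    std::size_t purged;
};

Outcome simulateSetRacingWithDeleter() {
    // HT state on entry: a committed item and an existing prepare for k1.
    ToySeqList list;
    list.push_back(std::make_shared<ToyStoredValue>(ToyStoredValue{"k1", true, false}));
    list.push_back(std::make_shared<ToyStoredValue>(ToyStoredValue{"k1", false, false}));

    // The set path looks up the key; the result references both versions.
    ToyFindResult htRes = findBoth(list, "k1");
    assert(!htRes.pending.expired() && !htRes.committed.expired());

    // "processSet": the old prepare is marked stale and a new one is appended.
    htRes.pending.lock()->stale = true;
    list.push_back(std::make_shared<ToyStoredValue>(ToyStoredValue{"k1", false, false}));

    // The deleter runs while htRes is still alive.
    const std::size_t purged = purgeStale(list);

    // htRes now refers to a deleted object for the pending version.
    return Outcome{htRes.pending.expired(), htRes.committed.expired(), purged};
}
```

Calling simulateSetRacingWithDeleter() reports one purged item and a dangling pending reference, while the committed reference stays valid. With raw pointers, as in the real htRes, nothing flags the dangling reference, so the later use in the destructor touches freed memory.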
Note this bug has been seen to cause a crash of memcached because the following exception gets thrown:
CRITICAL Caught unhandled std::exception-derived exception. what(): CollectionID: invalid value:2
This is because the htRes destruct path tried to get the prefix of the deleted stored-value and found that the value is no longer valid.
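As a rough illustration of how reading a freed stored-value can surface as this exception rather than an immediate segfault, the sketch below validates a key prefix on construction. Everything here is hypothetical: ToyCollectionID, prefixOf and describePrefix are invented names, the reserved range of 1 to 7 is an assumption made for the example, and treating the first key byte as the whole prefix is a simplification rather than the real key encoding.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical collection-ID wrapper that rejects a reserved range.
// The range 1..7 is an assumption for this sketch only.
class ToyCollectionID {
public:
    explicit ToyCollectionID(std::uint32_t id) : value(id) {
        if (id >= 1 && id <= 7) {
            throw std::invalid_argument("CollectionID: invalid value:" +
                                        std::to_string(id));
        }
    }
    std::uint32_t get() const {
        return value;
    }

private:
    std::uint32_t value;
};

// Simplification: treat the first byte of the key buffer as the prefix.
ToyCollectionID prefixOf(const std::string& keyBytes) {
    if (keyBytes.empty()) {
        throw std::invalid_argument("empty key");
    }
    return ToyCollectionID(static_cast<std::uint8_t>(keyBytes[0]));
}

// Returns "ok:<id>" for a valid prefix, otherwise the exception message.
std::string describePrefix(const std::string& keyBytes) {
    try {
        return "ok:" + std::to_string(prefixOf(keyBytes).get());
    } catch (const std::invalid_argument& e) {
        return e.what();
    }
}
```

A buffer whose first byte is 0 describes as "ok:0", whereas a first byte of 2 yields "CollectionID: invalid value:2". Once the memory behind a stored-value has been freed and reused, whatever bytes sit where the key used to be get interpreted as the prefix, so a validity check of this kind can fail with an arbitrary value.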