Ephemeral purger can delete a StoredValue which is still referenced

Description

One example of this issue occurs with durable writes (which come in via KVBucket::set).

For the issue to occur, the hash table must be in a particular state on entry to the set path: roughly, both an existing prepare and a committed item exist for the key. With the HT in that state, the following then triggers the issue.

At this point the problem is that a StoredValue has been marked as stale while, at the same time, htRes still references the stale object.

Problems will now occur if EphTombstoneStaleItemDeleter runs during the lifetime of htRes.

For example:

  • Whilst the set of k1 is still executing, i.e. htRes holds two non-owning pointers, the EphTombstoneStaleItemDeleter wakes up and runs.

  • The EphTombstoneStaleItemDeleter walks the ephemeral linked-list looking for objects that are marked stale, and deletes any it finds.

  • When htRes then destructs it uses the deleted object, causing a number of issues.
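
The sequence above can be sketched in a small stand-alone model. This is not kv_engine code: Item, SeqList, FindResult, purgeStale and observerDanglesAfterPurge are hypothetical names invented for illustration, and a std::weak_ptr stands in for the raw non-owning pointer so that the example can report the dangling reference without invoking undefined behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <string>

// Hypothetical stand-in for StoredValue: just a key and a stale flag.
struct Item {
    std::string key;
    bool stale = false;
};

// Hypothetical stand-in for the ephemeral linked-list, which owns the items.
using SeqList = std::list<std::shared_ptr<Item>>;

// Hypothetical stand-in for htRes: it observes an item but does not own it.
// The real code holds raw pointers; weak_ptr is used here only so the
// example can ask "is my target still alive?" safely.
struct FindResult {
    std::weak_ptr<Item> pending;
};

// Stand-in for the stale item deleter: walk the list and delete every item
// marked stale. Returns the number of items purged.
std::size_t purgeStale(SeqList& list) {
    const std::size_t before = list.size();
    list.remove_if([](const std::shared_ptr<Item>& it) { return it->stale; });
    return before - list.size();
}

// Replays the sequence from the bullets above and reports whether the
// observer's target was destroyed while the observer was still alive.
bool observerDanglesAfterPurge() {
    SeqList list;
    list.push_back(std::make_shared<Item>(Item{"k1", false}));

    FindResult htRes{list.back()}; // lookup at the start of the set
    list.back()->stale = true;     // the set path marks the item stale
    purgeStale(list);              // the deleter runs mid-operation

    // With a raw pointer, any use of htRes.pending from here on (including
    // in a destructor) would touch freed memory.
    return htRes.pending.expired();
}
```

Calling observerDanglesAfterPurge() returns true in this model: the item is gone while htRes is still in scope, which is the window in which the real destructor touches freed memory.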

Note this bug is quickly evident if we put something like the following code in after the call to processSet:

    // note isStale1 is a temp function that exposes the stale bit to StoredValue
    if (htRes.pending && htRes.pending->isStale1()) {
        std::stringstream ss;
        ss << *htRes.pending;
        LOG_CRITICAL("StoredValue after processSet is stale {}", ss.str());
    }

When running a 2-node cluster with a 100% durable-write workload, this message is printed often.

Note this bug has been seen to cause a crash of memcached because the following exception gets thrown:

CRITICAL Caught unhandled std::exception-derived exception. what(): CollectionID: invalid value:2

This is because the htRes destruct path tried to get the prefix of the deleted stored-value and found that the value is no longer valid.


Activity

CB robot March 22, 2023 at 10:14 AM

Build couchbase-server-8.0.0-1273 contains kv_engine commit 62515e2 with commit message:
https://couchbasecloud.atlassian.net/browse/MB-53922#icft=MB-53922, https://couchbasecloud.atlassian.net/browse/MB-54295#icft=MB-54295: Remove ~StoredValueProxy

CB robot March 21, 2023 at 5:24 PM

Build couchbase-server-7.5.0-3957 contains kv_engine commit 62515e2 with commit message:
https://couchbasecloud.atlassian.net/browse/MB-53922#icft=MB-53922, https://couchbasecloud.atlassian.net/browse/MB-54295#icft=MB-54295: Remove ~StoredValueProxy

Dave Rigby March 15, 2023 at 4:13 PM

Can you please add a release note for this issue (https://hub.internal.couchbase.com/confluence/display/PM/Release+Notes+-+How+to) ?

CB robot February 3, 2023 at 5:20 PM

Build couchbase-server-7.2.0-5134 contains kv_engine commit 62515e2 with commit message:
https://couchbasecloud.atlassian.net/browse/MB-53922#icft=MB-53922, https://couchbasecloud.atlassian.net/browse/MB-54295#icft=MB-54295: Remove ~StoredValueProxy

CB robot February 1, 2023 at 5:49 PM

Build couchbase-server-7.1.4-3577 contains kv_engine commit a6922c0 with commit message:
https://couchbasecloud.atlassian.net/browse/MB-53922#icft=MB-53922: Remove ~StoredValueProxy

Fixed
Details

Is this a Regression?

Unknown

Triage

Untriaged

Created September 30, 2022 at 1:16 PM
Updated October 11, 2024 at 7:45 AM
Resolved October 27, 2022 at 9:18 AM