Description
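All trace events are stored in a std::vector, and we let the vector reallocate itself as trace events are inserted. As a result we don't know when a reallocation happens. Every reallocation moves the events to a new buffer and frees the old one, which invalidates any pointer or reference into the old buffer.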
The thread that owns the cookie allocates the backing vector when it inserts the first trace code. There is still spare capacity when the engine adds the next Set trace code, and if the system needs to do a background fetch it inserts the first BackgroundWait event. It then tells the IO subsystem to load the item, returns EWB and goes off to serve other clients.
Later, one of the engine's daemon threads completes the background fetch and ends up in http://src.couchbase.org/source/xref/trunk/kv_engine/engines/ep/src/ep_vb.cc#205-214:
```cpp
// Close the BackgroundWait span; and add a BackgroundLoad span
auto* traceable = cookie2traceable(fetched_item.cookie);
if (traceable && traceable->isTracingEnabled()) {
    NonBucketAllocationGuard guard;
    auto& tracer = traceable->getTracer();
    tracer.end(fetched_item.traceSpanId, startTime);
    auto spanId =
            tracer.begin(cb::tracing::Code::BackgroundLoad, startTime);
    tracer.end(spanId, fetchEnd);
}
```
If we're really unlucky there isn't room left for the BackgroundLoad entry, so the vector reallocates at that point, which frees the old backing memory. Note that there are no locks here: if the "frontend" thread starts working on the cookie at the same time, it may still use the old pointer (potentially freeing it a second time, writing into bogus locations, etc.).
Issue Links
- backports to
  - MB-40058 [BP] Race conditions for tracing backing store (Closed)
- is duplicated by
  - MB-39881 ASAN Identifying issue with Tracing functionality (Resolved)
  - MB-39327 [Jepsen] Crash in set-kill couchbase bucket test (Closed)
  - MB-39408 Volume Test: Core dump is seen while reading 10M deleted documents immediately after all 10M is deleted. (Closed)
  - MB-39337 crash during memcached kills in magma crash recovery test. (Closed)
  - MB-39417 memcached crashed and recovered during data load. (Closed)
  - MB-39668 Memcached coredumps found data load+ collection maxttl + durability + graceful failover + full recovery (Closed)