Details
Bug
Resolution: Fixed
Major
.master, 3.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 4.0.0
Security Level: Public
Untriaged
Unknown
KV: Oct 4 - Oct 24
Description
When run under ThreadSanitizer, ep-engine reports a data race. VBucket::purgeSeqno is read in EPStore::persistVBState() and written in EPStore::compactVBucket(). The full report is below.
persistVBState() performs a dirty (unsynchronized) read of purgeSeqno. I believe this could result in an inconsistent vbucket_state object being written to disk. getState(), getMaxCas() and getDriftCounter() also all appear to be read dirtily, so their values may be inconsistent with the snapshot range.
Extract of the code in question (http://src.couchbase.org/source/xref/trunk/ep-engine/src/ep.cc#1199):
```cpp
bool EventuallyPersistentStore::persistVBState(const Priority &priority,
                                               uint16_t vbid) {
    ...
    snapshot_range_t range;
    vb->getPersistedSnapshot(range);
    vbucket_state vb_state(vb->getState(), chkId, 0, vb->getHighSeqno(),
                           vb->getPurgeSeqno(), range.start, range.end,
                           vb->getMaxCas(), vb->getDriftCounter(),
                           failovers);

    bool inverse = false;
    LockHolder lh(vb_mutexes[vbid], true /*tryLock*/);
    ...
    if (rwUnderlying->snapshotVBucket(vbid, vb_state, &kvcb)) {
    ...
```
Note that vb_state is constructed before the lock on that vBucket (vb_mutexes[vbid]) is acquired.
I believe this is a potential data-corruption issue. We could write an inconsistent vBucket state to disk. If we then crashed, the state read back from disk on restart could be incorrect.
ThreadSanitizer output:
```
WARNING: ThreadSanitizer: data race (pid=29921)
  Write of size 8 at 0x7d680001f580 by thread T5 (mutexes: write M12734):
    #0 VBucket::setPurgeSeqno() ep-engine/src/vbucket.h:215:9 (ep.so+0x000000086204)
    #1 EventuallyPersistentStore::compactVBucket() ep-engine/src/ep.cc:1584 (ep.so+0x000000086204)
    #2 CompactVBucketTask::run() ep-engine/src/tasks.cc:94:12 (ep.so+0x00000012971e)
    #3 ExecutorThread::run() ep-engine/src/executorthread.cc:115:26 (ep.so+0x0000000ea41d)
    #4 launch_executor_thread() ep-engine/src/executorthread.cc:33:9 (ep.so+0x0000000e9fe5)
    #5 platform_thread_wrap platform/src/cb_pthreads.c:23:5 (libplatform.so.0.1.0+0x000000004161)

  Previous read of size 8 at 0x7d680001f580 by thread T7:
    #0 VBucket::getPurgeSeqno() ep-engine/src/vbucket.h:211:16 (ep.so+0x0000000821d3)
    #1 EventuallyPersistentStore::persistVBState() ep-engine/src/ep.cc:1217 (ep.so+0x0000000821d3)
    #2 VBStatePersistTask::run() ep-engine/src/tasks.cc:86:12 (ep.so+0x000000129636)
    #3 ExecutorThread::run() ep-engine/src/executorthread.cc:115:26 (ep.so+0x0000000ea41d)
    #4 launch_executor_thread() ep-engine/src/executorthread.cc:33:9 (ep.so+0x0000000e9fe5)
    #5 platform_thread_wrap platform/src/cb_pthreads.c:23:5 (libplatform.so.0.1.0+0x000000004161)

  Location is heap block of size 1392 at 0x7d680001f200 allocated by main thread:
    #0 operator new() <null> (engine_testapp+0x00000045cded)
    #1 EventuallyPersistentStore::setVBucketState() ep-engine/src/ep.cc:1300:30 (ep.so+0x000000082b1a)
    #2 EventuallyPersistentEngine::setVBucketState() ep-engine/src/ep_engine.h:718:16 (ep.so+0x0000000ca308)
    #3 setVBucket()) ep-engine/src/ep_engine.cc:884 (ep.so+0x0000000ca308)
    #4 processUnknownCommand()) ep-engine/src/ep_engine.cc:1178 (ep.so+0x0000000ca308)
    #5 EvpUnknownCommand()) ep-engine/src/ep_engine.cc:1389:38 (ep.so+0x0000000aafc8)
    #6 mock_unknown_command()) memcached/programs/engine_testapp/engine_testapp.cc:380:19 (engine_testapp+0x0000004c56b9)
    #7 set_vbucket_state() ep-engine/tests/ep_test_apis.cc:607:9 (ep_testsuite.so+0x0000000a3a4b)
    #8 test_setup() ep-engine/tests/ep_testsuite_common.cc:146:28 (ep_testsuite.so+0x00000009cdda)
    #9 execute_test() memcached/programs/engine_testapp/engine_testapp.cc:1085:47 (engine_testapp+0x0000004c4103)
    #10 main memcached/programs/engine_testapp/engine_testapp.cc:1439 (engine_testapp+0x0000004c4103)
```
Steps to reproduce
1. Build with ThreadSanitizer (see tlm/README.md for details), for example: CC=clang-3.6 CXX=clang++-3.6 make EXTRA_CMAKE_OPTIONS="-D CB_THREADSANITIZER=1" -j8
2. Run ep_testsuite test 341: TSAN_OPTIONS="external_symbolizer_path=/usr/bin/llvm-symbolizer-3.6 suppressions=/home/couchbase/couchbase/tlm/tsan.suppressions second_deadlock_stack=1" "/home/couchbase/couchbase/build/memcached/engine_testapp" "-E" "ep.so" "-T" "ep_testsuite.so" "-v" "-e" "flushall_enabled=truel;ht_size=13;ht_locks=7" -C 341
For Gerrit Dashboard: MB-16496

| Change,Patchset | Subject | Branch | Project | Status | Code-Review | Verified |
|---|---|---|---|---|---|---|
| 55972,2 | MB-16496 Fix the race on vbucket state between persistVBState() and compactVB() | master | ep-engine | MERGED | +2 | +1 |
| 56068,5 | MB-16500 [BP]: MB-16496 Fix the race on vbucket state between persistVBState() and compactVB() | 3.0.x | ep-engine | MERGED | +2 | +1 |
| 56124,1 | Merge remote-tracking branch 'couchbase/3.0.x' into sherlock | sherlock | ep-engine | MERGED | +2 | +1 |
| 56127,1 | Merge remote-tracking branch 'couchbase/sherlock' | master | ep-engine | MERGED | +2 | +1 |