Details
Bug
Resolution: Fixed
Major
.master
None
Untriaged
Unknown
Description
Got a core dump in commit validation today due to a race where the database file hadn't yet been created on disk when a TAP backfill tried to read it:
11:03:02 Running [0085/0228]: tap filter stream (couchstore)...(0 sec) CORE DUMPED
11:03:02 terminate called after throwing an instance of 'std::invalid_argument'
11:03:02 what(): CouchKVStore::getDbFileInfo: Failed to open database file for vBucket = 1 rev = 1 with error:no such file
Having investigated it a little with DaveR, he suggested amending BackfillDiskLoad::run() so that the call `DBFileInfo info = store->getDbFileInfo(vbucket);` catches the exception, then sleeps and retries if one occurs.
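The suggested catch, sleep and retry pattern could look roughly like the sketch below. It is not the actual fix: `retryOnInvalidArgument`, `maxAttempts` and `delay` are hypothetical names introduced for illustration, and the only detail taken from the log above is that the failure surfaces as `std::invalid_argument`.

```cpp
#include <chrono>
#include <stdexcept>
#include <thread>

// Hypothetical helper (not part of ep-engine): invokes `op` and, if it throws
// std::invalid_argument, sleeps for `delay` and tries again, up to
// `maxAttempts` attempts in total. The exception from the final attempt is
// rethrown so a persistent failure is still visible to the caller.
template <typename F>
auto retryOnInvalidArgument(F op, int maxAttempts,
                            std::chrono::milliseconds delay) -> decltype(op()) {
    for (int attempt = 1;; ++attempt) {
        try {
            return op();
        } catch (const std::invalid_argument&) {
            if (attempt >= maxAttempts) {
                throw; // out of attempts: propagate the original exception
            }
            std::this_thread::sleep_for(delay);
        }
    }
}

// Assumed usage inside BackfillDiskLoad::run() (illustrative only):
//   DBFileInfo info = retryOnInvalidArgument(
//       [&] { return store->getDbFileInfo(vbucket); },
//       5, std::chrono::milliseconds(100));
```

The retry count and delay are arbitrary placeholders. Blocking a thread in `sleep_for` may not suit an executor task, so rescheduling the task could be a better fit than sleeping in place.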
CV Job:
http://cv.jenkins.couchbase.com/job/ep-engine-gerrit-master/188/
Core dump attached in case the job gets archived. Instructions to run:
gdb home/couchbase/cvjenkins/workspace/ep-engine-gerrit-master/label/ubuntu-1204/build/memcached/engine_testapp --core core.mc:auxio_15.8813 -ex 'set debug-file-directory usr/lib/debug' -ex 'set sysroot .'
Backtrace from commit validation log:
11:04:20 Backtrace of crashing thread:
11:04:20 [New LWP 8831]
11:04:20 [New LWP 8830]
11:04:20 [New LWP 8832]
11:04:20 [New LWP 8824]
11:04:20 [New LWP 8813]
11:04:20 [New LWP 8819]
11:04:20 [New LWP 8823]
11:04:20 [New LWP 8825]
11:04:20 [New LWP 8820]
11:04:20 [New LWP 8833]
11:04:20 [New LWP 8821]
11:04:20 [New LWP 8815]
11:04:20 [New LWP 8827]
11:04:20 [New LWP 8829]
11:04:20 [New LWP 8822]
11:04:20 [New LWP 8818]
11:04:20 [New LWP 8817]
11:04:20 [New LWP 8816]
11:04:20 [New LWP 8826]
11:04:20 [New LWP 8828]
11:04:20 [Thread debugging using libthread_db enabled]
11:04:20 Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
11:04:21 Core was generated by `/home/couchbase/cvjenkins/workspace/ep-engine-gerrit-master/label/ubuntu-1204/b'.
11:04:21 Program terminated with signal 6, Aborted.
11:04:21 #0 0x00002ad44e7cc0d5 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
11:04:21 #0 0x00002ad44e7cc0d5 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
11:04:21 #1 0x00002ad44e7cf83b in __GI_abort () at abort.c:91
11:04:21 #2 0x00002ad44e2dac9d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
11:04:21 #3 0x00002ad44e2d8ce6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
11:04:21 #4 0x00002ad44e2d8d31 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
11:04:21 #5 0x00002ad44e2d8f48 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
11:04:22 #6 0x00002ad44ffa92cb in CouchKVStore::getDbFileInfo(unsigned short) () at /home/couchbase/cvjenkins/workspace/ep-engine-gerrit-master/label/ubuntu-1204/ep-engine/src/couch-kvstore/couch-kvstore.cc:2253
11:04:22 #7 0x00002ad44fecae4f in BackfillDiskLoad::run() () at /home/couchbase/cvjenkins/workspace/ep-engine-gerrit-master/label/ubuntu-1204/ep-engine/src/backfill.cc:117
11:04:22 #8 0x00002ad44ff4ec30 in ExecutorThread::run() () at /home/couchbase/cvjenkins/workspace/ep-engine-gerrit-master/label/ubuntu-1204/ep-engine/src/executorthread.cc:115
11:04:22 #9 0x00002ad44dbf172a in platform_thread_wrap(void*) () at /home/couchbase/cvjenkins/workspace/ep-engine-gerrit-master/label/ubuntu-1204/platform/src/cb_pthreads.cc:54
11:04:22 #10 0x00002ad44e05be9a in start_thread (arg=0x2ad455a7c700) at pthread_create.c:308
11:04:22 #11 0x00002ad44e88938d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
11:04:22 #12 0x0000000000000000 in ?? ()