Details
-
Bug
-
Resolution: Fixed
-
Critical
-
7.1.4, 7.1.0, 7.1.1, 7.1.2, 7.2.0, 7.1.3
-
7.2.0-5318
-
Triaged
-
0
-
Unknown
-
KV 2023-2, KV 2023-4
Description
Steps To Recreate:
- Create a 4-node cluster
- Create a magma bucket with (bucket_history_retention_seconds=600,bucket_history_retention_bytes=6000000000)
- Create 5000000 items (doc size = 256)
- Start new doc ops (update:expiry)
- Trigger compaction
- SIGKILL memcached once
- Observe that memcached crashed in CheckpointManager::expelUnreferencedCheckpointItems (this=0x7f6bcc52de40)
Note:
The actual test is about crash recovery: it repeatedly kills memcached while data loading is in progress. Between two SIGKILLs the test waits for cluster warmup to finish and then waits a further 30 to 60 seconds before the next kill, so the total time between two SIGKILLs is warmup_time + 30 to 60 seconds. In this case, however, the crash was observed after the first kill (memcached was killed only once).
Core Dump was found on node 172.23.121.115
BackTrace:
(gdb) bt full
|
#0 0x00007f6befeac8eb in raise () from /lib/x86_64-linux-gnu/libc.so.6
|
No symbol table info available.
|
#1 0x00007f6befe97535 in abort () from /lib/x86_64-linux-gnu/libc.so.6
|
No symbol table info available.
|
#2 0x00007f6bf046b63c in __gnu_cxx::__verbose_terminate_handler () at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/libsupc++/vterminate.cc:95
|
terminating = false
|
t = <optimized out>
|
#3 0x0000000000b4d71b in backtrace_terminate_handler ()
|
at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/utilities/terminate_handler.cc:88
|
No locals.
|
#4 0x00007f6bf04768f6 in __cxxabiv1::__terminate (handler=<optimized out>)
|
at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/libsupc++/eh_terminate.cc:48
|
No locals.
|
#5 0x00007f6bf0476961 in std::terminate () at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/libsupc++/eh_terminate.cc:58
|
No locals.
|
#6 0x00007f6bf0476bf4 in __cxxabiv1::__cxa_throw (obj=obj@entry=0x7f6b980033b0, tinfo=tinfo@entry=0xc5fdc8 <typeinfo for gsl::fail_fast>,
|
dest=dest@entry=0x59b9e0 <gsl::fail_fast::~fail_fast()>) at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/libsupc++/eh_throw.cc:95
|
globals = <optimized out>
|
header = 0x7f6b98003330
|
#7 0x00000000004506c3 in gsl::detail::fail_fast_throw (
|
message=0xc8e3a8 "GSL: Precondition failure: 'extractRes.getExpelCursor().getCheckpoint()->get() == checkpoint' at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/checkpoint_manager.cc:"...)
|
at /home/couchbase/jenkins/workspace/couchbase-server-unix/third_party/gsl-lite/include/gsl/gsl-lite.hpp:1769
|
No locals.
|
#8 0x00000000004c2498 in CheckpointManager::expelUnreferencedCheckpointItems (this=0x7f6bcc52de40)
|
at /opt/gcc-10.2.0/include/c++/10.2.0/bits/std_function.h:248
|
lh = {_M_device = @0x7f6bcc52ded0}
|
checkpoint = <optimized out>
|
overheadCheck = <optimized out>
|
extractRes = {
|
items = {<boost::container::dtl::node_alloc_holder<MemoryTrackingAllocator<SingleThreadedRCPtr<Item, Item*, std::default_delete<Item> >, cb::NonNegativeCounter<unsigned long, cb::ClampAtZeroUnderflowPolicy> >, boost::intrusive::list_impl<boost::intrusive::bhtraits<boost::container::dtl::list_node<SingleThreadedRCPtr<Item, Item*, std::default_delete<Item> >, void*>, boost::intrusive::list_node_traits<void*>, (boost::intrusive::link_mode_type)0, boost::intrusive::dft_tag, 1>, unsigned long, true, void> >> = {<MemoryTrackingAllocator<boost::container::dtl::list_node<SingleThreadedRCPtr<Item, Item*, std::default_delete<Item> >, void*>, cb::NonNegativeCounter<unsigned long, cb::ClampAtZeroUnderflowPolicy> >> = {
|
baseAllocator = {<__gnu_cxx::new_allocator<boost::container::dtl::list_node<SingleThreadedRCPtr<Item, Item*, std::default_delete<Item> >, void*> >> = {<No data fields>}, <No data fields>},
|
bytesAllocated = {<std::__shared_ptr<cb::NonNegativeCounter<unsigned long, cb::ClampAtZeroUnderflowPolicy>, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<cb::NonNegativeCounter<unsigned long, cb::ClampAtZeroUnderflowPolicy>, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x7f6af5f5d550, _M_refcount = {_M_pi = 0x7f6af5f5d540}}, <No data fields>}}, m_icont = {static constant_time_size = true,
|
static stateful_value_traits = <optimized out>, static has_container_from_iterator = <optimized out>,
|
static safemode_or_autounlink = false,
|
data_ = {<boost::intrusive::bhtraits<boost::container::dtl::list_node<SingleThreadedRCPtr<Item, Item*, std::default_delete<Item> >, void*>, boost::intrusive::list_node_traits<void*>, (boost::intrusive::link_mode_type)0, boost::intrusive::dft_tag, 1>> = {<boost::intrusive::bhtraits_base<boost::container::dtl::list_node<SingleThreadedRCPtr<Item, Item*, std::default_delete<Item> >, void*>, boost::intrusive::list_node<void*>*, boost::intrusive::dft_tag, 1>> = {<No data fields>}, static link_mode = boost::intrusive::normal_link},
|
root_plus_size_ = {<boost::intrusive::detail::size_holder<true, unsigned long, void>> = {
|
static constant_time_size = <optimized out>, size_ = 0}, m_header = {<boost::intrusive::list_node<void*>> = {
|
next_ = 0x7f6bce7ea140, prev_ = 0x7f6bce7ea140}, <No data fields>}}}}}, <No data fields>}, manager = 0x7f6bcc52de40,
|
expelCursor = {<std::__shared_ptr<CheckpointCursor, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<CheckpointCursor, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x7f6b84531930, _M_refcount = {_M_pi = 0x7f6b84531920}}, <No data fields>},
|
checkpoint = 0x7f6b224c0400}
|
numItemsExpelled = 7326
|
queuedItemsMemReleased = 1093838
|
estimatedMemRecovered = <optimized out>
|
#9 0x00000000007e604c in CheckpointMemRecoveryTask::attemptItemExpelling (this=<optimized out>)
|
at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/checkpoint_remover.cc:123
|
vbid = {vbid = 514}
|
vb = {<std::__shared_ptr<VBucket, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<VBucket, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x7f6b72dd6f00, _M_refcount = {_M_pi = 0x7f6b72fccca0}}, <No data fields>}
|
expelResult = <optimized out>
|
it = <error reading variable>
|
__for_range = @0x7f6bce7ea230: {<std::_Vector_base<std::pair<Vbid, unsigned long>, std::allocator<std::pair<Vbid, unsigned long> > >> = {
|
_M_impl = {<std::allocator<std::pair<Vbid, unsigned long> >> = {<__gnu_cxx::new_allocator<std::pair<Vbid, unsigned long> >> = {<No data fields>}, <No data fields>}, <std::_Vector_base<std::pair<Vbid, unsigned long>, std::allocator<std::pair<Vbid, unsigned long> > >::_Vector_impl_data> = {
|
_M_start = 0x7f6af5ec9000, _M_finish = 0x7f6af5ecb000, _M_end_of_storage = 0x7f6af5ecb000}, <No data fields>}}, <No data fields>}
|
__for_begin = <optimized out>
|
__for_end = <optimized out>
|
|
bucket = <error reading variable>
|
vbuckets = {<std::_Vector_base<std::pair<Vbid, unsigned long>, std::allocator<std::pair<Vbid, unsigned long> > >> = {
|
_M_impl = {<std::allocator<std::pair<Vbid, unsigned long> >> = {<__gnu_cxx::new_allocator<std::pair<Vbid, unsigned long> >> = {<No data fields>}, <No data fields>}, <std::_Vector_base<std::pair<Vbid, unsigned long>, std::allocator<std::pair<Vbid, unsigned long> > >::_Vector_impl_data> = {
|
_M_start = 0x7f6af5ec9000, _M_finish = 0x7f6af5ecb000, _M_end_of_storage = 0x7f6af5ecb000}, <No data fields>}}, <No data fields>}
|
#10 0x00000000007e6e18 in CheckpointMemRecoveryTask::runInner (this=0x7f6b75c1f3d0)
|
at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/checkpoint_remover.cc:265
|
phosphor_internal_category_enabled_205 = {_M_b = {_M_p = 0x0}, static is_always_lock_free = <optimized out>}
|
phosphor_internal_category_enabled_temp_205 = <optimized out>
|
phosphor_internal_tpi_205 = {category = 0x0, name = 0x0, type = phosphor::TraceEventType::AsyncStart, argument_names = {_M_elems = {0x0,
|
0x0}}, argument_types = {_M_elems = {phosphor::TraceArgumentType::is_bool, phosphor::TraceArgumentType::is_bool}}}
|
phosphor_internal_guard_205 = {tpi = 0x1081a80 <CheckpointMemRecoveryTask::runInner()::phosphor_internal_tpi_205>, enabled = true,
|
arg1 = {<No data fields>}, arg2 = {<No data fields>}, start = {__d = {__r = 3155132988588011}}}
|
bucket = <error reading variable>
|
wasAboveBackfillThreshold = false
|
onReturn = <optimized out>
|
bytesToFree = 302308378
|
#11 0x0000000000abbd79 in GlobalTask::execute (this=0x7f6b75c1f3d0, threadName=...)
|
at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/globaltask.cc:98
|
guard = {previous = 0x0}
|
executedAt = <optimized out>
|
scheduleOverhead = <optimized out>
|
start = <optimized out>
|
runAgain = <optimized out>
|
end = <optimized out>
|
runtime = <optimized out>
|
#12 0x0000000000ab543a in FollyExecutorPool::TaskProxy::scheduleViaCPUPool()::{lambda()#2}::operator()() const (__closure=0x7f6bce7ea630)
|
at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/folly_executorpool.cc:309
|
runAgain = <optimized out>
|
proxy = <error reading variable>
|
#13 0x0000000000abd12e in folly::detail::function::FunctionTraits<void ()>::operator()() (this=0x7f6bce7ea630)
|
at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/cancellable_cpu_executor.cc:42
|
fn = @0x7f6bce7ea630: {<folly::detail::function::FunctionTraits<void()>> = {<No data fields>}, data_ = {big = 0x7f6b774e3950, tiny = {
|
__data = "P9Nwk\177\000\000 \247~\316k\177\000\000\000\000\000\000\000\000\000\000@\346\326\356k\177\000\000\001\000\000\000\000\000\000\000\000\035\024\002\000\000\000", __align = {<No data fields>}}},
|
call_ = 0xab5970 <folly::detail::function::FunctionTraits<void ()>::callSmall<FollyExecutorPool::TaskProxy::scheduleViaCPUPool()::{lambda()#2}>(folly::detail::function::Data&)>,
|
exec_ = 0xab3e60 <folly::detail::function::execSmall<FollyExecutorPool::TaskProxy::scheduleViaCPUPool()::{lambda()#2}>(folly::detail::function::Op, folly::detail::function::Data*, folly::detail::function::Data)>}
|
#14 operator() (__closure=<optimized out>)
|
at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/cancellable_cpu_executor.cc:42
|
task = {storage_ = {{emptyState = -48 '\320', value = {task = 0x7f6b75c1f3d0,
|
func = {<folly::detail::function::FunctionTraits<void()>> = {<No data fields>}, data_ = {big = 0x7f6b774e3950, tiny = {
|
__data = "P9Nwk\177\000\000 \247~\316k\177\000\000\000\000\000\000\000\000\000\000@\346\326\356k\177\000\000\001\000\000\000\000\000\000\000\000\035\024\002\000\000\000", __align = {<No data fields>}}},
|
call_ = 0xab5970 <folly::detail::function::FunctionTraits<void ()>::callSmall<FollyExecutorPool::TaskProxy::scheduleViaCPUPool()::{lambda()#2}>(folly::detail::function::Data&)>,
|
exec_ = 0xab3e60 <folly::detail::function::execSmall<FollyExecutorPool::TaskProxy::scheduleViaCPUPool()::{lambda()#2}>(folly::detail::function::Op, folly::detail::function::Data*, folly::detail::function::Data)>}}}, hasValue = true}}
|
this = <optimized out>
|
#15 0x0000000000c1b240 in folly::detail::function::FunctionTraits<void ()>::operator()() (this=0x7f6bce7ea820)
|
at /home/couchbase/jenkins/cbdeps-ws/deps/packages/build/folly/folly-prefix/src/folly/folly/Function.h:416
|
fn = @0x7f6bce7ea820: {<folly::detail::function::FunctionTraits<void()>> = {<No data fields>}, data_ = {big = 0x7f6beed0a800, tiny = {
|
__data = "\000\250\320\356k\177\000\000\320\367.\362k\177\000\000\060\000\000\000\000\000\000\000\301\223\000\000\000\000\000\000H\000\000\000\000\000\000\000\360\250~\316k\177\000", __align = {<No data fields>}}},
|
call_ = 0xabd4b0 <folly::detail::function::FunctionTraits<void()>::callSmall<CancellableCPUExecutor::add(GlobalTask*, folly::Func)::<lambda()> >(folly::detail::function::Data &)>,
|
exec_ = 0xabca60 <folly::detail::function::execSmall<CancellableCPUExecutor::add(GlobalTask*, folly::Func)::<lambda()> >(folly::detail::function::Op, folly::detail::function::Data *, folly::detail::function::Data *)>}
|
fn = <optimized out>
|
|
#16 folly::ThreadPoolExecutor::runTask (this=this@entry=0x7f6beed0a900, thread=..., task=...)
|
at /home/couchbase/jenkins/cbdeps-ws/deps/packages/build/folly/folly-prefix/src/folly/folly/executors/ThreadPoolExecutor.cpp:97
|
rctx = {
|
prev_ = {<std::__shared_ptr<folly::RequestContext, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<folly::RequestContext, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x0, _M_refcount = {_M_pi = 0x0}}, <No data fields>}}
|
startTime = {__d = {__r = 3155132988580722}}
|
stats = {expired = false, waitTime = {__r = 10766854}, runTime = {__r = 0}, enqueueTime = {__d = {__r = 3155132977813868}}, requestId = 0}
|
#17 0x0000000000c05cda in folly::CPUThreadPoolExecutor::threadRun (this=0x7f6beed0a900, thread=...)
|
at /home/couchbase/jenkins/cbdeps-ws/deps/packages/build/folly/folly-prefix/src/folly/folly/executors/CPUThreadPoolExecutor.cpp:265
|
task = {storage_ = {{emptyState = 0 '\000', value = {<folly::ThreadPoolExecutor::Task> = {
|
func_ = {<folly::detail::function::FunctionTraits<void()>> = {<No data fields>}, data_ = {big = 0x7f6beed0a800, tiny = {
|
__data = "\000\250\320\356k\177\000\000\320\367.\362k\177\000\000\060\000\000\000\000\000\000\000\301\223\000\000\000\000\000\000H\000\000\000\000\000\000\000\360\250~\316k\177\000", __align = {<No data fields>}}},
|
call_ = 0xabd4b0 <folly::detail::function::FunctionTraits<void()>::callSmall<CancellableCPUExecutor::add(GlobalTask*, folly::Func)::<lambda()> >(folly::detail::function::Data &)>,
|
exec_ = 0xabca60 <folly::detail::function::execSmall<CancellableCPUExecutor::add(GlobalTask*, folly::Func)::<lambda()> >(folly::detail::function::Op, folly::detail::function::Data *, folly::detail::function::Data *)>}, enqueueTime_ = {__d = {__r = 3155132977813868}},
|
expiration_ = {__r = 0}, expireCallback_ = {<folly::detail::function::FunctionTraits<void()>> = {<No data fields>}, data_ = {
|
big = 0x93c1, tiny = {
|
__data = "\301\223\000\000\000\000\000\000K\301\246", '\000' <repeats 13 times>, "_>\016\362k\177\000\000p\332\376\316k\177\000\000@\326.\362k\177\000",
|
|
__align = {<No data fields>}}}, call_ = 0x466c57
|
<folly::detail::function::FunctionTraits<void ()>::uninitCall(folly::detail::function::Data&)>, exec_ = 0x0},
|
context_ = {<std::__shared_ptr<folly::RequestContext, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<folly::RequestContext, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x0, _M_refcount = {_M_pi = 0x0}}, <No data fields>}}, poison = false,
|
priority_ = 0 '\000', queueObserverPayload_ = 140101544187152}}, hasValue = true}}
|
guard = {list_ = {forbid = true,
|
prev = 0x0, curr = {name = {static npos = <optimized out>, b_ = 0xce1613 "CPUThreadPoolExecutor",
|
e_ = 0xce1628 ""}}}}
|
#18 0x0000000000c1e1f9 in std::__invoke_impl<void, void (folly::ThreadPoolExecutor::*&)(std::shared_ptr<folly::ThreadPoolExecutor::Thread>), folly::ThreadPoolExecutor*&, std::shared_ptr<folly::ThreadPoolExecutor::Thread>&> (__t=<optimized out>, __f=<optimized out>)
|
at /usr/local/include/c++/7.3.0/bits/invoke.h:73
|
No locals.
|
#19 std::__invoke<void (folly::ThreadPoolExecutor::*&)(std::shared_ptr<folly::ThreadPoolExecutor::Thread>), folly::ThreadPoolExecutor*&, std::shared_ptr<folly::ThreadPoolExecutor::Thread>&> (__fn=<optimized out>) at /usr/local/include/c++/7.3.0/bits/invoke.h:95
|
No locals.
|
#20 std::_Bind<void (folly::ThreadPoolExecutor::*(folly::ThreadPoolExecutor*, std::shared_ptr<folly::ThreadPoolExecutor::Thread>))(std::shared_ptr<folly::ThreadPoolExecutor::Thread>)>::__call<void, , 0ul, 1ul>(std::tuple<>&&, std::_Index_tuple<0ul, 1ul>) (__args=..., this=<optimized out>)
|
at /usr/local/include/c++/7.3.0/functional:467
|
No locals.
|
#21 std::_Bind<void (folly::ThreadPoolExecutor::*(folly::ThreadPoolExecutor*, std::shared_ptr<folly::ThreadPoolExecutor::Thread>))(std::shared_ptr<folly::ThreadPoolExecutor::Thread>)>::operator()<, void>() (this=<optimized out>) at /usr/local/include/c++/7.3.0/functional:551
|
No locals.
|
#22 folly::detail::function::FunctionTraits<void ()>::callBig<std::_Bind<void (folly::ThreadPoolExecutor::*(folly::ThreadPoolExecutor*, std::shared_ptr<folly::ThreadPoolExecutor::Thread>))(std::shared_ptr<folly::ThreadPoolExecutor::Thread>)> >(folly::detail::function::Data&) (p=...)
|
at /home/couchbase/jenkins/cbdeps-ws/deps/packages/build/folly/folly-prefix/src/folly/folly/Function.h:401
|
fn = <optimized out>
|
#23 0x0000000000ab5134 in folly::detail::function::FunctionTraits<void ()>::operator()() (this=0x7f6beecd3c80)
|
at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/folly_executorpool.cc:49
|
fn = <error reading variable>
|
#24 CBRegisteredThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}::operator()() (__closure=0x7f6beecd3c80)
|
at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/folly_executorpool.cc:49
|
threadNameOpt = {storage_ = {{emptyState = -128 '\200', value = {static npos = 18446744073709551615,
|
_M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
|
_M_p = 0x7f6bce7ea980 "NonIoPool1"}, _M_string_length = 10, {_M_local_buf = "NonIoPool1\000\000\000\000\000",
|
_M_allocated_capacity = 8029725099528449870}}}, hasValue = true}}
|
|
func = <error reading variable func (Cannot access memory at address 0x7f6beecd3c80)>
|
func = <optimized out>
|
threadNameOpt = <optimized out>
|
#25 folly::detail::function::FunctionTraits<void ()>::callBig<CBRegisteredThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}>(folly::detail::function::Data&) (p=...)
|
at /home/couchbase/jenkins/workspace/couchbase-server-unix/server_build/tlm/deps/folly.exploded/include/folly/Function.h:401
|
fn = <error reading variable>
|
#26 0x00007f6bf049fd40 in std::execute_native_thread_routine (__p=0x7f6beec293c0)
|
at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/src/c++11/thread.cc:80
|
__t = <optimized out>
|
#27 0x00007f6bf20a5fa3 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
|
No symbol table info available.
|
#28 0x00007f6beff6e06f in clone () from /lib/x86_64-linux-gnu/libc.so.6
|
No symbol table info available.
|
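Frames #4 to #8 show a gsl::fail_fast exception being thrown from a precondition check and reaching std::terminate uncaught, which is what aborts the process. The following is a minimal sketch of that kind of check, not the kv_engine code: FailFast, Checkpoint, ExpelCursor, expectCursorInCheckpoint and cursorCheckPasses are hypothetical names introduced only for illustration, and the real condition is the one quoted in frame #7.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for gsl::fail_fast (assumption, not the GSL type).
struct FailFast : std::logic_error {
    using std::logic_error::logic_error;
};

// Hypothetical, heavily simplified checkpoint and expel cursor.
struct Checkpoint {
    int id;
};

struct ExpelCursor {
    const Checkpoint* checkpoint; // checkpoint the cursor currently sits in
};

// Throws FailFast when the cursor is not in the checkpoint that expelling
// started on. If nothing catches the exception, std::terminate runs and
// the process aborts, as in frames #0 to #5.
inline void expectCursorInCheckpoint(const ExpelCursor& cursor,
                                     const Checkpoint* expected) {
    assert(expected != nullptr);
    if (cursor.checkpoint != expected) {
        throw FailFast(
                "Precondition failure: expel cursor is not in the expected "
                "checkpoint");
    }
}

// Helper for experimenting with the check without terminating the process.
inline bool cursorCheckPasses(const ExpelCursor& cursor,
                              const Checkpoint* expected) {
    try {
        expectCursorInCheckpoint(cursor, expected);
        return true;
    } catch (const FailFast&) {
        return false;
    }
}
```

In this sketch the check only fails when the cursor and the expected checkpoint diverge between the two steps; the sketch does not try to model how that divergence arose in the reported run.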
QE-TEST:
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/testexec.86484.ini bucket_storage=magma,rerun=false,GROUP=P0;kill,randomize_value=true,doc_size=256,bucket_eviction_policy=fullEviction,replicas=1,nodes_init=4,enable_dp=false,collect_pcaps=True,get-cbcollect-info=True,autoCompactionDefined=true,bucket_history_retention_seconds=600,bucket_history_retention_bytes=6000000000,upgrade_version=7.2.0-5318 -t storage.magma.magma_compaction.MagmaCompactionTests.test_crash_during_compaction,num_items=30000000,doc_size=256,graceful=False,doc_ops=update:expiry,replicas=1,GROUP=P0;kill'
|
Job: http://qe-jenkins1.sc.couchbase.com/job/test_suite_executor-TAF/24359/consoleFull
Issue | Resolution |
In rare cases, after a failover or memcached restart, a replica rollback while under memory pressure might have caused a crash in the Data Service. | Memory pressure recovery logic (Item expelling) is now skipped when replica rollback is in progress. |
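The resolution above can be pictured with a small sketch. This is an illustration under stated assumptions, not the actual fix: VBucketState, rollbackInProgress, expellableBytes and this simplified attemptItemExpelling are hypothetical names, and the real change lives in the checkpoint memory recovery code shown in the backtrace.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical per-vBucket state (assumption, for illustration only).
struct VBucketState {
    std::atomic<bool> rollbackInProgress{false};
    std::size_t expellableBytes{0}; // memory that expelling could release
};

// Returns the number of bytes released. Expelling is skipped entirely
// (0 bytes released, state untouched) while a rollback is in progress.
inline std::size_t attemptItemExpelling(VBucketState& vb) {
    if (vb.rollbackInProgress.load(std::memory_order_acquire)) {
        return 0; // leave the checkpoints alone until rollback completes
    }
    const std::size_t released = vb.expellableBytes;
    vb.expellableBytes = 0;
    assert(vb.expellableBytes == 0);
    return released;
}
```

With this shape, a memory recovery pass that hits a vBucket mid-rollback simply moves on and can retry that vBucket on a later run.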