  1. Couchbase Server
  2. MB-48925

Tasks can be scheduled after owning KVBucket destroyed resulting in memcached termination

Details

    • Triaged
    • 1
    • Yes
    • KV 2021-Oct-21

    Description

      Seen in a TSan CV job (link TBD) while running the "test access scanner" test.

      Steps to reproduce

      • Build with TSan (linux)
      • Run ep_testsuite (in a loop; can take a number of iterations to hit the issue):

        gdb --args "/home/couchbase/server/build-tsan/kv_engine/ep_testsuite" "-E" "ep" "-v" "-e" "compression_mode=active;item_eviction_policy=full_eviction;dbname=./ep_testsuite.full_eviction.comp_active.db" -C 35 -L
        catch throw
        run
        

      Result

      Thread 93 "SchedulerPool0" hit Catchpoint 1 (exception thrown), __cxxabiv1::__cxa_throw (obj=obj@entry=0x7b2400004910, tinfo=0x11dd358 <typeinfo for std::out_of_range@@GLIBCXX_3.4>, dest=0x7ffff4cc8680 <std::out_of_range::~out_of_range()>)
          at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/libsupc++/eh_throw.cc:80
      80  /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/libsupc++/eh_throw.cc: No such file or directory.
      (gdb) bt
      #0  __cxxabiv1::__cxa_throw (obj=obj@entry=0x7b2400004910, tinfo=0x11dd358 <typeinfo for std::out_of_range@@GLIBCXX_3.4>, dest=0x7ffff4cc8680 <std::out_of_range::~out_of_range()>) at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/libsupc++/eh_throw.cc:80
      #1  0x00007ffff4cab7d2 in std::__throw_out_of_range (__s=__s@entry=0xe01728 "_Map_base::at") at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/src/c++11/functexcept.cc:82
      #2  0x000000000064d5ad in std::__detail::_Map_base<Taskable const*, std::pair<Taskable const* const, TaskOwner>, std::allocator<std::pair<Taskable const* const, TaskOwner> >, std::__detail::_Select1st, std::equal_to<Taskable const*>, std::hash<Taskable const*>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>, true>::at (__k=<optimized out>, this=<optimized out>) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/hashtable_policy.h:433
      #3  std::unordered_map<Taskable const*, TaskOwner, std::hash<Taskable const*>, std::equal_to<Taskable const*>, std::allocator<std::pair<Taskable const* const, TaskOwner> > >::at (__k=<optimized out>, this=<optimized out>) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/unordered_map.h:1001
      #4  FollyExecutorPool::State::scheduleTask (task=..., pool=..., executor=..., this=<optimized out>) at /home/couchbase/server/kv_engine/executor/folly_executorpool.cc:415
      #5  operator() (__closure=0x7fffe51ac080) at /home/couchbase/server/kv_engine/executor/folly_executorpool.cc:929
      #6  void folly::detail::function::FunctionTraits<void ()>::callSmall<FollyExecutorPool::schedule(std::shared_ptr<GlobalTask>)::{lambda()#2}>(folly::detail::function::Data&) () at tlm/deps/folly.exploded/include/folly/Function.h:387
      #7  0x0000000000da8d4c in folly::detail::function::FunctionTraits<void ()>::operator()() (this=0x7fffe51ac080) at folly/Function.h:416
      #8  folly::EventBase::runInEventBaseThreadAndWait(folly::Function<void ()>)::$_9::operator()() (this=<optimized out>) at folly/io/async/EventBase.cpp:671
      ...
       
      (gdb) t 1
      [Switching to thread 1 (Thread 0x7ffff7fe1f40 (LWP 61764))]
      #0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
      38  ../sysdeps/unix/sysv/linux/x86_64/syscall.S: No such file or directory.
      (gdb) bt
      #0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
      #1  0x0000000000d5dc14 in folly::detail::(anonymous namespace)::nativeFutexWaitImpl (addr=<optimized out>, expected=<optimized out>, absSystemTime=0x0, absSteadyTime=<optimized out>, waitMask=<optimized out>) at folly/detail/Futex.cpp:123
      #2  folly::detail::futexWaitImpl (futex=0x7fffffffd928, expected=2, absSystemTime=0x0, absSteadyTime=0x7fffffffd890, waitMask=4294967295) at folly/detail/Futex.cpp:253
      #3  0x0000000000d77220 in folly::detail::futexWaitImpl<std::atomic<unsigned int> const, std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > > > (futex=0x7fffffffd928, expected=2, deadline=..., waitMask=4294967295)
          at folly/detail/Futex-inl.h:85
      #4  folly::detail::futexWaitUntil<std::atomic<unsigned int>, std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (futex=0x7fffffffd928, expected=2, deadline=..., waitMask=4294967295) at folly/detail/Futex-inl.h:124
      #5  folly::detail::MemoryIdler::futexWaitPreIdle<std::atomic<unsigned int>, std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (_ret=@0x7fffffffd8dc: 32767, fut=..., 
          expected=2, deadline=..., waitMask=4294967295, idleTimeout=..., stackToRetain=1024, timeoutVariationFrac=0.5) at folly/detail/MemoryIdler.h:194
      #6  0x0000000000d929ed in folly::detail::MemoryIdler::futexWaitUntil<std::atomic<unsigned int>, std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (fut=..., expected=2, 
          deadline=..., waitMask=4294967295, idleTimeout=..., stackToRetain=1024, timeoutVariationFrac=0.5) at folly/detail/MemoryIdler.h:151
      #7  folly::Baton<true, std::atomic>::tryWaitSlow<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x7fffffffd928, deadline=..., opt=...) at folly/synchronization/Baton.h:308
      #8  0x0000000000da7890 in folly::Baton<true, std::atomic>::wait (this=0x7fffffffd928, opt=...) at folly/synchronization/Baton.h:178
      #9  folly::EventBase::runInEventBaseThreadAndWait(folly::Function<void ()>) (this=this@entry=0x7b5400140280, fn=...) at folly/io/async/EventBase.cpp:673
      #10 0x0000000000649bf3 in FollyExecutorPool::schedule(std::shared_ptr<GlobalTask>) () at /opt/gcc-10.2.0/include/c++/10.2.0/new:175
      #11 0x000000000084271b in EPVBucket::scheduleDeferredDeletion(EventuallyPersistentEngine&) () at /opt/gcc-10.2.0/include/c++/10.2.0/ext/atomicity.h:100
      #12 0x00000000006dfe7a in VBucket::DeferredDeleter::operator()(VBucket*) const () at /home/couchbase/server/kv_engine/engines/ep/src/vbucket.cc:3990
      #13 0x000000000086f874 in std::_Sp_counted_deleter<EPVBucket*, VBucket::DeferredDeleter, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x7b0800006ae0) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/shared_ptr_base.h:453
      #14 0x0000000000a7230c in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7b0800006ae0) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/shared_ptr_base.h:151
      #15 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7b0800006ae0) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/shared_ptr_base.h:151
      #16 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7b44000515d8, __in_chrg=<optimized out>) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/shared_ptr_base.h:733
      #17 std::__shared_ptr<VBucket, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7b44000515d0, __in_chrg=<optimized out>) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/shared_ptr_base.h:1183
      #18 std::shared_ptr<VBucket>::~shared_ptr (this=0x7b44000515d0, __in_chrg=<optimized out>) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/shared_ptr.h:121
      #19 PagingVisitor::~PagingVisitor (this=0x7b4400051540, __in_chrg=<optimized out>) at /home/couchbase/server/kv_engine/engines/ep/src/paging_visitor.h:39
      #20 PagingVisitor::~PagingVisitor (this=0x7b4400051540, __in_chrg=<optimized out>) at /home/couchbase/server/kv_engine/engines/ep/src/paging_visitor.h:39
      #21 0x00000000007c3abb in std::default_delete<InterruptableVBucketVisitor>::operator() (__ptr=0x7b4400051540, this=0x7b3800016e30) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/unique_ptr.h:79
      #22 std::unique_ptr<InterruptableVBucketVisitor, std::default_delete<InterruptableVBucketVisitor> >::~unique_ptr (this=0x7b3800016e30, __in_chrg=<optimized out>) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/unique_ptr.h:361
      #23 VBCBAdaptor::~VBCBAdaptor (this=0x7b3800016dd0, __in_chrg=<optimized out>) at /home/couchbase/server/kv_engine/engines/ep/src/kv_bucket.h:46
      #24 0x00000000007c11c0 in __gnu_cxx::new_allocator<VBCBAdaptor>::destroy<VBCBAdaptor> (__p=0x7b3800016dd0, this=0x7b3800016dd0) at /opt/gcc-10.2.0/include/c++/10.2.0/ext/new_allocator.h:154
      #25 std::allocator_traits<std::allocator<VBCBAdaptor> >::destroy<VBCBAdaptor> (__p=0x7b3800016dd0, __a=...) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/alloc_traits.h:531
      #26 std::_Sp_counted_ptr_inplace<VBCBAdaptor, std::allocator<VBCBAdaptor>, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=0x7b3800016dc0) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/shared_ptr_base.h:560
      #27 0x000000000081968a in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7b3800016dc0) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/shared_ptr_base.h:151
      #28 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7b3800016dc0) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/shared_ptr_base.h:151
      #29 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=<optimized out>, __in_chrg=<optimized out>) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/shared_ptr_base.h:733
      #30 std::__shared_ptr<GlobalTask, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/shared_ptr_base.h:1183
      #31 std::__shared_ptr<GlobalTask, (__gnu_cxx::_Lock_policy)2>::reset (this=0x7b4000008880) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/shared_ptr_base.h:1301
      #32 EventuallyPersistentEngine::waitForTasks(std::vector<std::shared_ptr<GlobalTask>, std::allocator<std::shared_ptr<GlobalTask> > >&) () at /home/couchbase/server/kv_engine/engines/ep/src/ep_engine.cc:6752
      #33 0x000000000082396f in EventuallyPersistentEngine::destroyInner(bool) () at /home/couchbase/server/kv_engine/engines/ep/src/ep_engine.cc:2135
      #34 0x0000000000823c1f in EventuallyPersistentEngine::destroy(bool) () at /home/couchbase/server/kv_engine/engines/ep/src/ep_engine.cc:205
      #35 0x00000000006b4b42 in MockEngine::destroy(bool) () at ../kv_engine/programs/engine_testapp/mock_engine.cc:127
      #36 0x00000000004fdf40 in MockTestHarness::destroy_bucket (force=false, handle=<optimized out>, this=0x11ee320 <harness>) at /home/couchbase/server/kv_engine/programs/engine_testapp/engine_testapp.cc:454
      #37 execute_test (default_cfg=0x7fffffffe70d "compression_mode=active;item_eviction_policy=full_eviction;dbname=./ep_testsuite.full_eviction.comp_active.db", test=...) at /home/couchbase/server/kv_engine/programs/engine_testapp/engine_testapp.cc:454
      #38 main () at /home/couchbase/server/kv_engine/programs/engine_testapp/engine_testapp.cc:673
      #39 0x00007ffff4281bf7 in __libc_start_main (main=0x4fb9a0 <main>, argc=9, argv=0x7fffffffe468, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe458) at ../csu/libc-start.c:310
      #40 0x000000000052a27a in _start () at /opt/gcc-10.2.0/include/c++/10.2.0/bits/char_traits.h:322
      

      Analysis

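      This is a race condition which, if encountered, could result in an exception being thrown on a background thread (and memcached terminating) during Bucket delete.

      In the backtraces above, thread 1 is in EventuallyPersistentEngine::destroyInner() -> waitForTasks(), where resetting the task's shared_ptr destroys the VBCBAdaptor and its PagingVisitor. That releases the last VBucket reference, so VBucket::DeferredDeleter runs and calls EPVBucket::scheduleDeferredDeletion(), which in turn calls FollyExecutorPool::schedule(). On the SchedulerPool0 thread, FollyExecutorPool::State::scheduleTask() then calls unordered_map::at() with a Taskable key that is not present in the map, which throws std::out_of_range.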
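
The failure pattern can be reproduced in miniature. The sketch below is illustrative only and is not kv_engine code or the actual fix: MiniPool, Owner and Resource are hypothetical stand-ins for the executor pool, the Taskable and the VBucket. It shows a shared_ptr deleter that schedules follow-up work after its owner has been unregistered, so the unordered_map::at() lookup throws std::out_of_range, alongside a guarded variant that uses find() and reports failure instead.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a task owner (a "Taskable" in the backtrace).
struct Owner { std::string name; };

// Hypothetical stand-in for the executor pool's per-owner task bookkeeping.
class MiniPool {
public:
    void registerOwner(const Owner& o) { owners[&o]; }
    void unregisterOwner(const Owner& o) { owners.erase(&o); }

    // Same lookup style as frame #4: at() throws std::out_of_range
    // if the owner is no longer registered.
    void scheduleUnchecked(const Owner& o, std::string task) {
        owners.at(&o).push_back(std::move(task));
    }

    // Guarded variant: refuses the task instead of throwing.
    bool scheduleChecked(const Owner& o, std::string task) {
        auto it = owners.find(&o);
        if (it == owners.end()) {
            return false;
        }
        it->second.push_back(std::move(task));
        return true;
    }

    std::size_t pending(const Owner& o) const {
        auto it = owners.find(&o);
        return it == owners.end() ? 0 : it->second.size();
    }

private:
    std::unordered_map<const Owner*, std::vector<std::string>> owners;
};

// Hypothetical object whose deleter schedules deferred work.
struct Resource { int id; };

// Returns true if releasing the last reference after the owner was
// unregistered makes the unchecked schedule call throw.
bool lateReleaseThrows() {
    MiniPool pool;
    Owner bucket{"bucket"};
    pool.registerOwner(bucket);
    bool threw = false;
    {
        std::shared_ptr<Resource> res(new Resource{1}, [&](Resource* r) {
            // Caught here only so the sketch can observe the exception;
            // letting it escape a deleter would terminate the process.
            try {
                pool.scheduleUnchecked(bucket, "deferred-delete");
            } catch (const std::out_of_range&) {
                threw = true;
            }
            delete r;
        });
        pool.unregisterOwner(bucket); // the owner goes away first
    } // last reference dropped here, so the deleter runs
    assert(pool.pending(bucket) == 0);
    return threw;
}

// Returns the result of the guarded call for an unregistered owner.
bool lateReleaseGuarded() {
    MiniPool pool;
    Owner bucket{"bucket"};
    pool.registerOwner(bucket);
    pool.unregisterOwner(bucket);
    return pool.scheduleChecked(bucket, "deferred-delete");
}
```

Calling lateReleaseThrows() returns true and lateReleaseGuarded() returns false. Whether a real fix should reject late tasks, keep the owner registered until every task reference has been released, or do something else entirely is a design decision this sketch does not settle.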

            People

              drigby Dave Rigby (Inactive)