  1. Couchbase Server
  2. MB-51430

[Magma] Memcached crashed in magma::KVStoreSet::KVStoreInstance::Destroy(unsigned int, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>)


Details

    Description

      STEPS TO RECREATE:
      DISK FULL TEST

      1. Create a 4 node cluster
      2. Create 5 million items (doc size = 2048) with replicas = 1
      3. Fill the entire disk ("fallocate -l <space left on disk> <file_name>", e.g. "fallocate -l 84716M /data/full_disk_84716MB_1647101247.94")
      4. After the disk is full, start doc ops (create docs) until ep_data_write_failed > 0 (verified using cbstats)
      5. Kill memcached on all nodes (kill -9 $(pgrep memcached)). The time difference between SIGKILLs on successive nodes was three seconds
      6. Memcached crashed in magma::KVStoreSet::KVStoreInstance::Destroy(unsigned int, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>) (this=0x7fcac33e0de0, kvsRev=<optimized out>, destroyCallback=...)
        (Observed on node 172.23.104.76)

      BackTrace:

      (gdb) bt full
      #0  0x00007fcb87720387 in raise () from /lib64/libc.so.6
      No symbol table info available.
      #1  0x00007fcb87721a78 in abort () from /lib64/libc.so.6
      No symbol table info available.
      #2  0x00007fcb8806b63c in __gnu_cxx::__verbose_terminate_handler () at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/libsupc++/vterminate.cc:95
              terminating = false
              t = <optimized out>
      #3  0x0000000000b3210b in backtrace_terminate_handler() ()
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/utilities/terminate_handler.cc:88
      No locals.
      #4  0x00007fcb880768f6 in __cxxabiv1::__terminate(void (*)()) () at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/libsupc++/eh_terminate.cc:48
      No locals.
      #5  0x00007fcb88076961 in std::terminate () at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/libsupc++/eh_terminate.cc:58
      No locals.
      #6  0x00007fcb88076bf4 in __cxxabiv1::__cxa_throw (obj=obj@entry=0x7fcb18000940, tinfo=0x107f4e0 <typeinfo for std::runtime_error>,
          dest=0x444ee0 <_ZNSt13runtime_errorD1Ev@plt>) at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/libsupc++/eh_throw.cc:95
              globals = <optimized out>
              header = 0x7fcb180008c0
      #7  0x00000000004fecd0 in magma::KVStoreSet::KVStoreInstance::Destroy(unsigned int, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>) (this=0x7fcac33e0de0, kvsRev=<optimized out>, destroyCallback=...)
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/magma/kvstore_set.cc:254
              found = <optimized out>
              lock = {_M_device = @0x7fcac33e0de0}
              fs = {
                MakeFile = {<std::_Maybe_unary_or_binary_function<std::unique_ptr<magma::File, std::default_delete<magma::File> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>> = {<std::unary_function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unique_ptr<magma::File, std::default_delete<magma::File> > >> = {<No data fields>}, <No data fields>}, <std::_Function_base> = {static _M_max_size = 16, static _M_max_align = 8, _M_functor = {_M_unused = {_M_object = 0x7fcaa554ade0, _M_const_object = 0x7fcaa554ade0,
                        _M_function_pointer = 0x7fcaa554ade0,
                        _M_member_pointer = (void (std::_Undefined_class::*)(std::_Undefined_class * const)) 0x7fcaa554ade0, this adjustment 140510450597616}, _M_pod_data = "\340\255T\245\312\177\000\000\360\276~#\313\177\000"}, _M_manager = 0x9974b0
           <_ZNSt17_Function_handlerIFSt10unique_ptrIN5magma4FileESt14default_deleteIS2_EERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEZNS1_23FileSystemWithHistogramC4ENS1_10FileSystemEPSt6vectorIS0_INS1_14MagmaFileStatsES3_ISI_EESaISK_EEmEUlSD_E_E10_M_managerERSt9_Any_dataRKSQ_St18_Manager_operation>},
                  _M_invoker = 0x997700 <_ZNSt17_Function_handlerIFSt10unique_ptrIN5magma4FileESt14default_deleteIS2_EERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEZNS1_23FileSystemWithHistogramC4ENS1_10FileSystemEPSt6vectorIS0_INS1_14MagmaFileStatsES3_ISI_EESaISK_EEmEUlSD_E_E9_M_invokeERKSt9_Any_dataSD_>},
                MakeDirectory = {<std::_Maybe_unary_or_binary_function<std::unique_ptr<magma::Directory, std::default_delete<magma::Directory> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>> = {<std::unary_function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unique_ptr<magma::Directory, std::default_delete<magma::Directory> > >> = {<No data fields>}, <No data fields>}, <std::_Function_base> = {static _M_max_size = 16, static _M_max_align = 8, _M_functor = {_M_unused = {_M_object = 0x9fb460
           <magma::DefaultDirectoryConstructor(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>,
                        _M_const_object = 0x9fb460 <magma::DefaultDirectoryConstructor(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>,
                        _M_function_pointer = 0x9fb460 <magma::DefaultDirectoryConstructor(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>,
                        _M_member_pointer = (void (std::_Undefined_class::*)(std::_Undefined_class * const)) 0x9fb460 <magma::DefaultDirectoryConstructor(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>, this adjustment 140505560121368},
                      _M_pod_data = "`\264\237\000\000\000\000\000\030\000\000\000\312\177\000"},
                    _M_manager = 0x997e00 <std::_Function_handler<std::unique_ptr<magma::Directory, std::default_delete<magma::Directory> > (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&), std::unique_ptr<magma::Directory, std::default_delete<magma::Directory> > (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)>},
                  _M_invoker = 0x997d60 <std::_Function_handler<std::unique_ptr<magma::Directory, std::default_delete<magma::Directory> > (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&), std::unique_ptr<magma::Directory, std::default_delete<magma::Directory> > (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>::_M_invoke(std::_Any_data const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>},
                RemoveAll = {<std::_Maybe_unary_or_binary_function<magma::Status, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool>> = {<std::binary_function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, magma::Status>> = {<No data fields>}, <No data fields>}, <std::_Function_base> = {static _M_max_size = 16, static _M_max_align = 8, _M_functor = {_M_unused = {
                        _M_object = 0x993f30
           <magma::DefaultFileRemover(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)>,
                        _M_const_object = 0x993f30 <magma::DefaultFileRemover(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)>,
                        _M_function_pointer = 0x993f30 <magma::DefaultFileRemover(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)>,
                        _M_member_pointer = (void (std::_Undefined_class::*)(std::_Undefined_class * const)) 0x993f30 <magma::DefaultFileRemover(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)>, this adjustment 139849238589488},
                      _M_pod_data = "0?\231\000\000\000\000\000\060\060\060\060\061\177\000"},
                    _M_manager = 0x997e40 <std::_Function_handler<magma::Status (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool), magma::Status (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)>},
                  _M_invoker = 0x997d80 <std::_Function_handler<magma::Status (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool), magma::Status (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)>::_M_invoke(std::_Any_data const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool&&)>},
                Rename = {<std::_Maybe_unary_or_binary_function<magma::Status, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>> = {<std::binary_function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, magma::Status>> = {<No data fields>}, <No data fields>}, <std::_Function_base> = {static _M_max_size = 16, static _M_max_align = 8, _M_functor = {
                      _M_unused = {
                        _M_object = 0x993060 <magma::DefaultFileRenamer(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>,
                        _M_const_object = 0x993060 <magma::DefaultFileRenamer(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>,
                        _M_function_pointer = 0x993060 <magma::DefaultFileRenamer(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>,
                        _M_member_pointer = (void (std::_Undefined_class::*)(std::_Undefined_class * const)) 0x993060 <magma::DefaultFileRenamer(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>, this adjustment 4098}, _M_pod_data = "`0\231\000\000\000\000\000\002\020\000\000\000\000\000"},
                    _M_manager = 0x997e80 <std::_Function_handler<magma::Status (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&), magma::Status (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)>},
                  _M_invoker = 0x997da0 <std::_Function_handler<magma::Status (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&), magma::Status (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>::_M_invoke(std::_Any_data const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>}}
              status = {s = {_M_t = {<std::__uniq_ptr_impl<magma::Status::state, std::default_delete<magma::Status::state> >> = {
                      _M_t = {<std::_Tuple_impl<0, magma::Status::state*, std::default_delete<magma::Status::state> >> = {<std::_Tuple_impl<1, std::default_delete<magma::Status::state> >> = {<std::_Head_base<1, std::default_delete<magma::Status::state>, true>> = {<std::default_delete<magma::Status::state>> = {<No data fields>}, <No data fields>}, <No data fields>}, <std::_Head_base<0, magma::Status::state*, false>> = {
                            _M_head_impl = 0x7fcb86169210}, <No data fields>}, <No data fields>}}, <No data fields>}}}
              path = {static npos = 18446744073709551615,
                _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
                  _M_p = 0x7fcaa554ae10 <Address 0x7fcaa554ae10 out of bounds>}, _M_string_length = 47, {
                  _M_local_buf = "/\000\000\000\000\000\000\000\360\276~#\313\177\000", _M_allocated_capacity = 47}}
              target = {<std::__shared_ptr<magma::KVStore, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<magma::KVStore, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x7fcac35a7510, _M_refcount = {_M_pi = 0x7fcac35a7500}}, <No data fields>}
      #8  0x00000000009574de in magma::KVStoreSet::DestroyKVStore(unsigned short, unsigned int) () at /opt/gcc-10.2.0/include/c++/10.2.0/new:175
              ok = <optimized out>
              lock = <optimized out>
      #9  0x000000000091d63b in magma::Magma::Impl::DeleteKVStore(unsigned short, unsigned int) ()
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/magma/db.cc:304
              status = {s = {_M_t = {<std::__uniq_ptr_impl<magma::Status::state, std::default_delete<magma::Status::state> >> = {
                      _M_t = {<std::_Tuple_impl<0, magma::Status::state*, std::default_delete<magma::Status::state> >> = {<std::_Tuple_impl<1, std::default_delete<magma::Status::state> >> = {<std::_Head_base<1, std::default_delete<magma::Status::state>, true>> = {<std::default_delete<magma::Status::state>> = {<No data fields>}, <No data fields>}, <No data fields>}, <std::_Head_base<0, magma::Status::state*, false>> = {
                            _M_head_impl = 0x0}, <No data fields>}, <No data fields>}}, <No data fields>}}}
              name = {static npos = 18446744073709551615,
                _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
                  _M_p = 0x7fcaab953a20 <Address 0x7fcaab953a20 out of bounds>}, _M_string_length = 25, {_M_local_buf = "\036", '\000' <repeats 14 times>,
                  _M_allocated_capacity = 30}}
      #10 0x000000000091d800 in magma::Magma::DeleteKVStore (this=<optimized out>, kvID=kvID@entry=198, kvsRev=kvsRev@entry=1)
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/magma/db.cc:319
      No locals.
      #11 0x0000000000888381 in MagmaMemoryTrackingProxy::DeleteKVStore(unsigned short, unsigned int) ()
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/kvstore/magma-kvstore/magma-memory-tracking-proxy.cc:135
              domainGuard = {previous = cb::Primary}
      #12 0x000000000086adf1 in MagmaKVStore::delVBucket(Vbid, std::unique_ptr<KVStoreRevision, std::default_delete<KVStoreRevision> >) ()
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/include/memcached/vbucket.h:62
              status = {s = {_M_t = {<std::__uniq_ptr_impl<magma::Status::state, std::default_delete<magma::Status::state> >> = {
                      _M_t = {<std::_Tuple_impl<0, magma::Status::state*, std::default_delete<magma::Status::state> >> = {<std::_Tuple_impl<1, std::default_delete<magma::Status::state> >> = {<std::_Head_base<1, std::default_delete<magma::Status::state>, true>> = {<std::default_delete<magma::Status::state>> = {<No data fields>}, <No data fields>}, <No data fields>}, <std::_Head_base<0, magma::Status::state*, false>> = {
                            _M_head_impl = 0x7fcabc3f4548}, <No data fields>}, <No data fields>}}, <No data fields>}}}
      #13 0x000000000073a915 in VBucketMemoryAndDiskDeletionTask::run() ()
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/vbucket.h:404
              phosphor_internal_category_enabled_90 = {_M_b = {_M_p = 0x0}, static is_always_lock_free = <optimized out>}
              phosphor_internal_category_enabled_temp_90 = <optimized out>
              phosphor_internal_tpi_90 = {category = 0x0, name = 0x0, type = phosphor::AsyncStart, argument_names = {_M_elems = {0x0, 0x0}},
                argument_types = {_M_elems = {phosphor::is_bool, phosphor::is_bool}}}
              phosphor_internal_guard_90 = {tpi = 0x1066e00 <VBucketMemoryAndDiskDeletionTask::run()::phosphor_internal_tpi_90>, enabled = true,
                arg1 = 198, arg2 = {<No data fields>}, start = {__d = {__r = 138251197190503}}}
              start = <optimized out>
              elapsed = <optimized out>
              wallTime = <optimized out>
      #14 0x0000000000aa0b99 in GlobalTask::execute(std::basic_string_view<char, std::char_traits<char> >) ()
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/globaltask.cc:98
       
              guard = {previous = 0x0}
              start = <optimized out>
              runAgain = <optimized out>
      #15 0x0000000000a9a25a in FollyExecutorPool::TaskProxy::scheduleViaCPUPool()::{lambda()#2}::operator()() const (__closure=0x7fcb237ec650)
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/folly_executorpool.cc:309
              runAgain = <optimized out>
              proxy = @0x7fcb7809bce0: <error reading variable>
      #16 0x0000000000aa1f3e in operator() (this=0x7fcb237ec650)
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/cancellable_cpu_executor.cc:42
              fn = @0x7fcb237ec650: {<folly::detail::function::FunctionTraits<void()>> = {<No data fields>}, data_ = {big = 0x7fcb7809bce0, tiny = {
                    __data = "\340\274\tx\313\177\000\000@\307~#\313\177\000\000\000\000\000\000\000\000\000\000`\221s\320\312\177\000\000\025\000\000\000\000\000\000\000m\240\276\000\000\000\000", __align = {<No data fields>}}},
                call_ = 0xa9a790 <folly::detail::function::FunctionTraits<void ()>::callSmall<FollyExecutorPool::TaskProxy::scheduleViaCPUPool()::{lambda()#2}>(folly::detail::function::Data&)>,
                exec_ = 0xa98c80 <folly::detail::function::execSmall<FollyExecutorPool::TaskProxy::scheduleViaCPUPool()::{lambda()#2}>(folly::detail::function::Op, folly::detail::function::Data*, folly::detail::function::Data)>}
      #17 CancellableCPUExecutor::add(GlobalTask*, folly::Function<void ()>)::{lambda()#1}::operator()() const ()
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/cancellable_cpu_executor.cc:42
              task = {storage_ = {{emptyState = -112 '\220', value = {task = 0x7fc928edf690,
                      func = {<folly::detail::function::FunctionTraits<void()>> = {<No data fields>}, data_ = {big = 0x7fcb7809bce0, tiny = {
                            __data = "\340\274\tx\313\177\000\000@\307~#\313\177\000\000\000\000\000\000\000\000\000\000`\221s\320\312\177\000\000\025\000\000\000\000\000\000\000m\240\276\000\000\000\000", __align = {<No data fields>}}},
                        call_ = 0xa9a790 <folly::detail::function::FunctionTraits<void ()>::callSmall<FollyExecutorPool::TaskProxy::scheduleViaCPUPool()::{lambda()#2}>(folly::detail::function::Data&)>,
                        exec_ = 0xa98c80 <folly::detail::function::execSmall<FollyExecutorPool::TaskProxy::scheduleViaCPUPool()::{lambda()#2}>(folly::detail::function::Op, folly::detail::function::Data*, folly::detail::function::Data)>}}}, hasValue = true}}
              this = <optimized out>
      #18 0x0000000000bf9640 in operator() (this=0x7fcb237ec840)
          at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/folly-prefix/src/folly/folly/Function.h:416
              fn = @0x7fcb237ec840: {<folly::detail::function::FunctionTraits<void()>> = {<No data fields>}, data_ = {big = 0x7fcb86151000, tiny = {
                    __data = "\000\020\025\206\313\177\000\000\350\000\000\000\000\000\000\000\330\001\000\000\000\000\000\000\326\017\\\000\000\000\000\000\270\310~#\313\177\000\000\320GN\212\313\177\000", __align = {<No data fields>}}},
                call_ = 0xaa22c0 <folly::detail::function::FunctionTraits<void()>::callSmall<CancellableCPUExecutor::add(GlobalTask*, folly::Func)::<lambda()> >(folly::detail::function::Data &)>,
                exec_ = 0xaa1870 <folly::detail::function::execSmall<CancellableCPUExecutor::add(GlobalTask*, folly::Func)::<lambda()> >(folly::detail::function::Op, folly::detail::function::Data *, folly::detail::function::Data *)>}
      #19 folly::ThreadPoolExecutor::runTask(std::shared_ptr<folly::ThreadPoolExecutor::Thread> const&, folly::ThreadPoolExecutor::Task&&) (
          this=this@entry=0x7fcb86151100, thread=...,
          task=task@entry=<unknown type in /usr/lib/debug/opt/couchbase/bin/memcached-7.1.0-2475.x86_64.debug, CU 0xa4065bd, DIE 0xa48a4f2>)
          at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/folly-prefix/src/folly/folly/executors/ThreadPoolExecutor.cpp:97
              rctx = {
                prev_ = {<std::__shared_ptr<folly::RequestContext, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<folly::RequestContext, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x0, _M_refcount = {_M_pi = 0x0}}, <No data fields>}}
              startTime = {__d = {__r = 138251197182756}}
              stats = {expired = false, waitTime = {__r = 134987}, runTime = {__r = 0}, enqueueTime = {__d = {__r = 138251197047769}}, requestId = 0}
      #20 0x0000000000be40da in folly::CPUThreadPoolExecutor::threadRun (this=0x7fcb86151100, thread=...)
          at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/folly-prefix/src/folly/folly/executors/CPUThreadPoolExecutor.cpp:265
              task = {storage_ = {{emptyState = 0 '\000', value = {<folly::ThreadPoolExecutor::Task> = {
                        func_ = {<folly::detail::function::FunctionTraits<void()>> = {<No data fields>}, data_ = {big = 0x7fcb86151000, tiny = {
                              __data = "\000\020\025\206\313\177\000\000\350\000\000\000\000\000\000\000\330\001\000\000\000\000\000\000\326\017\\\000\000\000\000\000\270\310~#\313\177\000\000\320GN\212\313\177\000", __align = {<No data fields>}}},
                          call_ = 0xaa22c0 <folly::detail::function::FunctionTraits<void()>::callSmall<CancellableCPUExecutor::add(GlobalTask*, folly::Func)::<lambda()> >(folly::detail::function::Data &)>,
                          exec_ = 0xaa1870 <folly::detail::function::execSmall<CancellableCPUExecutor::add(GlobalTask*, folly::Func)::<lambda()> >(folly::detail::function::Op, folly::detail::function::Data *, folly::detail::function::Data *)>}, enqueueTime_ = {__d = {__r = 138251197047769}},
                        expiration_ = {__r = 0}, expireCallback_ = {<folly::detail::function::FunctionTraits<void()>> = {<No data fields>}, data_ = {
                            big = 0x7fcb861691b0, tiny = {
                              __data = "\260\221\026\206\313\177\000\000\233\210\242", '\000' <repeats 13 times>, "\305\216-\212\313\177\000\000\000\300]\206\313\177\000\000@&N\212\313\177\000",
      __align = {<No data fields>}}}, call_ = 0x466e11
           <folly::detail::function::FunctionTraits<void ()>::uninitCall(folly::detail::function::Data&)>, exec_ = 0x0},
                        context_ = {<std::__shared_ptr<folly::RequestContext, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<folly::RequestContext, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x0, _M_refcount = {_M_pi = 0x0}}, <No data fields>}}, poison = false,
                      priority_ = 0 '\000', queueObserverPayload_ = 0}}, hasValue = true}}
              guard = {list_ = {forbid = true, prev = 0x0, curr = {name = {static npos = <optimized out>, b_ = 0xccba6b "CPUThreadPoolExecutor",
                      e_ = 0xccba80 ""}}}}
      #21 0x0000000000bfc5f9 in __invoke_impl<void, void (folly::ThreadPoolExecutor::*&)(std::shared_ptr<folly::ThreadPoolExecutor::Thread>), folly::ThreadPoolExecutor*&, std::shared_ptr<folly::ThreadPoolExecutor::Thread>&> (__t=<optimized out>, __f=<optimized out>)
          at /usr/local/include/c++/7.3.0/bits/invoke.h:73
      No locals.
      #22 __invoke<void (folly::ThreadPoolExecutor::*&)(std::shared_ptr<folly::ThreadPoolExecutor::Thread>), folly::ThreadPoolExecutor*&, std::shared_ptr<folly::ThreadPoolExecutor::Thread>&> (__fn=<optimized out>) at /usr/local/include/c++/7.3.0/bits/invoke.h:95
      No locals.
      #23 __call<void, 0, 1> (__args=<optimized out>, this=<optimized out>) at /usr/local/include/c++/7.3.0/functional:467
      No locals.
      #24 operator()<> (this=<optimized out>) at /usr/local/include/c++/7.3.0/functional:551
      No locals.
      #25 folly::detail::function::FunctionTraits<void ()>::callBig<std::_Bind<void (folly::ThreadPoolExecutor::*(folly::ThreadPoolExecutor*, std::shared_ptr<folly::ThreadPoolExecutor::Thread>))(std::shared_ptr<folly::ThreadPoolExecutor::Thread>)> >(folly::detail::function::Data&) (p=...)
          at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/folly-prefix/src/folly/folly/Function.h:401
              fn = <optimized out>
      #26 0x0000000000a99f54 in operator() (this=0x7fcb865d8cc0)
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/folly_executorpool.cc:49
              fn = @0x7fcb865d8cc0: <error reading variable>
      #27 operator() (__closure=0x7fcb865d8cc0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/folly_executorpool.cc:49
              threadNameOpt = {storage_ = {{emptyState = -96 '\240', value = {static npos = 18446744073709551615,
                      _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
                        _M_p = 0x7fcb237ec9a0 "AuxIoPool3"}, _M_string_length = 10, {_M_local_buf = "AuxIoPool3\000\000\000\000\000",
                        _M_allocated_capacity = 8029725099529106753}}}, hasValue = true}}
              func = <error reading variable func (Cannot access memory at address 0x7fcb865d8cc0)>
      #28 folly::detail::function::FunctionTraits<void ()>::callBig<CBRegisteredThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}>(folly::detail::function::Data&) (p=...)
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/server_build/tlm/deps/folly.exploded/include/folly/Function.h:401
              fn = @0x7fcb865d8cc0: <error reading variable>
      #29 0x00007fcb8809fd40 in execute_native_thread_routine () at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/src/c++11/thread.cc:80
      No locals.
      #30 0x00007fcb89ea7ea5 in start_thread () from /lib64/libpthread.so.0
      No symbol table info available.
      #31 0x00007fcb877e88dd in clone () from /lib64/libc.so.6
      No symbol table info available.
      

      QE-TEST:

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/ankush_temp_job.ini bucket_storage=magma,rerun=false,bucket_eviction_policy=fullEviction,randomize_value=True,enable_dp=false,GROUP=P0,get-cbcollect-info=True,upgrade_version=7.1.0-1671 -t storage.magma.magma_disk_full.MagmaDiskFull.test_crash_recovery_disk_full,nodes_init=4,num_items=5000000,doc_size=2048,sdk_timeout=60,replicas=1,GROUP=P0'
      


        Activity

          drigby Dave Rigby added a comment -

          Ankush Sharma Is this a regression? If so, what is the last known good build?

          drigby Dave Rigby added a comment -

          Crash is caused by an uncaught exception on a background (AuxIO) thread; frame #27 shows the thread name AuxIoPool3.

          Exception is being thrown from inside Magma (see frame #7):

          #7  0x00000000004fecd0 in magma::KVStoreSet::KVStoreInstance::Destroy(unsigned int, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>) (this=0x7fcac33e0de0, kvsRev=<optimized out>, destroyCallback=...)
              at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/magma/kvstore_set.cc:254
          

                      ...
                      auto status = MarkKVStoreDeleted(path, fs);
                      if (!status) {
                          throw std::runtime_error(status.Message());
                      }
          

          This is called from MagmaKVStore::delVBucket via Magma::DeleteKVStore, which returns a status code:

          void MagmaKVStore::delVBucket(Vbid vbid,
                                        std::unique_ptr<KVStoreRevision> kvstoreRev) {
              auto status = magma->DeleteKVStore(
                      vbid.get(),
                      static_cast<Magma::KVStoreRevision>(kvstoreRev->getRevision()));
              logger->info(
                      "MagmaKVStore::delVBucket DeleteKVStore {} kvstoreRev:{}. "
                      "status:{}",
                      vbid,
                      kvstoreRev->getRevision(),
                      status.String());
          }
          

          However, because the exception is not caught inside Magma and converted into a status code as expected, it escapes the calling thread and terminates KV-Engine.

          Assigning to Magma to look at why this error during delete was not returned via Magma::DeleteKVStore's return code.
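
The expected behaviour described above (an error surfaced through the return code rather than an exception) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Magma's actual code: `Status` is a hypothetical stand-in for magma::Status, and `destroyThrowing` and `deleteKVStoreNoThrow` are invented names that model the throwing destroy path and a wrapper at the API boundary.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for magma::Status; not the real class.
struct Status {
    bool ok = true;
    std::string message;
    explicit operator bool() const { return ok; }
};

// Models the current destroy path: a failed "mark deleted" step throws.
// `markDeleted` plays the role of MarkKVStoreDeleted(path, fs).
void destroyThrowing(const std::function<Status()>& markDeleted) {
    auto status = markDeleted();
    if (!status) {
        throw std::runtime_error(status.message);
    }
}

// Hypothetical boundary wrapper: catch at the API edge and convert the
// failure into a Status, so nothing propagates onto the caller's thread.
Status deleteKVStoreNoThrow(const std::function<Status()>& markDeleted) {
    try {
        destroyThrowing(markDeleted);
        return Status{};
    } catch (const std::exception& e) {
        return Status{false, e.what()};
    } catch (...) {
        return Status{false, "unknown error"};
    }
}
```

With a wrapper of this shape, a caller such as delVBucket would see a failed status (and could log or retry) instead of the process reaching std::terminate.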


          Dave Rigby

          This could be a regression. The last build on which I didn't observe this issue was 7.1.0-2442. However, the issue is not 100% reproducible on this build either: I ran this test 4-5 times, and this is the only instance where I hit it. So it might exist on 7.1.0-2442 as well, although I didn't run into it when I ran the test on that build.

          ankush.sharma Ankush Sharma added the comment above.

          This is not a regression, but a rare case where the file rename during a kvstore destroy operation fails because the disk is full, and the failure is treated as a fatal error. To address this issue we have to make the kvstore destroy method retriable.

          sarath Sarath Lakshman added the comment above.

          Looking at magma-kvstore, MagmaKVStore::delVBucket calls Magma::DeleteKVStore and does not retry on failure. Although MagmaKVStore::pendingTasks() exists, it does not appear to implement a retry.

          Dave Rigby Could you take a look?

          We have the magma changes ready that make DeleteKVStore return an error status. But once the API returns an error code instead of throwing an exception, magma-kvstore needs to retry so that the delete operation completes later.

          Given that this involves more changes, we could consider moving it out of Neo.

          sarath Sarath Lakshman added the comment above.
          drigby Dave Rigby added a comment -

          Agreed, there's no retry for MagmaKVStore::delVBucket().

          I think we do want the exception to be caught (and status code propagated instead) for Neo - that avoids taking out the entire memcached process (and other unrelated Buckets) if there's an IO issue with a single vBucket.

          In terms of retrying the delete if it fails, I wanted to check exactly what Magma's behaviour is in this area. For CouchKVStore we do attempt to retry if the delete fails. For couchstore the delete is just a simple `unlink` call, which can fail on Windows if the couchstore file is still in use (for example by an in-progress BGFetch), hence the retry. In theory that could retry the unlink forever, but it wouldn't affect any other Buckets.

          Additionally for CouchKVStore there's a revision ID for each vBucket which names the (single) file which holds the state for a vBucket - e.g. 10.couch.4 means the 4th revision of vBucket 10. When deleting a vbucket the revision is incremented, so if/when the vBucket file is re-created it will have revision N+1 - or 5 in this example.

          As such, it is safe to just retry in the background when a vBucket fails to be deleted.
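The naming scheme described above can be sketched as follows. The function names are hypothetical and chosen for this example only; the `<vbid>.couch.<revision>` format and the "increment on delete" rule are taken from the comment, not from the couchstore sources.

```cpp
#include <cstdint>
#include <string>

// Builds the per-vBucket file name in the "<vbid>.couch.<revision>" form
// described above, e.g. vBucket 10 at revision 4 gives "10.couch.4".
std::string couchFileName(std::uint16_t vbid, std::uint64_t revision) {
    return std::to_string(vbid) + ".couch." + std::to_string(revision);
}

// Hypothetical helper: deleting a vBucket bumps the revision, so a later
// re-creation uses a different file name from the one pending deletion.
std::uint64_t revisionAfterDelete(std::uint64_t deletedRevision) {
    return deletedRevision + 1;
}
```

Because the re-created vBucket lives in `10.couch.5`, a background retry that is still trying to unlink `10.couch.4` cannot touch the new file.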

          Q: For Magma, do we have similar functionality - i.e. is it safe to retry deleting say revision 4 of a vBucket even after revision 5 has been created?


          Magma mimics the couchstore revisions model. When a new revision of a kvstore is created, the prior revision is hidden from new reads and writes, since all reads and writes go to the latest revision. Existing read operations, such as a seq index scan, continue to operate.

          DeleteKVStore operates on a kvstore revision. During the API call, magma performs an internal atomic file rename, which ensures that the revision is no longer valid and is identified as invalid during crash recovery. It then internally schedules deletion of all the data files in the background, to run once the refcount of the kvstore becomes zero (that is, once all other active readers finish). If the atomic file rename fails, DeleteKVStore returns an IOError and we can retry the operation until it succeeds. Before the change on the magma side, an atomic file rename failure caused a crash.

          So it is safe to retry DeleteKVStore from magma-kvstore, just as for couchstore, if we merge https://review.couchbase.org/c/magma/+/172245.
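A caller-side retry of the kind discussed here might look like the sketch below. `DeleteResult` and `retryDelete` are hypothetical names invented for this example; the real DeleteKVStore returns a Magma status object, and a bounded attempt count with a fixed delay is only one possible policy.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical result of one delete attempt; NOT Magma's real Status type.
enum class DeleteResult { Success, IOError };

// Calls `attempt` until it succeeds or `maxAttempts` is reached, sleeping
// `delay` between failed attempts. Returns the result of the last attempt.
DeleteResult retryDelete(const std::function<DeleteResult()>& attempt,
                         int maxAttempts,
                         std::chrono::milliseconds delay) {
    assert(maxAttempts > 0);
    DeleteResult result = DeleteResult::IOError;
    for (int i = 0; i < maxAttempts; ++i) {
        result = attempt();
        if (result == DeleteResult::Success) {
            break;
        }
        if (i + 1 < maxAttempts) {
            std::this_thread::sleep_for(delay);
        }
    }
    return result;
}
```

In a real engine the retry would more likely be rescheduled as a background task than run in a blocking loop; the loop is used here only to keep the example short.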

          Another option is for the DeleteKVStore API to always return success for the specific revision and for magma to retry the revision deletion internally. However, if kv-engine has any scenario where it recreates the same revision number after a delete has succeeded, this can lead to problems. If it is safe to assume that never happens, I can change magma to do the retry itself.

          sarath Sarath Lakshman added the comment above.
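The second option in the comment above (the API always reports success and the storage layer retries internally) could be sketched roughly as below. `PendingDeletes` is a hypothetical class written for this example; it does not reflect how Magma actually tracks revisions awaiting deletion.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical queue of revisions whose deletion failed and must be retried.
class PendingDeletes {
public:
    // Returns true if the revision was deleted, false on an I/O error.
    using DeleteFn = std::function<bool(std::uint64_t)>;

    explicit PendingDeletes(DeleteFn fn) : deleteFn(std::move(fn)) {}

    // Never fails from the caller's point of view: a failed delete is
    // queued for a later retry instead of being reported.
    void requestDelete(std::uint64_t revision) {
        if (!deleteFn(revision)) {
            pending.push_back(revision);
        }
    }

    // Retries every queued revision once; returns how many are still pending.
    std::size_t retryPending() {
        const std::size_t count = pending.size();
        for (std::size_t i = 0; i < count; ++i) {
            const std::uint64_t revision = pending.front();
            pending.pop_front();
            if (!deleteFn(revision)) {
                pending.push_back(revision);
            }
        }
        return pending.size();
    }

private:
    DeleteFn deleteFn;
    std::deque<std::uint64_t> pending;
};
```

This design is only safe under the assumption raised in the comment: a revision number queued here must never be handed out again for a new kvstore.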
          drigby Dave Rigby added a comment -

          Thanks Sarath Lakshman.

          The only case where we could potentially re-use the same revision would be in some crash-restart scenarios, if Magma::GetKVStoreRevision returned the "wrong" value - via:

              magma->executeOnKVStoreList(
                      [this](const std::vector<magma::Magma::KVStoreID>& kvstores) {
                          cb::UseArenaMallocPrimaryDomain domainGuard;
                          for (auto kvid : kvstores) {
                              auto status = loadVBStateCache(Vbid(kvid), true);
                              ++st.numLoadedVb;
                              if (!status) {
                                  throw std::logic_error("MagmaKVStore vbstate vbid:" +
                                                         std::to_string(kvid) +
                                                         " not found."
                                                         " Status:" +
                                                         status.String());
                              }
                          }
                      });
          

          I don't really mind which approach we take; we do log the status of DelVBucket, so a failure is visible. If Magma is already performing a certain amount of revisionId management (and already does some cleanup in the background), then perhaps it is cleaner to delegate the retrying to Magma itself?

          drigby Dave Rigby added a comment -

          CC Ben Huddleston as he was looking at some similar-ish issues where Warmup couldn't read a valid vbstate.


          Thanks Dave Rigby. I think we can go ahead with magma itself retrying, as kv-engine always creates monotonically increasing kvstore revisions. I will make the change.
          Regarding crash recovery, magma deletes all the older kvstore revisions and retains only the latest revision for a kvID.

          sarath Sarath Lakshman added the comment above.
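The crash-recovery rule stated in the comment above (keep only the latest revision per kvID) can be sketched as a small selection function. The type aliases and `staleRevisions` are assumptions made for this example, not Magma's recovery code.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using KVStoreID = std::uint16_t;  // assumed width, for illustration only
using Revision = std::uint64_t;
using KVStoreRevisions = std::vector<std::pair<KVStoreID, Revision>>;

// Given every (kvID, revision) pair found on disk, returns the stale pairs,
// i.e. all but the highest revision for each kvID.
KVStoreRevisions staleRevisions(const KVStoreRevisions& found) {
    // First pass: record the highest revision seen for each kvID.
    std::map<KVStoreID, Revision> latest;
    for (const auto& entry : found) {
        auto inserted = latest.emplace(entry.first, entry.second);
        if (!inserted.second && entry.second > inserted.first->second) {
            inserted.first->second = entry.second;
        }
    }
    // Second pass: anything older than the latest revision is stale.
    KVStoreRevisions stale;
    for (const auto& entry : found) {
        if (entry.second != latest.at(entry.first)) {
            stale.push_back(entry);
        }
    }
    return stale;
}
```

Since revisions only ever increase, "highest revision wins" is unambiguous, which is what makes an internal retry of an older revision's deletion safe.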

          Lynn Straus Raju Suravarjjala Please approve this ticket for Neo. Sarath Lakshman has a fix to handle the fatal error in case of disk full and it is safe. Thanks.

          srinath.duvuru Srinath Duvuru added the comment above.

          Approving after discussing with Raju Suravarjjala

          mihir.kamdar Mihir Kamdar (Inactive) added the comment above.

          Build couchbase-server-7.1.0-2499 contains magma commit f6feedc with commit message:
          MB-51430 tests: Add unit test for i/o error during kvstore delete

          build-team Couchbase Build Team added the comment above.

          Build couchbase-server-7.1.0-2499 contains magma commit 4b77e24 with commit message:
          MB-51430 magma: Implement internal retry for kvstore revision deletion

          build-team Couchbase Build Team added the comment above.

          Build couchbase-server-7.1.0-2499 contains magma commit 32e9358 with commit message:
          MB-51430 magma: Make DeleteKVStore API retriable on i/o errors

          build-team Couchbase Build Team added the comment above.

          Build couchbase-server-7.2.0-1019 contains magma commit f6feedc with commit message:
          MB-51430 tests: Add unit test for i/o error during kvstore delete

          build-team Couchbase Build Team added the comment above.

          Build couchbase-server-7.2.0-1019 contains magma commit 4b77e24 with commit message:
          MB-51430 magma: Implement internal retry for kvstore revision deletion

          build-team Couchbase Build Team added the comment above.

          Build couchbase-server-7.2.0-1019 contains magma commit 32e9358 with commit message:
          MB-51430 magma: Make DeleteKVStore API retriable on i/o errors

          build-team Couchbase Build Team added the comment above.

          Ran many iterations of this test on builds 7.1.0-2506, 7.1.0-2512 and 7.1.0-2518. Not seeing this anymore.

          ankush.sharma Ankush Sharma added the comment above.

          People

            ankush.sharma Ankush Sharma
            ankush.sharma Ankush Sharma