  Couchbase Server / MB-55806

[CDC] Memcached crashed in DefragmentVisitor::visit(HashTable::HashBucketLock const&, StoredValue&)


Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • Fix Version/s: None
    • Affects Version/s: 7.2.0
    • Component/s: couchbase-bucket
    • Build: 7.2.0-5211

    Description

      Steps to recreate:

      1. Create a 3-node cluster.
      2. Create a Magma bucket with bucket_history_retention_seconds=86400 and bucket_history_retention_bytes=99636764160 (vbuckets=16, replicas=2).
      3. Create 14 collections (15 in total, including the default collection).
      4. After creating the collections, update their history setting to true.
      5. Create 5 million docs in each collection.
      6. Upsert all the documents three times.
      7. Total data on disk reaches close to 300 GB, so history starts being cleared.
      8. Perform continuous dedupe mutations on 10,000 docs (300 iterations).
      9. Repeatedly kill memcached with SIGKILL, sleeping 60 to 90 seconds between kills; before each subsequent SIGKILL, the test waits for cluster warmup to finish.
      10. While data loading and SIGKILLs are in progress, repeatedly delete and recreate five collections, recreating each with the same name (sleeping 60 to 90 seconds between deletes).
      11. Observed: memcached crashed in DefragmentVisitor::visit(HashTable::HashBucketLock const&, StoredValue&).
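As a rough sanity check on step 7, the data volume implied by steps 5 and 6 and the QE-TEST parameters (num_collections=15, num_items=5000000, doc_size=1024, replicas=2) can be computed directly. This is a back-of-envelope sketch under stated assumptions: it counts only document values, one live version per key, and ignores keys, metadata, compression, retained history and Magma's own overheads. `Workload`, `totalDocs`, `logicalBytes`, `replicatedBytes` and `kMb55806Workload` are hypothetical names.

```cpp
#include <cstdint>

// Back-of-envelope sizing for the reproduction steps. Hypothetical helper
// names; only document values are counted (no keys, metadata, compression,
// retained history or Magma overheads).
struct Workload {
    std::uint64_t collections;        // num_collections (includes _default)
    std::uint64_t docsPerCollection;  // num_items
    std::uint64_t docSizeBytes;       // doc_size
    std::uint64_t replicas;           // replicas
};

// Number of documents across all collections.
constexpr std::uint64_t totalDocs(const Workload& w) {
    return w.collections * w.docsPerCollection;
}

// Bytes of document values for a single copy (active only).
constexpr std::uint64_t logicalBytes(const Workload& w) {
    return totalDocs(w) * w.docSizeBytes;
}

// Bytes of document values across the active copy and all replicas.
constexpr std::uint64_t replicatedBytes(const Workload& w) {
    return logicalBytes(w) * (1 + w.replicas);
}

// Parameters taken from the steps and the QE-TEST command.
constexpr Workload kMb55806Workload{15, 5'000'000, 1024, 2};
```

With these inputs the sketch gives 75,000,000 documents, 76.8 GB of values for one copy and 230.4 GB with two replicas; it does not attempt to model the rest of the ~300 GB reported in step 7.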

      The core dump below was found on node 172.23.107.221 at
      8:22:26 PM. Before the core dump, memcached had been SIGKILLed on this node at
      8:19:52 PM.


      [Thu Mar 2 20:22:26 2023] NonIoPool1[49857]: segfault at 121e7be04 ip 000000000080fae0 sp 00007f9295fe8ee0 error 4 in memcached[400000+a77000]
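The kernel line above can be decoded mechanically. On x86-64 the `error` field is the page-fault error code: bit 0 distinguishes a not-present page (0) from a protection violation (1), bit 1 a read (0) from a write (1), and bit 2 kernel (0) from user (1) mode. The sketch below, with hypothetical helper names, decodes that code and checks that the faulting `ip` lies inside the memcached mapping `[0x400000, 0x400000 + 0xa77000)`. Error 4 decodes to a user-mode read of a page that is not present, and the fault address 0x121e7be04 is the same `this` that gdb reports for the atomic load in frame #0, i.e. 4 bytes into the Blob at 0x121e7be00 shown in frame #2.

```cpp
#include <cstdint>

// Sketch: decode the x86 page-fault error code printed by the kernel in
// "segfault at ADDR ip IP sp SP error N in BINARY[BASE+SIZE]".
// Hypothetical helper names; bit meanings follow the x86 architecture:
//   bit 0: 0 = page not present, 1 = protection violation
//   bit 1: 0 = read access,      1 = write access
//   bit 2: 0 = kernel mode,      1 = user mode
//   bit 4: 1 = instruction fetch
struct PageFaultCause {
    bool protectionViolation;
    bool write;
    bool userMode;
    bool instructionFetch;
};

constexpr PageFaultCause decodeErrorCode(std::uint32_t code) {
    return PageFaultCause{(code & 0x1u) != 0, (code & 0x2u) != 0,
                          (code & 0x4u) != 0, (code & 0x10u) != 0};
}

// True if ip falls inside the half-open mapping [base, base + size).
constexpr bool inMapping(std::uint64_t ip, std::uint64_t base,
                         std::uint64_t size) {
    return ip >= base && ip - base < size;
}
```

Because memcached is mapped at 0x400000 (a non-PIE executable, consistent with the absolute frame addresses in the backtrace), `ip` 0x80fae0 is already a link-time address and could be looked up directly, for example with `info symbol 0x80fae0` or `info line *0x80fae0` in the gdb session that produced the backtrace.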

      Backtrace:

      (gdb) bt full
      #0  load (__m=std::memory_order_seq_cst, this=0x121e7be04)
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/blob.h:63
      No locals.
      #1  operator std::__atomic_base<unsigned int>::__int_type (this=0x121e7be04)
          at /opt/gcc-10.2.0/include/c++/10.2.0/bits/atomic_base.h:289
      No locals.
      #2  valueSize (this=0x121e7be00) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/blob.h:63
      No locals.
      #3  valuelen (this=0x7f923dd781c0)
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/stored-value.h:575
      No locals.
      #4  DefragmentVisitor::visit(HashTable::HashBucketLock const&, StoredValue&) ()
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/defragmenter_visitor.cc:42
              value_len = <optimized out>
      #5  0x00000000006ee7cd in HashTable::pauseResumeVisit(HashTableVisitor&, HashTable::Position&) ()
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/hash_table.cc:1288
              tmp = 0x48cdf9e8dd00
              lh = {bucketNum = 1077, htLock = {_M_device = 0x7f92b474feb8, _M_owns = true}}
              v = <optimized out>
              paused = false
              lh = {_M_device = 0x7f92b474f800, _M_owns = false}
              lock = 43
              hash_bucket = 1077
       
      #6  0x000000000072ee6b in PauseResumeVBAdapter::visit (this=0x7f92943aa9d0, vb=...)
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/vb_visitors.cc:51
              ht_start = {ht_size = 0, lock = 0, hash_bucket = 0}
      #7  0x00000000006fe682 in KVBucket::pauseResumeVisit(PauseResumeVBVisitor&, KVBucketIface::Position&, VBucketFilter*) ()
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/kv_bucket.cc:2380
              paused = <optimized out>
              vb = {<std::__shared_ptr<VBucket, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<VBucket, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x7f92743cd700, _M_refcount = {
                    _M_pi = 0x7f92fc692c60}}, <No data fields>}
              vbid = {vbid = 2}
      #8  0x000000000080e795 in DefragmenterTask::defrag() ()
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/ep_engine.h:661
              currentFragStats = {allocatedBytes = 685207056, residentBytes = 967065600}
              visitor = @0x7f92fc23b2e0: <error reading variable>
              start = {__d = {__r = 43046279973754380}}
              end = <optimized out>
              completed = <optimized out>
      #9  0x000000000080f8c8 in DefragmenterTask::run() ()
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/defragmenter.cc:56
              phosphor_internal_category_enabled_53 = {_M_b = {_M_p = 0x0}, static is_always_lock_free = <optimized out>}
              phosphor_internal_category_enabled_temp_53 = <optimized out>
              phosphor_internal_tpi_53 = {category = 0x0, name = 0x0, type = phosphor::AsyncStart, argument_names = {_M_elems = {
                    0x0, 0x0}}, argument_types = {_M_elems = {phosphor::is_bool, phosphor::is_bool}}}
              phosphor_internal_guard_53 = {tpi = 0x107a840 <DefragmenterTask::run()::phosphor_internal_tpi_53>, enabled = true,
                arg1 = {<No data fields>}, arg2 = {<No data fields>}, start = {__d = {__r = 43046279973612654}}}
              sleepTime = <optimized out>
      #10 0x0000000000ab63e9 in GlobalTask::execute(std::basic_string_view<char, std::char_traits<char> >) ()
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/globaltask.cc:98
       
              guard = {previous = 0x0}
              start = <optimized out>
              runAgain = <optimized out>
      #11 0x0000000000aafaaa in FollyExecutorPool::TaskProxy::scheduleViaCPUPool()::{lambda()#2}::operator()() const (
          __closure=0x7f9295fe9650)
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/folly_executorpool.cc:309
              runAgain = <optimized out>
              proxy = @0x7f92fc45b230: <error reading variable>
      #12 0x0000000000ab779e in operator() (this=0x7f9295fe9650)
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/cancellable_cpu_executor.cc:42
              fn = @0x7f9295fe9650: {<folly::detail::function::FunctionTraits<void()>> = {<No data fields>}, data_ = {
                  big = 0x7f92fc45b230, tiny = {
                    __data = "0\262E\374\222\177\000\000@\227\376\225\222\177", '\000' <repeats 11 times>, "\320M\374\222\177\000\000/\000\000\000\000\000\000\000\355a\300\000\000\000\000", __align = {<No data fields>}}},
                call_ = 0xaaffe0 <folly::detail::function::FunctionTraits<void ()>::callSmall<FollyExecutorPool::TaskProxy::scheduleViaCPUPool()::{lambda()#2}>(folly::detail::function::Data&)>,
                exec_ = 0xaae4d0 <folly::detail::function::execSmall<FollyExecutorPool::TaskProxy::scheduleViaCPUPool()::{lambda()#2}>(folly::detail::function::Op, folly::detail::function::Data*, folly::detail::function::Data)>}
      #13 CancellableCPUExecutor::add(GlobalTask*, folly::Function<void ()>)::{lambda()#1}::operator()() const ()
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/cancellable_cpu_executor.cc:42
              task = {storage_ = {{emptyState = -48 '\320', value = {task = 0x7f9294683dd0,
                      func = {<folly::detail::function::FunctionTraits<void()>> = {<No data fields>}, data_ = {big = 0x7f92fc45b230,
                          tiny = {
                            __data = "0\262E\374\222\177\000\000@\227\376\225\222\177", '\000' <repeats 11 times>, "\320M\374\222\177\000\000/\000\000\000\000\000\000\000\355a\300\000\000\000\000", __align = {<No data fields>}}},
                        call_ = 0xaaffe0 <folly::detail::function::FunctionTraits<void ()>::callSmall<FollyExecutorPool::TaskProxy::scheduleViaCPUPool()::{lambda()#2}>(folly::detail::function::Data&)>,
                        exec_ = 0xaae4d0 <folly::detail::function::execSmall<FollyExecutorPool::TaskProxy::scheduleViaCPUPool()::{lambda()#2}>(folly::detail::function::Op, folly::detail::function::Data*, folly::detail::function::Data)>}}}, hasValue = true}}
       
              this = <optimized out>
      #14 0x0000000000c157c0 in operator() (this=0x7f9295fe9840)
          at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/folly-prefix/src/folly/folly/Function.h:416
              fn = @0x7f9295fe9840: {<folly::detail::function::FunctionTraits<void()>> = {<No data fields>}, data_ = {
                  big = 0x7f930814ac00, tiny = {
                    __data = "\000\254\024\b\223\177\000\000\320wG\f\223\177\000\000\060\000\000\000\000\000\000\000\301\302\000\000\000\000\000\000\020\000\000\000\000\000\000\000\020\231\376\225\222\177\000", __align = {<No data fields>}}},
                call_ = 0xab7b20 <folly::detail::function::FunctionTraits<void()>::callSmall<CancellableCPUExecutor::add(GlobalTask*, folly::Func)::<lambda()> >(folly::detail::function::Data &)>,
                exec_ = 0xab70d0 <folly::detail::function::execSmall<CancellableCPUExecutor::add(GlobalTask*, folly::Func)::<lambda()> >(folly::detail::function::Op, folly::detail::function::Data *, folly::detail::function::Data *)>}
       
      #15 folly::ThreadPoolExecutor::runTask(std::shared_ptr<folly::ThreadPoolExecutor::Thread> const&, folly::ThreadPoolExecutor::Task&&) (this=this@entry=0x7f930814ad00, thread=...,
          task=task@entry=<unknown type in /usr/lib/debug/opt/couchbase/bin/memcached-7.2.0-5211.x86_64.debug, CU 0xa8cf48a, DIE 0xa9533bf>)
          at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/folly-prefix/src/folly/folly/executors/ThreadPoolExecutor.cpp:97
              rctx = {
                prev_ = {<std::__shared_ptr<folly::RequestContext, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<folly::RequestContext, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x0, _M_refcount = {
                      _M_pi = 0x0}}, <No data fields>}}
              startTime = {__d = {__r = 43046279973607835}}
              stats = {expired = false, waitTime = {__r = 581136}, runTime = {__r = 0}, enqueueTime = {__d = {
                    __r = 43046279973026699}}, requestId = 0}
      #16 0x0000000000c0025a in folly::CPUThreadPoolExecutor::threadRun (this=0x7f930814ad00, thread=...)
          at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/folly-prefix/src/folly/folly/executors/CPUThreadPoolExecutor.cpp:265
              task = {storage_ = {{emptyState = 0 '\000', value = {<folly::ThreadPoolExecutor::Task> = {
                        func_ = {<folly::detail::function::FunctionTraits<void()>> = {<No data fields>}, data_ = {
                            big = 0x7f930814ac00, tiny = {
                              __data = "\000\254\024\b\223\177\000\000\320wG\f\223\177\000\000\060\000\000\000\000\000\000\000\301\302\000\000\000\000\000\000\020\000\000\000\000\000\000\000\020\231\376\225\222\177\000", __align = {<No data fields>}}},
                          call_ = 0xab7b20 <folly::detail::function::FunctionTraits<void()>::callSmall<CancellableCPUExecutor::add(GlobalTask*, folly::Func)::<lambda()> >(folly::detail::function::Data &)>,
                          exec_ = 0xab70d0 <folly::detail::function::execSmall<CancellableCPUExecutor::add(GlobalTask*, folly::Func)::<lambda()> >(folly::detail::function::Op, folly::detail::function::Data *, folly::detail::function::Data *)>}, enqueueTime_ = {
                          __d = {__r = 43046279973026699}}, expiration_ = {__r = 0},
                        expireCallback_ = {<folly::detail::function::FunctionTraits<void()>> = {<No data fields>}, data_ = {
                            big = 0xc2c1, tiny = {
                              __data = "\301\302\000\000\000\000\000\000\273g\246", '\000' <repeats 13 times>, "_\276&\f\223\177\000\000p\312~\266\222\177\000\000@VG\f\223\177\000", __align = {<No data fields>}}}, call_ = 0x46752f
           <folly::detail::function::FunctionTraits<void ()>::uninitCall(folly::detail::function::Data&)>, exec_ = 0x0},
                        context_ = {<std::__shared_ptr<folly::RequestContext, (__gnu_cxx::_Lock_policy)2>> = {<std::__shared_ptr_access<folly::RequestContext, (__gnu_cxx::_Lock_policy)2, false, false>> = {<No data fields>}, _M_ptr = 0x0, _M_refcount = {
                              _M_pi = 0x0}}, <No data fields>}}, poison = false, priority_ = 0 '\000',
                      queueObserverPayload_ = 140269492333952}}, hasValue = true}}
              guard = {list_ = {forbid = true, prev = 0x0, curr = {name = {static npos = <optimized out>,
                      b_ = 0xcdaacb "CPUThreadPoolExecutor", e_ = 0xcdaae0 ""}}}}
      #17 0x0000000000c18779 in __invoke_impl<void, void (folly::ThreadPoolExecutor::*&)(std::shared_ptr<folly::ThreadPoolExecutor::Thread>), folly::ThreadPoolExecutor*&, std::shared_ptr<folly::ThreadPoolExecutor::Thread>&> (__t=<optimized out>,
          __f=<optimized out>) at /usr/local/include/c++/7.3.0/bits/invoke.h:73
      No locals.
      #18 __invoke<void (folly::ThreadPoolExecutor::*&)(std::shared_ptr<folly::ThreadPoolExecutor::Thread>), folly::ThreadPoolExecutor*&, std::shared_ptr<folly::ThreadPoolExecutor::Thread>&> (__fn=<optimized out>)
          at /usr/local/include/c++/7.3.0/bits/invoke.h:95
      No locals.
      #19 __call<void, 0, 1> (__args=<optimized out>, this=<optimized out>) at /usr/local/include/c++/7.3.0/functional:467
      No locals.
      #20 operator()<> (this=<optimized out>) at /usr/local/include/c++/7.3.0/functional:551
      No locals.
      #21 folly::detail::function::FunctionTraits<void ()>::callBig<std::_Bind<void (folly::ThreadPoolExecutor::*(folly::ThreadPoolExecutor*, std::shared_ptr<folly::ThreadPoolExecutor::Thread>))(std::shared_ptr<folly::ThreadPoolExecutor::Thread>)> >(folly::detail::function::Data&) (p=...)
          at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/folly-prefix/src/folly/folly/Function.h:401
              fn = <optimized out>
      #22 0x0000000000aaf7a4 in operator() (this=0x7f93085f1fc0)
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/folly_executorpool.cc:49
              fn = @0x7f93085f1fc0: <error reading variable>
      #23 operator() (__closure=0x7f93085f1fc0)
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/folly_executorpool.cc:49
              threadNameOpt = {storage_ = {{emptyState = -96 '\240', value = {static npos = 18446744073709551615,
                      _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x7f9295fe99a0 "NonIoPool1"}, _M_string_length = 10, {_M_local_buf = "NonIoPool1\000\000\000\000\000",
                        _M_allocated_capacity = 8029725099528449870}}}, hasValue = true}}
              func = <error reading variable func (Cannot access memory at address 0x7f93085f1fc0)>
      #24 folly::detail::function::FunctionTraits<void ()>::callBig<CBRegisteredThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}>(folly::detail::function::Data&) (p=...)
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/server_build/tlm/deps/folly.exploded/include/folly/Function.h:401
              fn = @0x7f93085f1fc0: <error reading variable>
      #25 0x00007f930a032d40 in execute_native_thread_routine ()
          at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/src/c++11/thread.cc:80
      No locals.
      #26 0x00007f930be3aea5 in start_thread () from /lib64/libpthread.so.0
      No symbol table info available.
      #27 0x00007f930977bb0d in clone () from /lib64/libc.so.6
      No symbol table info available.
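Frames #5 to #7 show the defragmenter walking vb:2's hash table through a pause/resume visitor: `PauseResumeVBAdapter::visit` passes a `HashTable::Position` (`ht_size`, `lock`, `hash_bucket`), and `HashTable::pauseResumeVisit` holds a bucket lock (`lh`, here bucket 1077 under lock 43) while `DefragmentVisitor::visit` inspects the StoredValue. The sketch below is a heavily simplified, hypothetical model of that pattern, not kv_engine's implementation: buckets are guarded by lock stripes, the visit proceeds stripe by stripe, and the visitor can pause and later resume from a saved position. `StripedTable`, its striping rule (`bucket % locks`) and the 47-stripe figure used below are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <vector>

// Hypothetical, heavily simplified model of a lock-striped hash table with a
// pause/resume visit. NOT kv_engine's HashTable: it only mirrors the shape seen
// in frames #5/#6 (a Position of {ht_size, lock, hash_bucket} and a bucket lock
// held while the visitor runs).
struct Position {
    std::size_t ht_size = 0;
    std::size_t lock = 0;
    std::size_t hash_bucket = 0;
};

class StripedTable {
public:
    StripedTable(std::size_t buckets, std::size_t locks)
        : buckets_(buckets), mutexes_(locks) {
        assert(locks > 0 && buckets >= locks);
    }

    // Assumed striping rule: bucket b is guarded by mutex b % locks.
    std::size_t stripeFor(std::size_t bucket) const {
        return bucket % mutexes_.size();
    }

    // Visits buckets stripe by stripe. The visitor returns false to pause after
    // the current bucket; the returned Position resumes at the next bucket.
    // A returned lock equal to the stripe count means the visit is complete.
    Position pauseResumeVisit(const std::function<bool(std::size_t)>& visit,
                              Position start) {
        const std::size_t nLocks = mutexes_.size();
        if (start.ht_size != buckets_) {
            // Fresh start, or the table changed size since the pause.
            start = Position{buckets_, 0, 0};
        }
        for (std::size_t lock = start.lock; lock < nLocks; ++lock) {
            std::size_t bucket =
                (lock == start.lock) ? std::max(start.hash_bucket, lock) : lock;
            for (; bucket < buckets_; bucket += nLocks) {
                std::lock_guard<std::mutex> lh(mutexes_[lock]);
                if (!visit(bucket)) {
                    return Position{buckets_, lock, bucket + nLocks};
                }
            }
        }
        return Position{buckets_, nLocks, 0};
    }

private:
    std::size_t buckets_;
    std::vector<std::mutex> mutexes_;
};
```

With 47 stripes, `stripeFor(1077)` returns 43 (1077 = 22 × 47 + 43), matching the `lock`/`hash_bucket` pair in frame #5. Separately, the two mutex addresses in frame #5 differ by 0x7f92b474feb8 - 0x7f92b474f800 = 0x6b8 = 1720 bytes = 43 × 40, and `sizeof(std::mutex)` is 40 bytes with glibc on x86-64; that would be consistent with index 43 of a contiguous mutex array if the second `lh` refers to index 0, though this is an inference from the dump rather than something the backtrace states.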
      

      QE-TEST:

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/temp_vol.ini -p bucket_storage=magma,bucket_ram_quota=1024,init_loading=True,bucket_eviction_policy=fullEviction,rerun=False -t storage.magma.magma_crash_recovery.MagmaCrashTests.test_crash_during_dedupe,nodes_init=3,skip_cleanup=True,num_items=5000000,doc_size=1024,batch_size=100,sdk_timeout=60,log_level=info,infra_log_level=info,key_size=12,num_collections=15,ops_rate=20000,key_type=RandomKey,vbuckets=16,replicas=2,test_itr=3,bucket_history_retention_seconds=86400,bucket_history_retention_bytes=99636764160,standard_buckets=1,magma_buckets=1,num_scopes=1,autoCompactionDefined=true,meta_purge_interval=120,randomize_value=True,num_collections_to_drop=5 -m rest'
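The `-t` argument in the QE-TEST command packs the test name and its parameters into one comma-separated string, carrying values such as num_collections=15, num_items=5000000, vbuckets=16, replicas=2 and num_collections_to_drop=5 that correspond to the steps above. The hypothetical helper below, which is not testrunner's own parser, splits such a string into the test name and a key/value map, assuming no value contains a comma (true for the command above).

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <string_view>

// Hypothetical helper (not testrunner's parser): split a "-t" argument of the
// form "<test.path>,key=value,key=value,..." into the test name and its
// parameters. Assumes no value contains a comma, as in the command above.
struct TestSpec {
    std::string name;
    std::map<std::string, std::string> params;
};

inline TestSpec parseTestArg(std::string_view arg) {
    TestSpec spec;
    bool first = true;
    while (!arg.empty()) {
        const std::size_t comma = arg.find(',');
        const std::string_view token = arg.substr(0, comma);
        arg = (comma == std::string_view::npos) ? std::string_view{}
                                                : arg.substr(comma + 1);
        if (first) {
            spec.name = std::string(token);  // first token is the test path
            first = false;
            continue;
        }
        const std::size_t eq = token.find('=');
        if (eq == std::string_view::npos) {
            continue;  // skip tokens that are not key=value
        }
        spec.params[std::string(token.substr(0, eq))] =
            std::string(token.substr(eq + 1));
    }
    return spec;
}
```

Applied to the `-t` argument above, `params.at("num_collections_to_drop")` would be "5", matching the five collections deleted and recreated in step 10.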
      

            People

              Assignee: Ankush Sharma (ankush.sharma)
              Reporter: Ankush Sharma (ankush.sharma)
              Votes: 0
              Watchers: 5
