Memcached crash due to mem_used underflowing

    Description

      I'm running cluster_run and hitting this crash with a simple data-loading workload. It happens on both Magma and Couchstore buckets. I've attached the cbcollect from the Couchstore bucket repro.

      Note that I encountered this during some fusion testing. However, the fusion code only changes Magma, not Couchstore. Since I'm seeing the crash on a Couchstore bucket as well, I decided to log this issue.

      These are the relevant commits I'm on:

      root@ubu20-se39:~/rohan/kv_engine# git log -1
      WARNING: terminal is not fully functional
      commit 19335ef69c8a8a0dbc4c8d0eeb5b2bf21ceb48a7 (HEAD -> master, m/master, couchbase/master)
      Merge: f35208740 8f2389b3c
      Author: Gerrit Code Review <gerrit@4edf0475e841>
      Date:   Fri Jan 26 15:41:05 2024 +0000
       
       
          Merge "Merge commit trinity/67934e940 into master"
       
       
      root@ubu20-se39:~/rohan/couchstore# git log -1
      WARNING: terminal is not fully functional
      commit 86f3d74bd0017d48aedb77051ad386d9082dfeaf (HEAD, m/master, couchbase/trinity, couchbase/master)
      Author: Trond Norbye <trond.norbye@gmail.com>
      Date:   Tue Jan 2 08:22:33 2024 +0100
       
       
          MB-59041: Verify the existence rather than size

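      To further rule out Magma changes as the cause, I also checked out the master branch for Magma. I still see the issue.

      The crash is below. In frame 9, bytesToEvict is 18446744055833456664, which equals 2^64 - 17876094952 and is consistent with an unsigned value wrapping below zero. In frame 8, gsl::narrow<long, unsigned long> cannot represent that value as a long and throws gsl::narrowing_error, which ends in std::terminate: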

      Core was generated by `/root/rohan/install/bin/memcached -C /data/rohan/cluster_run/data/n_0/config/me'.
      #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
      50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
      [Current thread is 22228 (LWP 3095395)]
      (gdb) bt
      #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
      #1  0x00007f4212c7e859 in __GI_abort () at abort.c:79
      #2  0x00007f42130588d1 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
      #3  0x00000000011c629e in backtrace_terminate_handler ()
          at /root/rohan/kv_engine/utilities/terminate_handler.cc:88
      #4  0x00007f421306437c in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
      #5  0x00007f42130643e7 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
      #6  0x00007f4213064699 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
      #7  0x00000000012fb396 in __cxxabiv1::__cxa_throw (thrownException=0x7f3e100019c0, 
          type=0x1aa77b8 <typeinfo for gsl::narrowing_error>, destructor=
          0x702bec <gsl::narrowing_error::~narrowing_error()>)
          at /home/couchbase/jenkins/cbdeps-ws/deps/packages/build/folly/folly.debug-prefix/src/folly.debug/folly/experimental/exception_tracer/ExceptionTracerLib.cpp:106
      #8  0x0000000000cd7382 in gsl::narrow<long, unsigned long> (u=18446744055833456664)
          at /root/rohan/third_party/gsl-lite/include/gsl/gsl-lite.hpp:2215
      #9  0x0000000000d26f22 in StrictQuotaItemPager::getEvictionRatios (this=0x7f41e436c910, 
          kvBuckets=std::vector of length 1, capacity 1 = {...}, bytesToEvict=18446744055833456664)
          at /root/rohan/kv_engine/engines/ep/src/item_pager.cc:111
      #10 0x0000000000d27b98 in StrictQuotaItemPager::schedulePagingVisitors (this=0x7f41e436c910, 
          bytesToEvict=18446744055833456664) at /root/rohan/kv_engine/engines/ep/src/item_pager.cc:291
      #11 0x0000000000d27377 in ItemPager::runPager (this=0x7f41e436c970, manuallyNotified=true)
          at /root/rohan/kv_engine/engines/ep/src/item_pager.cc:180
      #12 0x0000000000d499a9 in StrictQuotaItemPager::runInner (this=0x7f41e436c910, manuallyNotified=true)
          at /root/rohan/kv_engine/engines/ep/src/item_pager.h:146
      #13 0x0000000000da914a in EpNotifiableTask::run (this=0x7f41e436c910)
          at /root/rohan/kv_engine/engines/ep/src/ep_task.cc:56
      #14 0x00000000010b2981 in GlobalTask::execute (this=0x7f41e436c910, threadName="NonIoPool0")
          at /root/rohan/kv_engine/executor/globaltask.cc:79
      #15 0x0000000000da8fec in EpTask::execute (this=0x7f41e436c910, threadName="NonIoPool0")
          at /root/rohan/kv_engine/engines/ep/src/ep_task.cc:43
      #16 0x00000000010b8526 in FollyExecutorPool::TaskProxy::scheduleViaCPUPool()::{lambda()#2}::operator()() const (
          __closure=0x7f3f097ee4d0) at /root/rohan/kv_engine/executor/folly_executorpool.cc:163
      #17 0x00000000010c3ed5 in folly::detail::function::FunctionTraits<void ()>::callSmall<FollyExecutorPool::TaskProxy::scheduleViaCPUPool()::{lambda()#2}>(folly::detail::function::Data&) (p=...)
          at /root/rohan/build/tlm/deps/folly.exploded/include/folly/Function.h:363
      #18 0x00000000010ba697 in folly::detail::function::FunctionTraits<void ()>::operator()() (this=0x7f3f097ee4d0)
          at /root/rohan/build/tlm/deps/folly.exploded/include/folly/Function.h:392
      #19 0x00000000010a343e in operator() (__closure=0x7f3f097ee700) at /root/rohan/kv_engine/executor/cancellable_cpu_executor.cc:42
      #20 0x00000000010b4b61 in folly::detail::function::FunctionTraits<void()>::callSmall<CancellableCPUExecutor::add(GlobalTask*, folly::Func)::<lambda()> >(folly::detail::function::Data &) (p=...)
          at /root/rohan/build/tlm/deps/folly.exploded/include/folly/Function.h:363
      #21 0x00000000010ba697 in folly::detail::function::FunctionTraits<void ()>::operator()() (this=0x7f3f097ee700)
          at /root/rohan/build/tlm/deps/folly.exploded/include/folly/Function.h:392
      #22 0x00000000012efcd4 in folly::ThreadPoolExecutor::runTask (this=0x7f420f4f9500, 
          thread=<error reading variable: Cannot access memory at address 0x7f40c290fbe8>, task=...)
          at /home/couchbase/jenkins/cbdeps-ws/deps/packages/build/folly/folly.debug-prefix/src/folly.debug/folly/executors/ThreadPoolExecutor.cpp:98
      #23 0x00000000012c5fb1 in folly::CPUThreadPoolExecutor::threadRun (this=0x7f420f4f9500, 
          thread=<error reading variable: Cannot access memory at address 0x7f40c290fbe8>)
          at /home/couchbase/jenkins/cbdeps-ws/deps/packages/build/folly/folly.debug-prefix/src/folly.debug/folly/executors/CPUThreadPoolExecutor.cpp:306
      #24 0x00000000012f8dde in std::__invoke_impl<void, void (folly::ThreadPoolExecutor::*&)(std::shared_ptr<folly::ThreadPoolExecutor::Thread>), folly::ThreadPoolExecutor*&, std::shared_ptr<folly::ThreadPoolExecutor::Thread>&> (__f=<error reading variable>, 
          __t=<error reading variable>) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/invoke.h:73
      #25 0x00000000012f82c9 in std::__invoke<void (folly::ThreadPoolExecutor::*&)(std::shared_ptr<folly::ThreadPoolExecutor::Thread>), folly::ThreadPoolExecutor*&, std::shared_ptr<folly::ThreadPoolExecutor::Thread>&> (__fn=<error reading variable>)
          at /opt/gcc-10.2.0/include/c++/10.2.0/bits/invoke.h:95
      #26 0x00000000012f7475 in std::_Bind<void (folly::ThreadPoolExecutor::*(folly::ThreadPoolExecutor*, std::shared_ptr<folly::ThreadPoolExecutor::Thread>))(std::shared_ptr<folly::ThreadPoolExecutor::Thread>)>::__call<void, , 0ul, 1ul>(std::tuple<>&&, std::_Index_tuple<0ul, 1ul>) (this=0x7f40c2998c40, __args=...) at /opt/gcc-10.2.0/include/c++/10.2.0/functional:416
      #27 0x00000000012f630e in std::_Bind<void (folly::ThreadPoolExecutor::*(folly::ThreadPoolExecutor*, std::shared_ptr<folly::ThreadPoolExecutor::Thread>))(std::shared_ptr<folly::ThreadPoolExecutor::Thread>)>::operator()<, void>() (this=0x7f40c2998c40) at /opt/gcc-10.2.0/include/c++/10.2.0/functional:499
      #28 0x00000000012f500e in folly::detail::function::FunctionTraits<void ()>::callSmall<std::_Bind<void (folly::ThreadPoolExecutor::*(folly::ThreadPoolExecutor*, std::shared_ptr<folly::ThreadPoolExecutor::Thread>))(std::shared_ptr<folly::ThreadPoolExecutor::Thread>)> >(folly::detail::function::Data&) (p=...) at /home/couchbase/jenkins/cbdeps-ws/deps/packages/build/folly/folly.debug-prefix/src/folly.debug/folly/Function.h:363
      #29 0x00000000010ba697 in folly::detail::function::FunctionTraits<void ()>::operator()() (this=0x7f40c2998c40) at /root/rohan/build/tlm/deps/folly.exploded/include/folly/Function.h:392
      #30 0x00000000010b7cab in CBRegisteredThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}::operator()() (__closure=0x7f40c2998c40) at /root/rohan/kv_engine/executor/folly_executorpool.cc:49
      #31 0x00000000010c3b7a in folly::detail::function::FunctionTraits<void ()>::callBig<CBRegisteredThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}>(folly::detail::function::Data&) (p=...) at /root/rohan/build/tlm/deps/folly.exploded/include/folly/Function.h:377
      #32 0x00000000010ba697 in folly::detail::function::FunctionTraits<void ()>::operator()() (this=0x7f40c290fe70) at /root/rohan/build/tlm/deps/folly.exploded/include/folly/Function.h:392
      #33 0x00000000010b7a45 in folly::PriorityThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}::operator()() (__closure=0x7f40c290fe60) at /root/rohan/build/tlm/deps/folly.exploded/include/folly/executors/thread_factory/PriorityThreadFactory.h:52
      #34 0x00000000010c3a9d in folly::detail::function::FunctionTraits<void ()>::callBig<folly::PriorityThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}>(folly::detail::function::Data&) (p=...) at /root/rohan/build/tlm/deps/folly.exploded/include/folly/Function.h:377
      #35 0x00000000010ba697 in folly::detail::function::FunctionTraits<void ()>::operator()() (this=0x7f40c28c7590) at /root/rohan/build/tlm/deps/folly.exploded/include/folly/Function.h:392
      #36 0x00000000010b6711 in folly::NamedThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}::operator()() (__closure=0x7f40c28c7590) at /root/rohan/build/tlm/deps/folly.exploded/include/folly/executors/thread_factory/NamedThreadFactory.h:40
      #37 0x00000000010e9a44 in std::__invoke_impl<void, folly::NamedThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}>(std::__invoke_other, folly::NamedThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}&&) (__f=...) at /usr/include/c++/10/bits/invoke.h:60
      #38 0x00000000010e99ed in std::__invoke<folly::NamedThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}>(std::__invoke_result&&, (folly::NamedThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}&&)...) (__fn=...) at /usr/include/c++/10/bits/invoke.h:95
      #39 0x00000000010e998e in std::thread::_Invoker<std::tuple<folly::NamedThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) (this=0x7f40c28c7590) at /usr/include/c++/10/thread:264
      #40 0x00000000010e98aa in std::thread::_Invoker<std::tuple<folly::NamedThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}> >::operator()() (this=0x7f40c28c7590) at /usr/include/c++/10/thread:271
      #41 0x00000000010e97b2 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<folly::NamedThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}> > >::_M_run() (this=0x7f40c28c7580) at /usr/include/c++/10/thread:215
      #42 0x00007f4213090df4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
      #43 0x00007f4213694609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #44 0x00007f4212d7b353 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
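      For anyone reading the backtrace, here is a minimal sketch of how a value like the one in frame 9 can arise and why the narrowing in frame 8 throws. It is not the kv_engine code: wrappedDifference, clampedDifference and checkedNarrow are hypothetical names, checkedNarrow only stands in for gsl::narrow<long, unsigned long>, and the assumption that the value comes from an unsigned subtraction is mine.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper: plain unsigned subtraction. When memUsed is smaller
// than target the result wraps modulo 2^64 instead of going negative.
std::uint64_t wrappedDifference(std::uint64_t memUsed, std::uint64_t target) {
    return memUsed - target;
}

// Hypothetical guarded variant: clamps to zero instead of wrapping.
std::uint64_t clampedDifference(std::uint64_t memUsed, std::uint64_t target) {
    const std::uint64_t result = memUsed > target ? memUsed - target : 0;
    assert(result <= memUsed); // a clamped difference never exceeds memUsed
    return result;
}

// Stand-in for gsl::narrow<long, unsigned long> (assumption: 64-bit long).
// Throws if the unsigned value cannot be represented as a signed 64-bit int.
std::int64_t checkedNarrow(std::uint64_t value) {
    const auto maxSigned =
            static_cast<std::uint64_t>(std::numeric_limits<std::int64_t>::max());
    if (value > maxSigned) {
        throw std::range_error("value does not fit in int64_t");
    }
    return static_cast<std::int64_t>(value);
}
```

      With memUsed = 0 and target = 17876094952, wrappedDifference returns 18446744055833456664, the number seen in frame 9, and checkedNarrow throws on it, while clampedDifference returns 0. The inputs are chosen only to reproduce that number; the actual operands in the crash are not known from the backtrace.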
       
      
      

       

      Steps to repro:

      1. Start cluster_run as usual
      2. Load some data. In less than a minute, memcached will crash.

      I suspect this may be something specific to my setup, since it is too easy to reproduce and QE or the KV team would have caught it in their testing. Still, I would appreciate it if someone from the KV team could take a look, as it is blocking some local cluster_run testing for me.

      I was also able to reproduce this on an official build two days ago, but I no longer can. The build I tried was 7.6.1-3062.

            People

              rohan.suri Rohan Suri