Couchbase Server / MB-47055

[Magma] Memcached crashes seen during multi-node rebalance-in with CRUD on collections (Magma storage)


Details

    • Triaged
    • Centos 64-bit
    • 1
    • No
    • KV-Engine Sprint 2021 July

    Description

      Script to Repro

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/win10-bucket-ops.ini rerun=False,get-cbcollect-info=False,quota_percent=99,crash_warning=True,bucket_storage=magma,enable_dp=True -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_rebalance_in,nodes_init=3,nodes_in=2,bucket_spec=multi_bucket.buckets_for_rebalance_tests_more_collections,data_load_spec=volume_test_load_with_CRUD_on_collections,data_load_stage=before,scrape_interval=5,rebalance_moves_per_node=32,quota_percent=80,skip_validations=False,GROUP=rebalance_with_collection_crud'
      

      Steps to Repro
      1. Create a 3-node cluster
      2021-06-22 01:21:10,908 | test | INFO | pool-3-thread-6 | [table_view:display:72] Rebalance Overview
      ----------------------------------------------------------------------
      Nodes          Services  Version                CPU             Status
      ----------------------------------------------------------------------
      172.23.98.196  kv        7.1.0-1031-enterprise  0.956696878147  Cluster node
      172.23.98.195  None                                             <--- IN ---
      172.23.121.10  None                                             <--- IN ---
      ----------------------------------------------------------------------

      2. Create bucket/scopes/collections/data
      2021-06-22 01:27:12,309 | test | INFO | MainThread | [table_view:display:72] Bucket statistics
      --------------------------------------------------------------------------------------------------------------
      Bucket                                     Type       Replicas  Durability  TTL  Items    RAM Quota   RAM Used    Disk Used
      --------------------------------------------------------------------------------------------------------------
      Ct_pp3_ykntUWDWtQHVXlQ3gJIHGGDp-21-615000  couchbase  2         none        0    3000000  6291456000  2852133136  8603708056
      --------------------------------------------------------------------------------------------------------------

      3. Add 2 nodes and rebalance in
      2021-06-22 01:27:39,125 | test | INFO | pool-3-thread-26 | [table_view:display:72] Rebalance Overview
      ----------------------------------------------------------------------
      Nodes           Services  Version                CPU            Status
      ----------------------------------------------------------------------
      172.23.98.196   kv        7.1.0-1031-enterprise  49.2283950617  Cluster node
      172.23.98.195   kv        7.1.0-1031-enterprise  37.1029224905  Cluster node
      172.23.121.10   kv        7.1.0-1031-enterprise  33.3333333333  Cluster node
      172.23.104.186  None                                            <--- IN ---
      172.23.120.201  None                                            <--- IN ---
      ----------------------------------------------------------------------

      At this point, crash dump 47c5739a-5789-4fde-67f412ad-78a23e84.dmp is seen on 172.23.104.186.
      Output of grep CRITICAL on the memcached log:

      2021-06-22T01:42:55.140119-07:00 CRITICAL *** Fatal error encountered during exception handling ***
      2021-06-22T01:42:55.141082-07:00 CRITICAL Caught unhandled std::exception-derived exception. what(): ThrowExceptionUnderflowPolicy current:2 arg:-4
      2021-06-22T01:42:55.141102-07:00 CRITICAL Exception thrown from:
      2021-06-22T01:42:55.141176-07:00 CRITICAL     #0  /opt/couchbase/bin/memcached() [0x400000+0x2de933]
      2021-06-22T01:42:55.141220-07:00 CRITICAL     #1  /opt/couchbase/bin/memcached() [0x400000+0xd2774]
      2021-06-22T01:42:55.141287-07:00 CRITICAL     #2  /opt/couchbase/bin/memcached() [0x400000+0x4528ff]
      2021-06-22T01:42:55.141331-07:00 CRITICAL     #3  /opt/couchbase/bin/memcached() [0x400000+0x452c1e]
      2021-06-22T01:42:55.141360-07:00 CRITICAL     #4  /opt/couchbase/bin/memcached() [0x400000+0x452eb9]
      2021-06-22T01:42:55.141403-07:00 CRITICAL     #5  /opt/couchbase/bin/memcached() [0x400000+0x4fc8ed]
      2021-06-22T01:42:55.141454-07:00 CRITICAL     #6  /opt/couchbase/bin/memcached(_ZN5magma5Magma4Impl9WriteDocsEtRKSt6vectorINS0_14WriteOperationESaIS3_EEjSt8functionIFvRKS3_bNS_5SliceEEES8_IFNS_6StatusERS5_EE+0x275) [0x400000+0x4decf5]
      2021-06-22T01:42:55.141498-07:00 CRITICAL     #7  /opt/couchbase/bin/memcached(_ZN5magma5Magma9WriteDocsEtRKSt6vectorINS0_14WriteOperationESaIS2_EEjSt8functionIFvRKS2_bNS_5SliceEEES7_IFNS_6StatusERS4_EE+0xb2) [0x400000+0x4deea2]
      2021-06-22T01:42:55.141538-07:00 CRITICAL     #8  /opt/couchbase/bin/memcached() [0x400000+0x44c2a2]
      2021-06-22T01:42:55.141566-07:00 CRITICAL     #9  /opt/couchbase/bin/memcached() [0x400000+0x449131]
      2021-06-22T01:42:55.141594-07:00 CRITICAL     #10 /opt/couchbase/bin/memcached() [0x400000+0x3e474e]
      2021-06-22T01:42:55.141629-07:00 CRITICAL     #11 /opt/couchbase/bin/memcached() [0x400000+0x3eb58d]
      2021-06-22T01:42:55.141654-07:00 CRITICAL     #12 /opt/couchbase/bin/memcached() [0x400000+0x3eb94f]
      2021-06-22T01:42:55.141697-07:00 CRITICAL     #13 /opt/couchbase/bin/memcached() [0x400000+0x2bae22]
      2021-06-22T01:42:55.141724-07:00 CRITICAL     #14 /opt/couchbase/bin/memcached() [0x400000+0x2bb2d0]
      2021-06-22T01:42:55.141774-07:00 CRITICAL     #15 /opt/couchbase/bin/memcached() [0x400000+0x5fb6a2]
      2021-06-22T01:42:55.141822-07:00 CRITICAL     #16 /opt/couchbase/bin/memcached() [0x400000+0x5f87c5]
      2021-06-22T01:42:55.141924-07:00 CRITICAL     #17 /opt/couchbase/bin/memcached() [0x400000+0x7537a0]
      2021-06-22T01:42:55.141971-07:00 CRITICAL     #18 /opt/couchbase/bin/memcached() [0x400000+0x73b55a]
      2021-06-22T01:42:55.142011-07:00 CRITICAL     #19 /opt/couchbase/bin/memcached() [0x400000+0x756759]
      2021-06-22T01:42:55.142057-07:00 CRITICAL     #20 /opt/couchbase/bin/memcached() [0x400000+0x5f8344]
      2021-06-22T01:42:55.142148-07:00 CRITICAL     #21 /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7fa7fb883000+0xcdd40]
      2021-06-22T01:42:55.142165-07:00 CRITICAL     #22 /lib64/libpthread.so.0() [0x7fa7fd765000+0x7ea5]
      2021-06-22T01:42:55.142620-07:00 CRITICAL     #23 /lib64/libc.so.6(clone+0x6d) [0x7fa7faf9b000+0xfe9fd]
      

      gdb backtrace (bt) of 47c5739a-5789-4fde-67f412ad-78a23e84.dmp on 172.23.104.186:

      (gdb) bt
      #0  0x00007fa7fafd13d7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
      #1  0x00007fa7fafd2ac8 in __GI_abort () at abort.c:90
      #2  0x00007fa7fb91c63c in __gnu_cxx::__verbose_terminate_handler () at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/libsupc++/vterminate.cc:95
      #3  0x0000000000a92ceb in backtrace_terminate_handler() () at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/utilities/terminate_handler.cc:88
      #4  0x00007fa7fb9278f6 in __cxxabiv1::__terminate(void (*)()) () at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/libsupc++/eh_terminate.cc:48
      #5  0x00007fa7fb927961 in std::terminate () at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/libsupc++/eh_terminate.cc:58
      #6  0x00007fa7fb927bf4 in __cxxabiv1::__cxa_throw (obj=obj@entry=0x7fa7c4000940, tinfo=tinfo@entry=0xf96fc0 <typeinfo for boost::exception_detail::error_info_injector<std::underflow_error>>, 
          dest=dest@entry=0x6db060 <boost::exception_detail::error_info_injector<std::underflow_error>::~error_info_injector()>) at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/libsupc++/eh_throw.cc:95
      #7  0x00000000006dea47 in cb::throwWithTrace<std::underflow_error> (exception=...) at /home/couchbase/jenkins/workspace/couchbase-server-unix/server_build/tlm/deps/boost.exploded/include/boost/exception/info.hpp:129
      #8  0x00000000004d2774 in cb::ThrowExceptionUnderflowPolicy<unsigned long>::underflow (current=140358717451952, arg=<optimized out>, desired=<optimized out>, this=0x7fa7cf7eafa8)
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/magma-kvstore/magma-kvstore.cc:1901
      #9  0x00000000008528ff in fetch_add (arg=-4, this=0x7fa7cf7eafa8) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/char_traits.h:395
      #10 operator+= (rhs=18446744073709551612, this=0x7fa7cf7eafa8) at /home/couchbase/jenkins/workspace/couchbase-server-unix/platform/include/platform/non_negative_counter.h:174
      #11 MagmaKVStore::updateDroppedCollections(Vbid, std::vector<MagmaKVStore::MagmaLocalReq, std::allocator<MagmaKVStore::MagmaLocalReq> >&, Collections::VB::Flush&) ()
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/magma-kvstore/magma-kvstore.cc:2450
      #12 0x0000000000852c1e in MagmaKVStore::updateCollectionsMeta(Vbid, std::vector<MagmaKVStore::MagmaLocalReq, std::allocator<MagmaKVStore::MagmaLocalReq> >&, Collections::VB::Flush&) ()
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/magma-kvstore/magma-kvstore.cc:2375
      #13 0x0000000000852eb9 in operator() (postWriteOps=..., __closure=0x7fa78c56a5c0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/magma-kvstore/magma-kvstore.cc:1228
      #14 __invoke_impl<magma::Status, MagmaKVStore::saveDocs(VB::Commit&, kvstats_ctx&)::<lambda(MagmaKVStore::WriteOps&)>&, std::vector<magma::Magma::WriteOperation, std::allocator<magma::Magma::WriteOperation> >&> (__f=...)
          at /opt/gcc-10.2.0/include/c++/10.2.0/bits/invoke.h:60
      #15 __invoke_r<magma::Status, MagmaKVStore::saveDocs(VB::Commit&, kvstats_ctx&)::<lambda(MagmaKVStore::WriteOps&)>&, std::vector<magma::Magma::WriteOperation, std::allocator<magma::Magma::WriteOperation> >&> (__fn=...)
          at /opt/gcc-10.2.0/include/c++/10.2.0/bits/invoke.h:115
      #16 std::_Function_handler<magma::Status (std::vector<magma::Magma::WriteOperation, std::allocator<magma::Magma::WriteOperation> >&), MagmaKVStore::saveDocs(VB::Commit&, kvstats_ctx&)::{lambda(std::vector<magma::Magma::WriteOperation, std::allocator<magma::Magma::WriteOperation> >&)#2}>::_M_invoke(std::_Any_data const&, std::vector<magma::Magma::WriteOperation, std::allocator<magma::Magma::WriteOperation> >&) ()
          at /opt/gcc-10.2.0/include/c++/10.2.0/bits/std_function.h:292
      #17 0x00000000008fc8ed in operator() (__args#0=..., this=0x7fa7cf7eb590) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/std_function.h:248
      #18 magma::KVStore::WriteDocs(magma::WAL*, std::vector<magma::Magma::WriteOperation, std::allocator<magma::Magma::WriteOperation> > const&, std::function<void (magma::Magma::WriteOperation const&, bool, magma::Slice)>, std::function<magma::Status (std::vector<magma::Magma::WriteOperation, std::allocator<magma::Magma::WriteOperation> >&)>) () at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/magma/kvstore/write.cc:104
      #19 0x00000000008decf5 in magma::Magma::Impl::WriteDocs(unsigned short, std::vector<magma::Magma::WriteOperation, std::allocator<magma::Magma::WriteOperation> > const&, unsigned int, std::function<void (magma::Magma::WriteOperation const&, bool, magma::Slice)>, std::function<magma::Status (std::vector<magma::Magma::WriteOperation, std::allocator<magma::Magma::WriteOperation> >&)>) () at /opt/gcc-10.2.0/include/c++/10.2.0/bits/std_function.h:248
      #20 0x00000000008deea2 in magma::Magma::WriteDocs(unsigned short, std::vector<magma::Magma::WriteOperation, std::allocator<magma::Magma::WriteOperation> > const&, unsigned int, std::function<void (magma::Magma::WriteOperation const&, bool, magma::Slice)>, std::function<magma::Status (std::vector<magma::Magma::WriteOperation, std::allocator<magma::Magma::WriteOperation> >&)>) () at /opt/gcc-10.2.0/include/c++/10.2.0/bits/std_function.h:248
      #21 0x000000000084c2a2 in MagmaKVStore::saveDocs(VB::Commit&, kvstats_ctx&) () at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/include/memcached/vbucket.h:62
      #22 0x0000000000849131 in commit (commitData=..., this=0x7fa7f4293f00) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/magma-kvstore/magma-kvstore.cc:611
      #23 MagmaKVStore::commit (this=0x7fa7f4293f00, commitData=...) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/magma-kvstore/magma-kvstore.cc:594
      #24 0x00000000007e474e in EPBucket::commit(Vbid, KVStore&, VB::Commit&) () at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/ep_bucket.cc:897
      #25 0x00000000007eb58d in EPBucket::flushVBucket_UNLOCKED(LockedVBucketPtr) () at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/ep_bucket.cc:772
      #26 0x00000000007eb94f in EPBucket::flushVBucket(Vbid) () at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/ep_bucket.cc:369
      #27 0x00000000006bae22 in Flusher::flushVB() () at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/flusher.cc:265
      #28 0x00000000006bb2d0 in Flusher::step(GlobalTask*) () at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/flusher.cc:199
      #29 0x00000000009fb6a2 in GlobalTask::execute() () at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/globaltask.cc:68
      #30 0x00000000009f87c5 in FollyExecutorPool::TaskProxy::scheduleViaCPUPool()::{lambda()#2}::operator()() const (__closure=0x7fa7cf7ec540)
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/folly_executorpool.cc:189
      #31 0x0000000000b537a0 in operator() (this=0x7fa7cf7ec540) at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/folly-prefix/src/folly/folly/Function.h:416
      #32 folly::ThreadPoolExecutor::runTask(std::shared_ptr<folly::ThreadPoolExecutor::Thread> const&, folly::ThreadPoolExecutor::Task&&) (this=0x7fa7f9d4c000, thread=..., 
          task=<unknown type in /usr/lib/debug/opt/couchbase/bin/memcached.debug, CU 0x6442ea8, DIE 0x6488755>)
          at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/folly-prefix/src/folly/folly/executors/ThreadPoolExecutor.cpp:97
      #33 0x0000000000b3b55a in folly::CPUThreadPoolExecutor::threadRun (this=0x7fa7f9d4c000, thread=...)
          at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/folly-prefix/src/folly/folly/executors/CPUThreadPoolExecutor.cpp:265
      #34 0x0000000000b56759 in __invoke_impl<void, void (folly::ThreadPoolExecutor::*&)(std::shared_ptr<folly::ThreadPoolExecutor::Thread>), folly::ThreadPoolExecutor*&, std::shared_ptr<folly::ThreadPoolExecutor::Thread>&> (
          __t=<optimized out>, __f=<optimized out>) at /usr/local/include/c++/7.3.0/bits/invoke.h:73
      #35 __invoke<void (folly::ThreadPoolExecutor::*&)(std::shared_ptr<folly::ThreadPoolExecutor::Thread>), folly::ThreadPoolExecutor*&, std::shared_ptr<folly::ThreadPoolExecutor::Thread>&> (__fn=<optimized out>)
          at /usr/local/include/c++/7.3.0/bits/invoke.h:95
      #36 __call<void, 0, 1> (__args=<optimized out>, this=<optimized out>) at /usr/local/include/c++/7.3.0/functional:467
      #37 operator()<> (this=<optimized out>) at /usr/local/include/c++/7.3.0/functional:551
      #38 folly::detail::function::FunctionTraits<void ()>::callBig<std::_Bind<void (folly::ThreadPoolExecutor::*(folly::ThreadPoolExecutor*, std::shared_ptr<folly::ThreadPoolExecutor::Thread>))(std::shared_ptr<folly::ThreadPoolExecutor::Thread>)> >(folly::detail::function::Data&) (p=...) at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/folly-prefix/src/folly/folly/Function.h:401
      #39 0x00000000009f8344 in operator() (this=0x7fa7f9ccbcc0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/folly_executorpool.cc:47
      #40 operator() (__closure=0x7fa7f9ccbcc0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/folly_executorpool.cc:47
      #41 folly::detail::function::FunctionTraits<void ()>::callBig<CBRegisteredThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}>(folly::detail::function::Data&) (p=...)
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/server_build/tlm/deps/folly.exploded/include/folly/Function.h:401
      #42 0x00007fa7fb950d40 in execute_native_thread_routine () at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/src/c++11/thread.cc:80
      #43 0x00007fa7fd76cea5 in start_thread (arg=0x7fa7cf7fe700) at pthread_create.c:307
      #44 0x00007fa7fb0999fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      (gdb) 
      

      cbcollect_info attached.

      Attachments

        1. thread_apply_all_bt.txt (120 kB)
        2. mb-47055btfull.txt (29 kB)
        3. info_threads.txt (6 kB)
        4. bt_full.txt (47 kB)


            People

              Balakumaran.Gopal Balakumaran Gopal
              Balakumaran.Gopal Balakumaran Gopal