Couchbase Server / MB-48707

[Magma] - Minidumps seen during graceful failover + rebalance out + CRUD on collections


Details

    • Untriaged
    • Centos 64-bit
    • 1
    • No

    Description

      Script to Repro

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/testexec.40477.ini GROUP=failover_with_rebalance_out_P0_set0,rerun=False,upgrade_version=7.1.0-1386 -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_graceful_failover_rebalance_out,nodes_init=5,nodes_failover=1,override_spec_params=durability,durability=MAJORITY,bucket_spec=magma_dgm.10_percent_dgm.5_node_2_replica_magma_256,doc_size=256,randomize_value=True,data_load_spec=volume_test_load_with_CRUD_on_collections,data_load_stage=during,skip_validations=False,GROUP=failover_with_rebalance_out_P0_set0'
      

      Steps to Repro
      1. Create a 5-node cluster
      2021-09-30 13:07:44,331 | test | INFO | pool-6-thread-6 | [table_view:display:72] Rebalance Overview

      +----------------+----------+-----------------------+---------------+--------------+
      | Nodes          | Services | Version               | CPU           | Status       |
      +----------------+----------+-----------------------+---------------+--------------+
      | 172.23.104.76  | kv       | 7.1.0-1386-enterprise | 2.25563909774 | Cluster node |
      | 172.23.105.112 | None     |                       |               | <--- IN ---  |
      | 172.23.105.118 | None     |                       |               | <--- IN ---  |
      | 172.23.105.109 | None     |                       |               | <--- IN ---  |
      | 172.23.105.105 | None     |                       |               | <--- IN ---  |
      +----------------+----------+-----------------------+---------------+--------------+
      

      2. Create buckets, scopes and collections, and load data

      +---------+-----------+-----------------+----------+------------+-----+----------+-----------+------------+------------+---------------+
      | Bucket  | Type      | Storage Backend | Replicas | Durability | TTL | Items    | RAM Quota | RAM Used   | Disk Used  | ARR           |
      +---------+-----------+-----------------+----------+------------+-----+----------+-----------+------------+------------+---------------+
      | bucket1 | couchbase | couchstore      | 2        | none       | 0   | 100000   | 9.77 GiB  | 294.25 MiB | 342.80 MiB | 100           |
      | bucket2 | couchbase | magma           | 2        | none       | 0   | 50000    | 4.88 GiB  | 514.51 MiB | 544.71 MiB | 100           |
      | default | couchbase | magma           | 2        | none       | 0   | 16287500 | 1.25 GiB  | 890.78 MiB | 11.44 GiB  | 2.47110360706 |
      +---------+-----------+-----------------+----------+------------+-----+----------+-----------+------------+------------+---------------+
      

      3. Gracefully fail over a node (172.23.105.105)

      2021-09-30 14:06:33,346 | test  | WARNING | MainThread | [rest_client:get_nodes:1846] 172.23.105.105 - Node not part of cluster inactiveFailed
      

      4. Rebalance the failed-over node out
      2021-09-30 14:06:40,197 | test | INFO | pool-6-thread-30 | [table_view:display:72] Rebalance Overview

      +----------------+----------+-----------------------+----------------+--------------+
      | Nodes          | Services | Version               | CPU            | Status       |
      +----------------+----------+-----------------------+----------------+--------------+
      | 172.23.105.118 | kv       | 7.1.0-1386-enterprise | 29.4042607475  | Cluster node |
      | 172.23.105.105 | kv       | 7.1.0-1386-enterprise | 0.526711813394 | --- OUT ---> |
      | 172.23.105.112 | kv       | 7.1.0-1386-enterprise | 25.1047087194  | Cluster node |
      | 172.23.105.109 | kv       | 7.1.0-1386-enterprise | 26.2744103475  | Cluster node |
      | 172.23.104.76  | kv       | 7.1.0-1386-enterprise | 29.2431192661  | Cluster node |
      +----------------+----------+-----------------------+----------------+--------------+
      

      The rebalance fails, and a minidump (08f3ed68-8449-41fa-bb8b799f-87c1e633.dmp) is written on 172.23.105.118.

      Output of grep CRITICAL on memcached.log from 172.23.105.118:

      Balakumarans-MacBook-Pro-2:cbcollect_info_ns_1@172.23.105.118_20210930-211324 balakumaran.g$ grep CRITICAL memcached.log 
      2021-09-30T14:12:17.268313-07:00 CRITICAL Detected previous crash
      2021-09-30T14:12:17.268388-07:00 CRITICAL Breakpad caught a crash (Couchbase version 7.1.0-1386). Writing crash dump to /opt/couchbase/var/lib/couchbase/crash/08f3ed68-8449-41fa-bb8b799f-87c1e633.dmp before terminating.
      2021-09-30T14:12:17.268399-07:00 CRITICAL Stack backtrace of crashed thread:
      2021-09-30T14:12:17.268402-07:00 CRITICAL    #0  /opt/couchbase/bin/memcached() [0x400000+0x69d858]
      2021-09-30T14:12:17.268404-07:00 CRITICAL    #1  /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler12GenerateDumpEPNS0_12CrashContextE+0x3ea) [0x400000+0x6ed8ca]
      2021-09-30T14:12:17.268406-07:00 CRITICAL    #2  /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler13SignalHandlerEiP9siginfo_tPv+0xb8) [0x400000+0x6edc08]
      2021-09-30T14:12:17.268407-07:00 CRITICAL    #3  /lib64/libpthread.so.0() [0x7fb0486e1000+0xf630]
      2021-09-30T14:12:17.268409-07:00 CRITICAL    #4  /opt/couchbase/bin/memcached() [0x400000+0x538600]
      2021-09-30T14:12:17.268410-07:00 CRITICAL    #5  /opt/couchbase/bin/memcached() [0x400000+0x52f6af]
      2021-09-30T14:12:17.268434-07:00 CRITICAL    #6  /opt/couchbase/bin/memcached() [0x400000+0x5308c4]
      2021-09-30T14:12:17.268435-07:00 CRITICAL    #7  /opt/couchbase/bin/memcached() [0x400000+0x531236]
      2021-09-30T14:12:17.268437-07:00 CRITICAL    #8  /opt/couchbase/bin/memcached() [0x400000+0x531a0d]
      2021-09-30T14:12:17.268466-07:00 CRITICAL    #9  /opt/couchbase/bin/memcached() [0x400000+0x531f48]
      2021-09-30T14:12:17.268491-07:00 CRITICAL    #10 /opt/couchbase/bin/memcached(_ZN5magma5Magma4Impl11syncKVStoreEtb+0x215) [0x400000+0x4dd5c5]
      2021-09-30T14:12:17.268494-07:00 CRITICAL    #11 /opt/couchbase/bin/memcached() [0x400000+0x4dd740]
      2021-09-30T14:12:17.268495-07:00 CRITICAL    #12 /opt/couchbase/bin/memcached() [0x400000+0x4e40ae]
      2021-09-30T14:12:17.268507-07:00 CRITICAL    #13 /opt/couchbase/bin/memcached(_ZN5magma5Magma4Impl14CompactKVStoreEtRKNS_5SliceES4_St8functionIFSt10unique_ptrINS0_18CompactionCallbackESt14default_deleteIS7_EEtEE+0x29e) [0x400000+0x4de00e]
      2021-09-30T14:12:17.268510-07:00 CRITICAL    #14 /opt/couchbase/bin/memcached(_ZN5magma5Magma14CompactKVStoreEtRKNS_5SliceES3_St8functionIFSt10unique_ptrINS0_18CompactionCallbackESt14default_deleteIS6_EEtEE+0x6d) [0x400000+0x4de0dd]
      2021-09-30T14:12:17.268513-07:00 CRITICAL    #15 /opt/couchbase/bin/memcached() [0x400000+0x45629e]
      2021-09-30T14:12:17.268515-07:00 CRITICAL    #16 /opt/couchbase/bin/memcached() [0x400000+0x44c5ea]
      2021-09-30T14:12:17.268539-07:00 CRITICAL    #17 /opt/couchbase/bin/memcached() [0x400000+0x44da06]
      2021-09-30T14:12:17.268541-07:00 CRITICAL    #18 /opt/couchbase/bin/memcached() [0x400000+0x3e9ada]
      2021-09-30T14:12:17.268542-07:00 CRITICAL    #19 /opt/couchbase/bin/memcached() [0x400000+0x3eb0f1]
      2021-09-30T14:12:17.268544-07:00 CRITICAL    #20 /opt/couchbase/bin/memcached() [0x400000+0x3089c6]
      2021-09-30T14:12:17.268547-07:00 CRITICAL    #21 /opt/couchbase/bin/memcached() [0x400000+0x61cf32]
      2021-09-30T14:12:17.268548-07:00 CRITICAL    #22 /opt/couchbase/bin/memcached() [0x400000+0x61a055]
      2021-09-30T14:12:17.268549-07:00 CRITICAL    #23 /opt/couchbase/bin/memcached() [0x400000+0x7649a0]
      2021-09-30T14:12:17.268594-07:00 CRITICAL    #24 /opt/couchbase/bin/memcached() [0x400000+0x74c75a]
      2021-09-30T14:12:17.268632-07:00 CRITICAL    #25 /opt/couchbase/bin/memcached() [0x400000+0x767959]
      2021-09-30T14:12:17.268634-07:00 CRITICAL    #26 /opt/couchbase/bin/memcached() [0x400000+0x619ce4]
      2021-09-30T14:12:17.268652-07:00 CRITICAL    #27 /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7fb0467f7000+0xcdd40]
      2021-09-30T14:12:17.268654-07:00 CRITICAL    #28 /lib64/libpthread.so.0() [0x7fb0486e1000+0x7ea5]
      2021-09-30T14:12:17.268655-07:00 CRITICAL    #29 /lib64/libc.so.6(clone+0x6d) [0x7fb045f0f000+0xfe8dd]
      Balakumarans-MacBook-Pro-2:cbcollect_info_ns_1@172.23.105.118_20210930-211324 balakumaran.g$ 
      

      gdb backtrace (bt) of 08f3ed68-8449-41fa-bb8b799f-87c1e633.dmp from 172.23.105.118:

      (gdb) 
      #0  __atomic_add (__val=1, __mem=0x20) at /opt/gcc-10.2.0/include/c++/10.2.0/ext/atomicity.h:96
      #1  __atomic_add_dispatch (__val=1, __mem=0x20) at /opt/gcc-10.2.0/include/c++/10.2.0/ext/atomicity.h:96
      #2  _M_add_ref_copy (this=0x18) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/shared_ptr_base.h:142
      #3  __shared_count (__r=..., this=0x7fb00bfec708) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/shared_ptr_base.h:740
      #4  __shared_ptr (this=0x7fb00bfec700) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/shared_ptr_base.h:1181
      #5  shared_ptr (this=0x7fb00bfec700) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/shared_ptr.h:149
      #6  Checkpoint (this=0x7fb00bfec6f0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/lsm/checkpoint.h:46
      #7  construct<magma::KVStoreCheckpoint, magma::Checkpoint&, magma::Checkpoint&, magma::Checkpoint&> (this=<optimized out>, __p=0x7faf9e9511c0) at /opt/gcc-10.2.0/include/c++/10.2.0/ext/new_allocator.h:150
      #8  construct<magma::KVStoreCheckpoint, magma::Checkpoint&, magma::Checkpoint&, magma::Checkpoint&> (__p=0x7faf9e9511c0, __a=...) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/alloc_traits.h:512
      #9  std::vector<magma::KVStoreCheckpoint, std::allocator<magma::KVStoreCheckpoint> >::_M_realloc_insert<magma::Checkpoint&, magma::Checkpoint&, magma::Checkpoint&> (this=this@entry=0x7fb00bfec8b0, __position=__position@entry=...)
          at /opt/gcc-10.2.0/include/c++/10.2.0/bits/vector.tcc:449
      #10 0x000000000092f6af in emplace_back<magma::Checkpoint&, magma::Checkpoint&, magma::Checkpoint&> (this=0x7fb00bfec8b0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/magma/kvstore/kvstore.cc:1424
      #11 magma::KVStore::getAllCheckpoints(std::vector<magma::KVStoreCheckpoint, std::allocator<magma::KVStoreCheckpoint> >&) () at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/magma/kvstore/kvstore.cc:1424
      #12 0x00000000009308c4 in magma::KVStore::verifyCheckpoints() () at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/magma/kvstore/kvstore.cc:1469
      #13 0x0000000000931236 in magma::KVStore::flushMemTables(magma::WAL*, magma::WALOffset, magma::FlushMode, magma::BlockingMode)::{lambda()#2}::operator()() ()
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/magma/kvstore/kvstore.cc:503
      #14 0x0000000000931a0d in magma::KVStore::flushMemTables(magma::WAL*, magma::WALOffset, magma::FlushMode, magma::BlockingMode) () at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/magma/kvstore/kvstore.cc:514
      #15 0x0000000000931f48 in magma::KVStore::FlushMemTables (this=<optimized out>, wal=<optimized out>, flushMode=<optimized out>, blockMode=<optimized out>)
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/magma/kvstore/kvstore.cc:336
      #16 0x00000000008dd5c5 in magma::Magma::Impl::syncKVStore(unsigned short, bool) () at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/magma/db.cc:1133
      #17 0x00000000008dd740 in std::_Function_handler<void (), magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::function<std::unique_ptr<magma::Magma::CompactionCallback, std::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::{lambda()#2}>::_M_invoke(std::_Any_data const&) () at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/magma/db.cc:784
      #18 0x00000000008e40ae in magma::defer::~defer (this=0x7fb00bfecd50, __in_chrg=<optimized out>) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/std_function.h:248
      #19 0x00000000008de00e in magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::function<std::unique_ptr<magma::Magma::CompactionCallback, std::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) () at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/magma/db.cc:798
      #20 0x00000000008de0dd in magma::Magma::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::function<std::unique_ptr<magma::Magma::CompactionCallback, std::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) () at /opt/gcc-10.2.0/include/c++/10.2.0/bits/std_function.h:248
      #21 0x000000000085629e in MagmaMemoryTrackingProxy::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::function<std::unique_ptr<magma::Magma::CompactionCallback, std::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) () at /opt/gcc-10.2.0/include/c++/10.2.0/bits/std_function.h:248
      #22 0x000000000084c5ea in MagmaKVStore::compactDBInternal(std::unique_lock<std::mutex>&, std::shared_ptr<CompactionContext>) () at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/include/memcached/vbucket.h:62
      #23 0x000000000084da06 in MagmaKVStore::compactDB(std::unique_lock<std::mutex>&, std::shared_ptr<CompactionContext>) ()
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/kvstore/magma-kvstore/magma-kvstore.cc:2032
      #24 0x00000000007e9ada in EPBucket::compactInternal(LockedVBucketPtr&, CompactionConfig&) () at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/vbucket.h:2599
      #25 0x00000000007eb0f1 in EPBucket::doCompact(Vbid, CompactionConfig&, std::vector<CookieIface const*, std::allocator<CookieIface const*> >&) ()
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/ep_bucket.cc:1359
      #26 0x00000000007089c6 in CompactTask::run() () at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/tasks.cc:73
      #27 0x0000000000a1cf32 in GlobalTask::execute() () at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/globaltask.cc:68
      #28 0x0000000000a1a055 in FollyExecutorPool::TaskProxy::scheduleViaCPUPool()::{lambda()#2}::operator()() const (__closure=0x7fb00bfed840)
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/folly_executorpool.cc:189
      #29 0x0000000000b649a0 in operator() (this=0x7fb00bfed840) at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/folly-prefix/src/folly/folly/Function.h:416
      #30 folly::ThreadPoolExecutor::runTask(std::shared_ptr<folly::ThreadPoolExecutor::Thread> const&, folly::ThreadPoolExecutor::Task&&) (this=0x7fb044899400, thread=..., 
          task=<unknown type in /usr/lib/debug/opt/couchbase/bin/memcached.debug, CU 0x65a24fb, DIE 0x65e7bd8>)
          at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/folly-prefix/src/folly/folly/executors/ThreadPoolExecutor.cpp:97
      #31 0x0000000000b4c75a in folly::CPUThreadPoolExecutor::threadRun (this=0x7fb044899400, thread=...)
          at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/folly-prefix/src/folly/folly/executors/CPUThreadPoolExecutor.cpp:265
      #32 0x0000000000b67959 in __invoke_impl<void, void (folly::ThreadPoolExecutor::*&)(std::shared_ptr<folly::ThreadPoolExecutor::Thread>), folly::ThreadPoolExecutor*&, std::shared_ptr<folly::ThreadPoolExecutor::Thread>&> (
          __t=<optimized out>, __f=<optimized out>) at /usr/local/include/c++/7.3.0/bits/invoke.h:73
      #33 __invoke<void (folly::ThreadPoolExecutor::*&)(std::shared_ptr<folly::ThreadPoolExecutor::Thread>), folly::ThreadPoolExecutor*&, std::shared_ptr<folly::ThreadPoolExecutor::Thread>&> (__fn=<optimized out>)
          at /usr/local/include/c++/7.3.0/bits/invoke.h:95
      #34 __call<void, 0, 1> (__args=<optimized out>, this=<optimized out>) at /usr/local/include/c++/7.3.0/functional:467
      #35 operator()<> (this=<optimized out>) at /usr/local/include/c++/7.3.0/functional:551
      #36 folly::detail::function::FunctionTraits<void ()>::callBig<std::_Bind<void (folly::ThreadPoolExecutor::*(folly::ThreadPoolExecutor*, std::shared_ptr<folly::ThreadPoolExecutor::Thread>))(std::shared_ptr<folly::ThreadPoolExecutor::Thread>)> >(folly::detail::function::Data&) (p=...) at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/folly-prefix/src/folly/folly/Function.h:401
      #37 0x0000000000a19ce4 in operator() (this=0x7fb044de1280) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/folly_executorpool.cc:47
      #38 operator() (__closure=0x7fb044de1280) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/folly_executorpool.cc:47
      #39 folly::detail::function::FunctionTraits<void ()>::callBig<CBRegisteredThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}>(folly::detail::function::Data&) (p=...)
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/server_build/tlm/deps/folly.exploded/include/folly/Function.h:401
      #40 0x00007fb0468c4d40 in execute_native_thread_routine () at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/src/c++11/thread.cc:80
      #41 0x00007fb0486e8ea5 in start_thread (arg=0x7fb00bfff700) at pthread_create.c:307
      #42 0x00007fb04600d8dd in ioperm () at ../sysdeps/unix/syscall-template.S:81
      #43 0x0000000000000000 in ?? ()
      

      This is a new test, so there is no baseline for it yet.

      cbcollect_info is attached.

      Attachments

        1. bt_full.txt (37 kB)
        2. info_threads.txt (9 kB)
        3. test.log (34 kB)
        4. thread_apply_all_bt.txt (143 kB)


            People

              apaar.gupta Apaar Gupta
              Balakumaran.Gopal Balakumaran Gopal
              Votes: 0
              Watchers: 3

