  1. Couchbase Server
  2. MB-48842

[Magma] - Minidumps seen during multi node swap rebalance

Details

    • Untriaged
    • Centos 64-bit
    • 1
    • No

    Description

      Script to Repro

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/testexec.49251.ini GROUP=swap_rebalance_P0_set0,rerun=False,disk_optimized_thread_settings=True,get-cbcollect-info=True,upgrade_version=7.1.0-1458 -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_swap_rebalance,nodes_init=5,nodes_swap=2,compaction=True,bucket_spec=magma_dgm.5_percent_dgm.5_node_2_replica_magma_512,doc_size=512,randomize_value=True,data_load_stage=during,skip_validations=False,GROUP=swap_rebalance_P0_set0'
      

      Steps to Repro
      1. Create a 5 node cluster.

      2021-10-10 01:13:26,608 | test  | INFO    | pool-5-thread-8 | [table_view:display:72] Rebalance Overview
      +----------------+----------+-----------------------+---------------+--------------+
      | Nodes          | Services | Version               | CPU           | Status       |
      +----------------+----------+-----------------------+---------------+--------------+
      | 172.23.106.130 | kv       | 7.1.0-1458-enterprise | 2.21216691805 | Cluster node |
      | 172.23.104.232 | None     |                       |               | <--- IN ---  |
      | 172.23.104.252 | None     |                       |               | <--- IN ---  |
      | 172.23.104.76  | None     |                       |               | <--- IN ---  |
      | 172.23.104.216 | None     |                       |               | <--- IN ---  |
      +----------------+----------+-----------------------+---------------+--------------+
      

      2. Create buckets/scopes/collections/data

      2021-10-10 01:40:06,456 | test  | INFO    | MainThread | [table_view:display:72] Bucket statistics
      +---------+-----------+-----------------+----------+------------+-----+----------+-----------+------------+------------+---------------+
      | Bucket  | Type      | Storage Backend | Replicas | Durability | TTL | Items    | RAM Quota | RAM Used   | Disk Used  | ARR           |
      +---------+-----------+-----------------+----------+------------+-----+----------+-----------+------------+------------+---------------+
      | bucket1 | couchbase | couchstore      | 2        | none       | 0   | 50000    | 9.77 GiB  | 250.87 MiB | 319.09 MiB | 100           |
      | bucket2 | couchbase | magma           | 2        | none       | 0   | 50000    | 4.88 GiB  | 525.89 MiB | 557.91 MiB | 100           |
      | default | couchbase | magma           | 2        | none       | 0   | 32575000 | 2.50 GiB  | 1.77 GiB   | 39.71 GiB  | 3.42467843438 |
      +---------+-----------+-----------------+----------+------------+-----+----------+-----------+------------+------------+---------------+
      

      3. Add 2 nodes (172.23.106.129 and 172.23.104.15), remove 2 nodes (172.23.104.216 and 172.23.104.76), and start a swap rebalance.

      2021-10-10 01:40:19,831 | test  | INFO    | pool-5-thread-6 | [table_view:display:72] Rebalance Overview
      +----------------+----------+-----------------------+---------------+--------------+
      | Nodes          | Services | Version               | CPU           | Status       |
      +----------------+----------+-----------------------+---------------+--------------+
      | 172.23.104.15  | kv       | 7.1.0-1458-enterprise | 0             | Cluster node |
      | 172.23.104.232 | kv       | 7.1.0-1458-enterprise | 8.14217292664 | Cluster node |
      | 172.23.106.129 | kv       | 7.1.0-1458-enterprise | 0             | Cluster node |
      | 172.23.104.216 | kv       | 7.1.0-1458-enterprise | 9.16887375457 | --- OUT ---> |
      | 172.23.104.252 | kv       | 7.1.0-1458-enterprise | 8.40590685346 | Cluster node |
      | 172.23.106.130 | kv       | 7.1.0-1458-enterprise | 10.6821921276 | Cluster node |
      | 172.23.104.76  | kv       | 7.1.0-1458-enterprise | 8.25444907232 | --- OUT ---> |
      +----------------+----------+-----------------------+---------------+--------------+
      

      At this point, a minidump (29e3e32c-fd4c-4398-418dc5bf-529aecdc.dmp) appears on 172.23.104.232.

      Output of grep CRITICAL on memcached.log from 172.23.104.232:

      Balakumarans-MacBook-Pro-2:cbcollect_info_ns_1@172.23.104.232_20211010-085015 balakumaran.g$ grep CRITICAL memcached.log 
      2021-10-10T01:49:05.460178-07:00 CRITICAL Detected previous crash
      2021-10-10T01:49:05.460225-07:00 CRITICAL Breakpad caught a crash (Couchbase version 7.1.0-1458). Writing crash dump to /opt/couchbase/var/lib/couchbase/crash/29e3e32c-fd4c-4398-418dc5bf-529aecdc.dmp before terminating.
      2021-10-10T01:49:05.460236-07:00 CRITICAL Stack backtrace of crashed thread:
      2021-10-10T01:49:05.460237-07:00 CRITICAL    #0  /opt/couchbase/bin/memcached() [0x400000+0x6d5fc8]
      2021-10-10T01:49:05.460239-07:00 CRITICAL    #1  /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler12GenerateDumpEPNS0_12CrashContextE+0x3ea) [0x400000+0x7260da]
      2021-10-10T01:49:05.460241-07:00 CRITICAL    #2  /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler13SignalHandlerEiP9siginfo_tPv+0xb8) [0x400000+0x726418]
      2021-10-10T01:49:05.460242-07:00 CRITICAL    #3  /lib64/libpthread.so.0() [0x7fb3560a8000+0xf630]
      2021-10-10T01:49:05.460244-07:00 CRITICAL    #4  /opt/couchbase/bin/memcached() [0x400000+0x56f450]
      2021-10-10T01:49:05.460245-07:00 CRITICAL    #5  /opt/couchbase/bin/memcached() [0x400000+0x5663cf]
      2021-10-10T01:49:05.460257-07:00 CRITICAL    #6  /opt/couchbase/bin/memcached() [0x400000+0x5675e4]
      2021-10-10T01:49:05.460259-07:00 CRITICAL    #7  /opt/couchbase/bin/memcached() [0x400000+0x567fc8]
      2021-10-10T01:49:05.460260-07:00 CRITICAL    #8  /opt/couchbase/bin/memcached() [0x400000+0x5687cd]
      2021-10-10T01:49:05.460264-07:00 CRITICAL    #9  /opt/couchbase/bin/memcached() [0x400000+0x568cc8]
      2021-10-10T01:49:05.460294-07:00 CRITICAL    #10 /opt/couchbase/bin/memcached(_ZN5magma5Magma4Impl11syncKVStoreEtb+0x215) [0x400000+0x511885]
      2021-10-10T01:49:05.460296-07:00 CRITICAL    #11 /opt/couchbase/bin/memcached() [0x400000+0x511a60]
      2021-10-10T01:49:05.460297-07:00 CRITICAL    #12 /opt/couchbase/bin/memcached() [0x400000+0x51838e]
      2021-10-10T01:49:05.460306-07:00 CRITICAL    #13 /opt/couchbase/bin/memcached(_ZN5magma5Magma4Impl14CompactKVStoreEtNS0_9StoreTypeESt8functionIFSt10unique_ptrINS0_18CompactionCallbackESt14default_deleteIS5_EEtEE+0x36e) [0x400000+0x511f0e]
      2021-10-10T01:49:05.460308-07:00 CRITICAL    #14 /opt/couchbase/bin/memcached(_ZN5magma5Magma4Impl14CompactKVStoreEtNS0_9StoreTypeESt8functionIFSt10unique_ptrINS0_18CompactionCallbackESt14default_deleteIS5_EEtEE+0xff) [0x400000+0x511c9f]
      2021-10-10T01:49:05.460310-07:00 CRITICAL    #15 /opt/couchbase/bin/memcached(_ZN5magma5Magma4Impl14CompactKVStoreEtRKNS_5SliceES4_St8functionIFSt10unique_ptrINS0_18CompactionCallbackESt14default_deleteIS7_EEtEE+0x65) [0x400000+0x512095]
      2021-10-10T01:49:05.460312-07:00 CRITICAL    #16 /opt/couchbase/bin/memcached(_ZN5magma5Magma14CompactKVStoreEtRKNS_5SliceES3_St8functionIFSt10unique_ptrINS0_18CompactionCallbackESt14default_deleteIS6_EEtEE+0x6d) [0x400000+0x51239d]
      2021-10-10T01:49:05.460336-07:00 CRITICAL    #17 /opt/couchbase/bin/memcached() [0x400000+0x47766e]
      2021-10-10T01:49:05.460340-07:00 CRITICAL    #18 /opt/couchbase/bin/memcached() [0x400000+0x46e780]
      2021-10-10T01:49:05.460341-07:00 CRITICAL    #19 /opt/couchbase/bin/memcached() [0x400000+0x46ed06]
      2021-10-10T01:49:05.460345-07:00 CRITICAL    #20 /opt/couchbase/bin/memcached() [0x400000+0x406eca]
      2021-10-10T01:49:05.460370-07:00 CRITICAL    #21 /opt/couchbase/bin/memcached() [0x400000+0x4089b1]
      2021-10-10T01:49:05.460372-07:00 CRITICAL    #22 /opt/couchbase/bin/memcached() [0x400000+0x314096]
      2021-10-10T01:49:05.460374-07:00 CRITICAL    #23 /opt/couchbase/bin/memcached() [0x400000+0x654332]
      2021-10-10T01:49:05.460397-07:00 CRITICAL    #24 /opt/couchbase/bin/memcached() [0x400000+0x6514d5]
      2021-10-10T01:49:05.460426-07:00 CRITICAL    #25 /opt/couchbase/bin/memcached() [0x400000+0x7a51e0]
      2021-10-10T01:49:05.460427-07:00 CRITICAL    #26 /opt/couchbase/bin/memcached() [0x400000+0x78cf9a]
      2021-10-10T01:49:05.460429-07:00 CRITICAL    #27 /opt/couchbase/bin/memcached() [0x400000+0x7a8199]
      2021-10-10T01:49:05.460430-07:00 CRITICAL    #28 /opt/couchbase/bin/memcached() [0x400000+0x651164]
      2021-10-10T01:49:05.460431-07:00 CRITICAL    #29 /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7fb3541da000+0xcdd40]
      2021-10-10T01:49:05.460432-07:00 CRITICAL    #30 /lib64/libpthread.so.0() [0x7fb3560a8000+0x7ea5]
      2021-10-10T01:49:05.460434-07:00 CRITICAL    #31 /lib64/libc.so.6(clone+0x6d) [0x7fb3538f2000+0xfe8dd]
      Balakumarans-MacBook-Pro-2:cbcollect_info_ns_1@172.23.104.232_20211010-085015 balakumaran.g$ 
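
      The bracketed [base+offset] pairs in the CRITICAL frames above can be cross-checked against the frame addresses in the gdb backtrace. The following is a minimal sketch and not part of the original report; the helper name parse_frame and the regular expression are assumptions made for illustration. It sums base and offset for a single frame line. For the memcached frames in this report the sum appears to match what gdb prints: 0x400000+0x5663cf gives 0x9663cf, the address gdb reports at kvstore.cc:1436.

```python
import re

# Hypothetical helper (not from the report): matches frame lines such as
#   "#5  /opt/couchbase/bin/memcached() [0x400000+0x5663cf]"
FRAME_RE = re.compile(
    r"#(?P<index>\d+)\s+(?P<module>\S+?)\((?P<symbol>[^)]*)\)\s+"
    r"\[0x(?P<base>[0-9a-fA-F]+)\+0x(?P<offset>[0-9a-fA-F]+)\]"
)


def parse_frame(line):
    """Return frame fields for one backtrace line, or None if it is not a frame."""
    match = FRAME_RE.search(line)
    if match is None:
        return None
    base = int(match.group("base"), 16)
    offset = int(match.group("offset"), 16)
    return {
        "index": int(match.group("index")),
        "module": match.group("module"),
        "symbol": match.group("symbol"),  # empty when no symbol was logged
        "address": base + offset,         # module load base plus offset
    }


if __name__ == "__main__":
    sample = (
        "2021-10-10T01:49:05.460245-07:00 CRITICAL    "
        "#5  /opt/couchbase/bin/memcached() [0x400000+0x5663cf]"
    )
    frame = parse_frame(sample)
    print(frame["index"], frame["module"], hex(frame["address"]))
```

      The resulting addresses could then be passed to a symbolizer such as addr2line -e with the matching debug binary, assuming one is available for this build.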
      

      gdb backtrace of 29e3e32c-fd4c-4398-418dc5bf-529aecdc.dmp from 172.23.104.232:

      (gdb) bt
      #0  __atomic_add (__val=1, __mem=0x235ca) at /opt/gcc-10.2.0/include/c++/10.2.0/ext/atomicity.h:96
      #1  __atomic_add_dispatch (__val=1, __mem=0x235ca) at /opt/gcc-10.2.0/include/c++/10.2.0/ext/atomicity.h:96
      #2  _M_add_ref_copy (this=0x235c2) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/shared_ptr_base.h:142
      #3  __shared_count (__r=..., this=0x7fb2c5feb548) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/shared_ptr_base.h:740
      #4  __shared_ptr (this=0x7fb2c5feb540) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/shared_ptr_base.h:1181
      #5  shared_ptr (this=0x7fb2c5feb540) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/shared_ptr.h:149
      #6  Checkpoint (this=0x7fb2c5feb540) at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/lsm/checkpoint.h:46
      #7  construct<magma::KVStoreCheckpoint, magma::Checkpoint&, magma::Checkpoint&, magma::Checkpoint&> (this=<optimized out>, __p=0x7fb2a098af40) at /opt/gcc-10.2.0/include/c++/10.2.0/ext/new_allocator.h:150
      #8  construct<magma::KVStoreCheckpoint, magma::Checkpoint&, magma::Checkpoint&, magma::Checkpoint&> (__p=0x7fb2a098af40, __a=...) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/alloc_traits.h:512
      #9  std::vector<magma::KVStoreCheckpoint, std::allocator<magma::KVStoreCheckpoint> >::_M_realloc_insert<magma::Checkpoint&, magma::Checkpoint&, magma::Checkpoint&> (this=this@entry=0x7fb2c5feb700, __position=__position@entry=...)
          at /opt/gcc-10.2.0/include/c++/10.2.0/bits/vector.tcc:449
      #10 0x00000000009663cf in emplace_back<magma::Checkpoint&, magma::Checkpoint&, magma::Checkpoint&> (this=0x7fb2c5feb700) at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/magma/kvstore/kvstore.cc:1436
      #11 magma::KVStore::getAllCheckpoints(std::vector<magma::KVStoreCheckpoint, std::allocator<magma::KVStoreCheckpoint> >&) () at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/magma/kvstore/kvstore.cc:1436
      #12 0x00000000009675e4 in magma::KVStore::verifyCheckpoints() () at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/magma/kvstore/kvstore.cc:1482
      #13 0x0000000000967fc8 in magma::KVStore::flushMemTables(magma::WAL*, magma::WALOffset, magma::FlushMode, magma::BlockingMode)::{lambda()#2}::operator()() () at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/magma/kvstore/kvstore.cc:513
      #14 0x00000000009687cd in magma::KVStore::flushMemTables(magma::WAL*, magma::WALOffset, magma::FlushMode, magma::BlockingMode) () at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/magma/kvstore/kvstore.cc:524
      #15 0x0000000000968cc8 in magma::KVStore::FlushMemTables (this=<optimized out>, wal=<optimized out>, flushMode=<optimized out>, blockMode=<optimized out>) at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/magma/kvstore/kvstore.cc:343
      #16 0x0000000000911885 in magma::Magma::Impl::syncKVStore(unsigned short, bool) () at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/magma/db.cc:1133
      #17 0x0000000000911a60 in std::_Function_handler<void (), magma::Magma::Impl::CompactKVStore(unsigned short, magma::Magma::StoreType, std::function<std::unique_ptr<magma::Magma::CompactionCallback, std::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::{lambda()#2}>::_M_invoke(std::_Any_data const&) () at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/magma/db.cc:740
      #18 0x000000000091838e in magma::defer::~defer (this=0x7fb2c5febbb0, __in_chrg=<optimized out>) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/std_function.h:248
      #19 0x0000000000911f0e in magma::Magma::Impl::CompactKVStore(unsigned short, magma::Magma::StoreType, std::function<std::unique_ptr<magma::Magma::CompactionCallback, std::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) ()
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/magma/db.cc:753
      #20 0x0000000000911c9f in magma::Magma::Impl::CompactKVStore(unsigned short, magma::Magma::StoreType, std::function<std::unique_ptr<magma::Magma::CompactionCallback, std::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) ()
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/magma/db.cc:721
      #21 0x0000000000912095 in magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::function<std::unique_ptr<magma::Magma::CompactionCallback, std::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) () at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/magma/db.cc:773
      #22 0x000000000091239d in magma::Magma::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::function<std::unique_ptr<magma::Magma::CompactionCallback, std::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)
          () at /opt/gcc-10.2.0/include/c++/10.2.0/bits/std_function.h:248
      #23 0x000000000087766e in MagmaMemoryTrackingProxy::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::function<std::unique_ptr<magma::Magma::CompactionCallback, std::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) () at /opt/gcc-10.2.0/include/c++/10.2.0/bits/std_function.h:248
      #24 0x000000000086e780 in MagmaKVStore::compactDBInternal(std::unique_lock<std::mutex>&, std::shared_ptr<CompactionContext>) () at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/include/memcached/vbucket.h:62
      #25 0x000000000086ed06 in MagmaKVStore::compactDB(std::unique_lock<std::mutex>&, std::shared_ptr<CompactionContext>) () at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/kvstore/magma-kvstore/magma-kvstore.cc:2058
      #26 0x0000000000806eca in EPBucket::compactInternal(LockedVBucketPtr&, CompactionConfig&) () at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/vbucket.h:2599
      #27 0x00000000008089b1 in EPBucket::doCompact(Vbid, CompactionConfig&, std::vector<CookieIface const*, std::allocator<CookieIface const*> >&) () at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/ep_bucket.cc:1359
      #28 0x0000000000714096 in CompactTask::run() () at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/tasks.cc:73
      #29 0x0000000000a54332 in GlobalTask::execute() () at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/globaltask.cc:68
      #30 0x0000000000a514d5 in FollyExecutorPool::TaskProxy::scheduleViaCPUPool()::{lambda()#2}::operator()() const (__closure=0x7fb2c5fec840) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/folly_executorpool.cc:189
      #31 0x0000000000ba51e0 in operator() (this=0x7fb2c5fec840) at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/folly-prefix/src/folly/folly/Function.h:416
      #32 folly::ThreadPoolExecutor::runTask(std::shared_ptr<folly::ThreadPoolExecutor::Thread> const&, folly::ThreadPoolExecutor::Task&&) (this=0x7fb352299400, thread=..., 
          task=<unknown type in /usr/lib/debug/opt/couchbase/bin/memcached.debug, CU 0x5f9cf74, DIE 0x5fe260e>)
          at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/folly-prefix/src/folly/folly/executors/ThreadPoolExecutor.cpp:97
      #33 0x0000000000b8cf9a in folly::CPUThreadPoolExecutor::threadRun (this=0x7fb352299400, thread=...)
          at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/folly-prefix/src/folly/folly/executors/CPUThreadPoolExecutor.cpp:265
      #34 0x0000000000ba8199 in __invoke_impl<void, void (folly::ThreadPoolExecutor::*&)(std::shared_ptr<folly::ThreadPoolExecutor::Thread>), folly::ThreadPoolExecutor*&, std::shared_ptr<folly::ThreadPoolExecutor::Thread>&> (__t=<optimized out>, 
          __f=<optimized out>) at /usr/local/include/c++/7.3.0/bits/invoke.h:73
      #35 __invoke<void (folly::ThreadPoolExecutor::*&)(std::shared_ptr<folly::ThreadPoolExecutor::Thread>), folly::ThreadPoolExecutor*&, std::shared_ptr<folly::ThreadPoolExecutor::Thread>&> (__fn=<optimized out>)
          at /usr/local/include/c++/7.3.0/bits/invoke.h:95
      #36 __call<void, 0, 1> (__args=<optimized out>, this=<optimized out>) at /usr/local/include/c++/7.3.0/functional:467
      #37 operator()<> (this=<optimized out>) at /usr/local/include/c++/7.3.0/functional:551
      #38 folly::detail::function::FunctionTraits<void ()>::callBig<std::_Bind<void (folly::ThreadPoolExecutor::*(folly::ThreadPoolExecutor*, std::shared_ptr<folly::ThreadPoolExecutor::Thread>))(std::shared_ptr<folly::ThreadPoolExecutor::Thread>)> >(folly::detail::function::Data&) (p=...) at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/folly-prefix/src/folly/folly/Function.h:401
      #39 0x0000000000a51164 in operator() (this=0x7fb3527dd180) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/folly_executorpool.cc:47
      #40 operator() (__closure=0x7fb3527dd180) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/folly_executorpool.cc:47
      #41 folly::detail::function::FunctionTraits<void ()>::callBig<CBRegisteredThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}>(folly::detail::function::Data&) (p=...)
          at /home/couchbase/jenkins/workspace/couchbase-server-unix/server_build/tlm/deps/folly.exploded/include/folly/Function.h:401
      #42 0x00007fb3542a7d40 in execute_native_thread_routine () at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/src/c++11/thread.cc:80
      #43 0x00007fb3560afea5 in start_thread (arg=0x7fb2c5ffe700) at pthread_create.c:307
      #44 0x00007fb3539f08dd in ioperm () at ../sysdeps/unix/syscall-template.S:81
      #45 0x0000000000000000 in ?? ()
      (gdb) 
      

      cbcollect_info is attached. We are still experimenting with these tests, so we do not have an exact baseline for when they last passed.

      Attachments

        1. bt_full.txt
          36 kB
        2. info_threads.txt
          9 kB
        3. test.log.txt
          96 kB
        4. thread_apply_all_bt.txt
          3 kB

            People

              sarath Sarath Lakshman
              Balakumaran.Gopal Balakumaran Gopal
              Votes: 0
              Watchers: 3
