Details
Type: Bug
Resolution: Fixed
Priority: Critical
Version: 7.1.0
Build: 7.1.0-1386
Triage: Untriaged
Operating System: Centos 64-bit
1
No
Description
Script to Repro
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/testexec.40477.ini GROUP=failover_with_rebalance_out_P0_set0,rerun=False,upgrade_version=7.1.0-1386 -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_graceful_failover_rebalance_out,nodes_init=5,nodes_failover=1,override_spec_params=durability,durability=MAJORITY,bucket_spec=magma_dgm.10_percent_dgm.5_node_2_replica_magma_256,doc_size=256,randomize_value=True,data_load_spec=volume_test_load_with_CRUD_on_collections,data_load_stage=during,skip_validations=False,GROUP=failover_with_rebalance_out_P0_set0'
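The `args` value above packs several things into one string: the ini file, a set of global parameters, the test id and its comma-separated overrides. Below is a minimal Python sketch for reading it. It assumes values never contain commas or spaces. `parse_testrunner_args` is a hypothetical helper, not part of the test framework. In this sketch a repeated key such as `GROUP` simply keeps its last value; how the framework itself merges duplicates is not assumed.

```python
import shlex


def parse_testrunner_args(args: str) -> dict:
    """Split a TestRunner-style args string into ini path, test id and params.

    Hypothetical helper for reading the repro command only; assumes
    parameter values contain no commas or whitespace.
    """
    tokens = shlex.split(args)
    result = {"ini": None, "test": None, "params": {}}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "-i":
            result["ini"] = tokens[i + 1]
            i += 2
            continue
        if tok == "-t":
            # The test id comes first, followed by comma-separated overrides.
            test_id, _, rest = tokens[i + 1].partition(",")
            result["test"] = test_id
            tok = rest
            i += 1
        # Any other token is a comma-separated list of key=value pairs;
        # a repeated key keeps its last value in this sketch.
        for pair in filter(None, tok.split(",")):
            key, _, value = pair.partition("=")
            result["params"][key] = value
        i += 1
    return result


# Excerpt of the repro args (most overrides trimmed for brevity).
args = ("-i /tmp/testexec.40477.ini GROUP=failover_with_rebalance_out_P0_set0,"
        "rerun=False,upgrade_version=7.1.0-1386 "
        "-t bucket_collections.collections_rebalance.CollectionsRebalance."
        "test_data_load_collections_with_graceful_failover_rebalance_out,"
        "nodes_init=5,nodes_failover=1,durability=MAJORITY")
cfg = parse_testrunner_args(args)
print(cfg["params"]["nodes_init"], cfg["params"]["durability"])  # 5 MAJORITY
```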
Steps to Repro
1. Create a 5-node cluster
2021-09-30 13:07:44,331 | test | INFO | pool-6-thread-6 | [table_view:display:72] Rebalance Overview
+----------------+----------+-----------------------+---------------+--------------+
| Nodes          | Services | Version               | CPU           | Status       |
+----------------+----------+-----------------------+---------------+--------------+
| 172.23.104.76  | kv       | 7.1.0-1386-enterprise | 2.25563909774 | Cluster node |
| 172.23.105.112 | None     |                       |               | <--- IN ---  |
| 172.23.105.118 | None     |                       |               | <--- IN ---  |
| 172.23.105.109 | None     |                       |               | <--- IN ---  |
| 172.23.105.105 | None     |                       |               | <--- IN ---  |
+----------------+----------+-----------------------+---------------+--------------+
2. Create buckets, scopes and collections, and load data
+---------+-----------+-----------------+----------+------------+-----+----------+-----------+------------+------------+---------------+
| Bucket  | Type      | Storage Backend | Replicas | Durability | TTL | Items    | RAM Quota | RAM Used   | Disk Used  | ARR           |
+---------+-----------+-----------------+----------+------------+-----+----------+-----------+------------+------------+---------------+
| bucket1 | couchbase | couchstore      | 2        | none       | 0   | 100000   | 9.77 GiB  | 294.25 MiB | 342.80 MiB | 100           |
| bucket2 | couchbase | magma           | 2        | none       | 0   | 50000    | 4.88 GiB  | 514.51 MiB | 544.71 MiB | 100           |
| default | couchbase | magma           | 2        | none       | 0   | 16287500 | 1.25 GiB  | 890.78 MiB | 11.44 GiB  | 2.47110360706 |
+---------+-----------+-----------------+----------+------------+-----+----------+-----------+------------+------------+---------------+
3. Gracefully fail over a node (172.23.105.105)
2021-09-30 14:06:33,346 | test | WARNING | MainThread | [rest_client:get_nodes:1846] 172.23.105.105 - Node not part of cluster inactiveFailed
4. Rebalance the failed-over node out
2021-09-30 14:06:40,197 | test | INFO | pool-6-thread-30 | [table_view:display:72] Rebalance Overview
+----------------+----------+-----------------------+----------------+--------------+
| Nodes          | Services | Version               | CPU            | Status       |
+----------------+----------+-----------------------+----------------+--------------+
| 172.23.105.118 | kv       | 7.1.0-1386-enterprise | 29.4042607475  | Cluster node |
| 172.23.105.105 | kv       | 7.1.0-1386-enterprise | 0.526711813394 | --- OUT ---> |
| 172.23.105.112 | kv       | 7.1.0-1386-enterprise | 25.1047087194  | Cluster node |
| 172.23.105.109 | kv       | 7.1.0-1386-enterprise | 26.2744103475  | Cluster node |
| 172.23.104.76  | kv       | 7.1.0-1386-enterprise | 29.2431192661  | Cluster node |
+----------------+----------+-----------------------+----------------+--------------+
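Steps 3 and 4 can also be reproduced by hand, outside the test framework. Below is a sketch of the two REST calls involved. It assumes the Couchbase Server cluster-manager endpoints `/controller/startGracefulFailover` and `/controller/rebalance` on port 8091, with their `otpNode`, `knownNodes` and `ejectedNodes` form fields. The test itself may drive the cluster differently. The hypothetical helper below only builds the request paths and form bodies; nothing is sent.

```python
from urllib.parse import urlencode

# Assumed Couchbase Server REST endpoints (cluster manager, port 8091).
FAILOVER_PATH = "/controller/startGracefulFailover"
REBALANCE_PATH = "/controller/rebalance"


def otp_name(ip: str) -> str:
    """Node name in the ns_1@<ip> form seen in the cbcollect_info directory name."""
    return f"ns_1@{ip}"


def failover_then_rebalance_out(cluster_ips, victim_ip):
    """Return (path, form body) for step 3 (graceful failover) and step 4 (rebalance out)."""
    failover = (FAILOVER_PATH, urlencode({"otpNode": otp_name(victim_ip)}))
    rebalance = (REBALANCE_PATH, urlencode({
        # knownNodes lists every node, including the one being ejected.
        "knownNodes": ",".join(otp_name(ip) for ip in cluster_ips),
        "ejectedNodes": otp_name(victim_ip),
    }))
    return [failover, rebalance]


nodes = ["172.23.104.76", "172.23.105.112", "172.23.105.118",
         "172.23.105.109", "172.23.105.105"]
steps = failover_then_rebalance_out(nodes, "172.23.105.105")
for path, body in steps:
    print("POST", path, body)
```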
The rebalance fails, and memcached on 172.23.105.118 crashes, writing the minidump 08f3ed68-8449-41fa-bb8b799f-87c1e633.dmp.
grep CRITICAL on memcached.log from 172.23.105.118:
Balakumarans-MacBook-Pro-2:cbcollect_info_ns_1@172.23.105.118_20210930-211324 balakumaran.g$ grep CRITICAL memcached.log
2021-09-30T14:12:17.268313-07:00 CRITICAL Detected previous crash
2021-09-30T14:12:17.268388-07:00 CRITICAL Breakpad caught a crash (Couchbase version 7.1.0-1386). Writing crash dump to /opt/couchbase/var/lib/couchbase/crash/08f3ed68-8449-41fa-bb8b799f-87c1e633.dmp before terminating.
2021-09-30T14:12:17.268399-07:00 CRITICAL Stack backtrace of crashed thread:
2021-09-30T14:12:17.268402-07:00 CRITICAL #0 /opt/couchbase/bin/memcached() [0x400000+0x69d858]
2021-09-30T14:12:17.268404-07:00 CRITICAL #1 /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler12GenerateDumpEPNS0_12CrashContextE+0x3ea) [0x400000+0x6ed8ca]
2021-09-30T14:12:17.268406-07:00 CRITICAL #2 /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler13SignalHandlerEiP9siginfo_tPv+0xb8) [0x400000+0x6edc08]
2021-09-30T14:12:17.268407-07:00 CRITICAL #3 /lib64/libpthread.so.0() [0x7fb0486e1000+0xf630]
2021-09-30T14:12:17.268409-07:00 CRITICAL #4 /opt/couchbase/bin/memcached() [0x400000+0x538600]
2021-09-30T14:12:17.268410-07:00 CRITICAL #5 /opt/couchbase/bin/memcached() [0x400000+0x52f6af]
2021-09-30T14:12:17.268434-07:00 CRITICAL #6 /opt/couchbase/bin/memcached() [0x400000+0x5308c4]
2021-09-30T14:12:17.268435-07:00 CRITICAL #7 /opt/couchbase/bin/memcached() [0x400000+0x531236]
2021-09-30T14:12:17.268437-07:00 CRITICAL #8 /opt/couchbase/bin/memcached() [0x400000+0x531a0d]
2021-09-30T14:12:17.268466-07:00 CRITICAL #9 /opt/couchbase/bin/memcached() [0x400000+0x531f48]
2021-09-30T14:12:17.268491-07:00 CRITICAL #10 /opt/couchbase/bin/memcached(_ZN5magma5Magma4Impl11syncKVStoreEtb+0x215) [0x400000+0x4dd5c5]
2021-09-30T14:12:17.268494-07:00 CRITICAL #11 /opt/couchbase/bin/memcached() [0x400000+0x4dd740]
2021-09-30T14:12:17.268495-07:00 CRITICAL #12 /opt/couchbase/bin/memcached() [0x400000+0x4e40ae]
2021-09-30T14:12:17.268507-07:00 CRITICAL #13 /opt/couchbase/bin/memcached(_ZN5magma5Magma4Impl14CompactKVStoreEtRKNS_5SliceES4_St8functionIFSt10unique_ptrINS0_18CompactionCallbackESt14default_deleteIS7_EEtEE+0x29e) [0x400000+0x4de00e]
2021-09-30T14:12:17.268510-07:00 CRITICAL #14 /opt/couchbase/bin/memcached(_ZN5magma5Magma14CompactKVStoreEtRKNS_5SliceES3_St8functionIFSt10unique_ptrINS0_18CompactionCallbackESt14default_deleteIS6_EEtEE+0x6d) [0x400000+0x4de0dd]
2021-09-30T14:12:17.268513-07:00 CRITICAL #15 /opt/couchbase/bin/memcached() [0x400000+0x45629e]
2021-09-30T14:12:17.268515-07:00 CRITICAL #16 /opt/couchbase/bin/memcached() [0x400000+0x44c5ea]
2021-09-30T14:12:17.268539-07:00 CRITICAL #17 /opt/couchbase/bin/memcached() [0x400000+0x44da06]
2021-09-30T14:12:17.268541-07:00 CRITICAL #18 /opt/couchbase/bin/memcached() [0x400000+0x3e9ada]
2021-09-30T14:12:17.268542-07:00 CRITICAL #19 /opt/couchbase/bin/memcached() [0x400000+0x3eb0f1]
2021-09-30T14:12:17.268544-07:00 CRITICAL #20 /opt/couchbase/bin/memcached() [0x400000+0x3089c6]
2021-09-30T14:12:17.268547-07:00 CRITICAL #21 /opt/couchbase/bin/memcached() [0x400000+0x61cf32]
2021-09-30T14:12:17.268548-07:00 CRITICAL #22 /opt/couchbase/bin/memcached() [0x400000+0x61a055]
2021-09-30T14:12:17.268549-07:00 CRITICAL #23 /opt/couchbase/bin/memcached() [0x400000+0x7649a0]
2021-09-30T14:12:17.268594-07:00 CRITICAL #24 /opt/couchbase/bin/memcached() [0x400000+0x74c75a]
2021-09-30T14:12:17.268632-07:00 CRITICAL #25 /opt/couchbase/bin/memcached() [0x400000+0x767959]
2021-09-30T14:12:17.268634-07:00 CRITICAL #26 /opt/couchbase/bin/memcached() [0x400000+0x619ce4]
2021-09-30T14:12:17.268652-07:00 CRITICAL #27 /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7fb0467f7000+0xcdd40]
2021-09-30T14:12:17.268654-07:00 CRITICAL #28 /lib64/libpthread.so.0() [0x7fb0486e1000+0x7ea5]
2021-09-30T14:12:17.268655-07:00 CRITICAL #29 /lib64/libc.so.6(clone+0x6d) [0x7fb045f0f000+0xfe8dd]
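Breakpad prints each frame as `[load base+offset]`. memcached is loaded at 0x400000 here, so base + offset gives the same addresses gdb reports for the symbolised dump further down. For example, frame #5 `0x400000+0x52f6af` is `0x92f6af`, which is gdb frame #10 (`emplace_back` in `KVStore::getAllCheckpoints`, kvstore.cc:1424).

Below is a minimal Python sketch for pulling those absolute addresses out of memcached.log, e.g. to look them up with a symboliser such as `addr2line` against the matching debug binary. `resolve_frames` is a hypothetical helper, and the regex assumes exactly the line format shown above.

```python
import re

# A Breakpad frame in memcached.log looks like:
#   ... CRITICAL #5 /opt/couchbase/bin/memcached() [0x400000+0x52f6af]
FRAME_RE = re.compile(
    r"#(?P<num>\d+) (?P<module>[^\s(]+)\((?P<sym>[^)]*)\) "
    r"\[(?P<base>0x[0-9a-f]+)\+(?P<off>0x[0-9a-f]+)\]"
)


def resolve_frames(log_lines, module_suffix="bin/memcached"):
    """Return (frame number, absolute address) for frames in one module.

    The absolute address is simply load base + offset.
    """
    frames = []
    for line in log_lines:
        m = FRAME_RE.search(line)
        if m and m.group("module").endswith(module_suffix):
            addr = int(m.group("base"), 16) + int(m.group("off"), 16)
            frames.append((int(m.group("num")), hex(addr)))
    return frames


sample = [
    "CRITICAL #3 /lib64/libpthread.so.0() [0x7fb0486e1000+0xf630]",
    "CRITICAL #5 /opt/couchbase/bin/memcached() [0x400000+0x52f6af]",
    "CRITICAL #6 /opt/couchbase/bin/memcached() [0x400000+0x5308c4]",
    "CRITICAL #10 /opt/couchbase/bin/memcached(_ZN5magma5Magma4Impl11syncKVStoreEtb+0x215) [0x400000+0x4dd5c5]",
]
print(resolve_frames(sample))
# [(5, '0x92f6af'), (6, '0x9308c4'), (10, '0x8dd5c5')]
```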
gdb backtrace of 08f3ed68-8449-41fa-bb8b799f-87c1e633.dmp from 172.23.105.118:
(gdb)
#0 __atomic_add (__val=1, __mem=0x20) at /opt/gcc-10.2.0/include/c++/10.2.0/ext/atomicity.h:96
#1 __atomic_add_dispatch (__val=1, __mem=0x20) at /opt/gcc-10.2.0/include/c++/10.2.0/ext/atomicity.h:96
#2 _M_add_ref_copy (this=0x18) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/shared_ptr_base.h:142
#3 __shared_count (__r=..., this=0x7fb00bfec708) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/shared_ptr_base.h:740
#4 __shared_ptr (this=0x7fb00bfec700) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/shared_ptr_base.h:1181
#5 shared_ptr (this=0x7fb00bfec700) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/shared_ptr.h:149
#6 Checkpoint (this=0x7fb00bfec6f0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/lsm/checkpoint.h:46
#7 construct<magma::KVStoreCheckpoint, magma::Checkpoint&, magma::Checkpoint&, magma::Checkpoint&> (this=<optimized out>, __p=0x7faf9e9511c0) at /opt/gcc-10.2.0/include/c++/10.2.0/ext/new_allocator.h:150
#8 construct<magma::KVStoreCheckpoint, magma::Checkpoint&, magma::Checkpoint&, magma::Checkpoint&> (__p=0x7faf9e9511c0, __a=...) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/alloc_traits.h:512
#9 std::vector<magma::KVStoreCheckpoint, std::allocator<magma::KVStoreCheckpoint> >::_M_realloc_insert<magma::Checkpoint&, magma::Checkpoint&, magma::Checkpoint&> (this=this@entry=0x7fb00bfec8b0, __position=__position@entry=...)
at /opt/gcc-10.2.0/include/c++/10.2.0/bits/vector.tcc:449
#10 0x000000000092f6af in emplace_back<magma::Checkpoint&, magma::Checkpoint&, magma::Checkpoint&> (this=0x7fb00bfec8b0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/magma/kvstore/kvstore.cc:1424
#11 magma::KVStore::getAllCheckpoints(std::vector<magma::KVStoreCheckpoint, std::allocator<magma::KVStoreCheckpoint> >&) () at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/magma/kvstore/kvstore.cc:1424
#12 0x00000000009308c4 in magma::KVStore::verifyCheckpoints() () at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/magma/kvstore/kvstore.cc:1469
#13 0x0000000000931236 in magma::KVStore::flushMemTables(magma::WAL*, magma::WALOffset, magma::FlushMode, magma::BlockingMode)::{lambda()#2}::operator()() ()
at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/magma/kvstore/kvstore.cc:503
#14 0x0000000000931a0d in magma::KVStore::flushMemTables(magma::WAL*, magma::WALOffset, magma::FlushMode, magma::BlockingMode) () at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/magma/kvstore/kvstore.cc:514
#15 0x0000000000931f48 in magma::KVStore::FlushMemTables (this=<optimized out>, wal=<optimized out>, flushMode=<optimized out>, blockMode=<optimized out>)
at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/magma/kvstore/kvstore.cc:336
#16 0x00000000008dd5c5 in magma::Magma::Impl::syncKVStore(unsigned short, bool) () at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/magma/db.cc:1133
#17 0x00000000008dd740 in std::_Function_handler<void (), magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::function<std::unique_ptr<magma::Magma::CompactionCallback, std::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>)::{lambda()#2}>::_M_invoke(std::_Any_data const&) () at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/magma/db.cc:784
#18 0x00000000008e40ae in magma::defer::~defer (this=0x7fb00bfecd50, __in_chrg=<optimized out>) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/std_function.h:248
#19 0x00000000008de00e in magma::Magma::Impl::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::function<std::unique_ptr<magma::Magma::CompactionCallback, std::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) () at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/magma/db.cc:798
#20 0x00000000008de0dd in magma::Magma::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::function<std::unique_ptr<magma::Magma::CompactionCallback, std::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) () at /opt/gcc-10.2.0/include/c++/10.2.0/bits/std_function.h:248
#21 0x000000000085629e in MagmaMemoryTrackingProxy::CompactKVStore(unsigned short, magma::Slice const&, magma::Slice const&, std::function<std::unique_ptr<magma::Magma::CompactionCallback, std::default_delete<magma::Magma::CompactionCallback> > (unsigned short)>) () at /opt/gcc-10.2.0/include/c++/10.2.0/bits/std_function.h:248
#22 0x000000000084c5ea in MagmaKVStore::compactDBInternal(std::unique_lock<std::mutex>&, std::shared_ptr<CompactionContext>) () at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/include/memcached/vbucket.h:62
#23 0x000000000084da06 in MagmaKVStore::compactDB(std::unique_lock<std::mutex>&, std::shared_ptr<CompactionContext>) ()
at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/kvstore/magma-kvstore/magma-kvstore.cc:2032
#24 0x00000000007e9ada in EPBucket::compactInternal(LockedVBucketPtr&, CompactionConfig&) () at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/vbucket.h:2599
#25 0x00000000007eb0f1 in EPBucket::doCompact(Vbid, CompactionConfig&, std::vector<CookieIface const*, std::allocator<CookieIface const*> >&) ()
at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/ep_bucket.cc:1359
#26 0x00000000007089c6 in CompactTask::run() () at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/tasks.cc:73
#27 0x0000000000a1cf32 in GlobalTask::execute() () at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/globaltask.cc:68
#28 0x0000000000a1a055 in FollyExecutorPool::TaskProxy::scheduleViaCPUPool()::{lambda()#2}::operator()() const (__closure=0x7fb00bfed840)
at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/folly_executorpool.cc:189
#29 0x0000000000b649a0 in operator() (this=0x7fb00bfed840) at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/folly-prefix/src/folly/folly/Function.h:416
#30 folly::ThreadPoolExecutor::runTask(std::shared_ptr<folly::ThreadPoolExecutor::Thread> const&, folly::ThreadPoolExecutor::Task&&) (this=0x7fb044899400, thread=...,
task=<unknown type in /usr/lib/debug/opt/couchbase/bin/memcached.debug, CU 0x65a24fb, DIE 0x65e7bd8>)
at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/folly-prefix/src/folly/folly/executors/ThreadPoolExecutor.cpp:97
#31 0x0000000000b4c75a in folly::CPUThreadPoolExecutor::threadRun (this=0x7fb044899400, thread=...)
at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/folly-prefix/src/folly/folly/executors/CPUThreadPoolExecutor.cpp:265
#32 0x0000000000b67959 in __invoke_impl<void, void (folly::ThreadPoolExecutor::*&)(std::shared_ptr<folly::ThreadPoolExecutor::Thread>), folly::ThreadPoolExecutor*&, std::shared_ptr<folly::ThreadPoolExecutor::Thread>&> (
__t=<optimized out>, __f=<optimized out>) at /usr/local/include/c++/7.3.0/bits/invoke.h:73
#33 __invoke<void (folly::ThreadPoolExecutor::*&)(std::shared_ptr<folly::ThreadPoolExecutor::Thread>), folly::ThreadPoolExecutor*&, std::shared_ptr<folly::ThreadPoolExecutor::Thread>&> (__fn=<optimized out>)
at /usr/local/include/c++/7.3.0/bits/invoke.h:95
#34 __call<void, 0, 1> (__args=<optimized out>, this=<optimized out>) at /usr/local/include/c++/7.3.0/functional:467
#35 operator()<> (this=<optimized out>) at /usr/local/include/c++/7.3.0/functional:551
#36 folly::detail::function::FunctionTraits<void ()>::callBig<std::_Bind<void (folly::ThreadPoolExecutor::*(folly::ThreadPoolExecutor*, std::shared_ptr<folly::ThreadPoolExecutor::Thread>))(std::shared_ptr<folly::ThreadPoolExecutor::Thread>)> >(folly::detail::function::Data&) (p=...) at /home/couchbase/jenkins/workspace/cbdeps-platform-build-old/deps/packages/build/folly/folly-prefix/src/folly/folly/Function.h:401
#37 0x0000000000a19ce4 in operator() (this=0x7fb044de1280) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/folly_executorpool.cc:47
#38 operator() (__closure=0x7fb044de1280) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/executor/folly_executorpool.cc:47
#39 folly::detail::function::FunctionTraits<void ()>::callBig<CBRegisteredThreadFactory::newThread(folly::Function<void ()>&&)::{lambda()#1}>(folly::detail::function::Data&) (p=...)
at /home/couchbase/jenkins/workspace/couchbase-server-unix/server_build/tlm/deps/folly.exploded/include/folly/Function.h:401
#40 0x00007fb0468c4d40 in execute_native_thread_routine () at /tmp/deploy/objdir/../gcc-10.2.0/libstdc++-v3/src/c++11/thread.cc:80
#41 0x00007fb0486e8ea5 in start_thread (arg=0x7fb00bfff700) at pthread_create.c:307
#42 0x00007fb04600d8dd in ioperm () at ../sysdeps/unix/syscall-template.S:81
#43 0x0000000000000000 in ?? ()
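Frames #0 to #5 are libstdc++ `shared_ptr` internals. `_M_add_ref_copy (this=0x18)` increments the use count at `__mem=0x20`, which is the control block address plus 8 bytes on this x86-64 build. So the `shared_ptr` being copied held the non-null but invalid control-block pointer 0x18. The innermost Couchbase frame is the `Checkpoint` copy at magma/lsm/checkpoint.h:46. It is reached from `KVStore::getAllCheckpoints` (kvstore.cc:1424) under `KVStore::verifyCheckpoints`, during the deferred `syncKVStore` flush in `Magma::Impl::CompactKVStore`.

Below is a minimal Python sketch of that triage step: it finds the first frame whose source is not a toolchain header. `first_project_frame` and the toolchain prefix list are assumptions tailored to this log, and wrapped `at ...` continuation lines are not handled.

```python
import re

# Frame lines look like: "#6 Checkpoint (this=0x...) at /path/to/checkpoint.h:46".
# Frames whose "at file:line" part wrapped onto the next line do not match.
FRAME_RE = re.compile(r"^#(?P<num>\d+)\s+(?P<body>.*?)\s+at\s+(?P<loc>\S+:\d+)$")

# Assumption: sources under these prefixes are compiler/libstdc++ headers.
TOOLCHAIN_PREFIXES = ("/opt/gcc-", "/usr/local/include/")


def first_project_frame(bt_lines):
    """Return (frame number, file:line) of the innermost non-toolchain frame, or None."""
    for line in bt_lines:
        m = FRAME_RE.match(line.strip())
        if m and not m.group("loc").startswith(TOOLCHAIN_PREFIXES):
            return int(m.group("num")), m.group("loc")
    return None


bt = [
    "#0 __atomic_add (__val=1, __mem=0x20) at /opt/gcc-10.2.0/include/c++/10.2.0/ext/atomicity.h:96",
    "#2 _M_add_ref_copy (this=0x18) at /opt/gcc-10.2.0/include/c++/10.2.0/bits/shared_ptr_base.h:142",
    "#6 Checkpoint (this=0x7fb00bfec6f0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/magma/lsm/checkpoint.h:46",
]
print(first_project_frame(bt))
```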
This is a new test, so we don't have a baseline for it yet.
cbcollect_info is attached.