Details
Bug
Resolution: Fixed
Critical
6.5.1, 6.6.0, 6.5.0, Cheshire-Cat
Centos 7 64 bit; Couchbase Enterprise Build 7.0.0-3009
Triaged
1
No
KV-Engine Sprint 2020-Dec, KV-Engine 2021-Jan
Description
Script to Repro
./testrunner -i /tmp/durability_volume.ini rerun=False -t bucket_collections.collections_network_split.CollectionsNetworkSplit.test_collections_crud_with_network_split,nodes_init=4,bucket_spec=single_bucket.buckets_all_membase_for_rebalance_tests_more_collections,override_spec_params=durability;replicas,durability=PERSIST_TO_MAJORITY,replicas=2,subsequent_action=rebalance-out
Steps to Reproduce
1. Create a 4 node cluster
2020-09-04 04:01:10,806 | test | INFO | pool-2-thread-7 | [table_view:display:72] Rebalance Overview
+----------------+----------+--------------+
| Nodes          | Services | Status       |
+----------------+----------+--------------+
| 172.23.105.211 | kv       | Cluster node |
| 172.23.105.212 | None     | <--- IN ---  |
| 172.23.105.213 | None     | <--- IN ---  |
| 172.23.105.215 | None     | <--- IN ---  |
+----------------+----------+--------------+
2. Initial data load into bucket
2020-09-04 04:05:12,655 | test | INFO | MainThread | [table_view:display:72] Bucket statistics
+---------+-----------+----------+------------+-----+--------+------------+-----------+-----------+
| Bucket  | Type      | Replicas | Durability | TTL | Items  | RAM Quota  | RAM Used  | Disk Used |
+---------+-----------+----------+------------+-----+--------+------------+-----------+-----------+
| default | couchbase | 2        | none       | 0   | 500000 | 8388608000 | 508642048 | 597462790 |
+---------+-----------+----------+------------+-----+--------+------------+-----------+-----------+
3. Perform a network split by blocking .212 traffic on .211 and vice versa with parallel data load
4. Hard failover .212 with data load in parallel
5. Rebalance out .212 with data load in parallel
2020-09-04 04:18:06,657 | test | INFO | pool-2-thread-26 | [table_view:display:72] Rebalance Overview
+----------------+----------+--------------+
| Nodes          | Services | Status       |
+----------------+----------+--------------+
| 172.23.105.215 | kv       | Cluster node |
| 172.23.105.212 | [u'kv']  | --- OUT ---> |
| 172.23.105.213 | kv       | Cluster node |
| 172.23.105.211 | kv       | Cluster node |
+----------------+----------+--------------+
The rebalance operation fails, with core dumps on .211.
Backtrace from 23ee0a42-688b-4cd0-7e3764b7-7b4a649f.dmp:
(gdb) bt full
#0 0x00007ff1ab51a387 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
        resultvar = 0
        pid = 32357
        selftid = 32670
#1 0x00007ff1ab51ba78 in __GI_abort () at abort.c:90
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x7ff1ab8ac1c0 <_IO_2_1_stderr_>, sa_sigaction = 0x7ff1ab8ac1c0 <_IO_2_1_stderr_>}, sa_mask = {__val = {140675938859561, 0, 140675938378339, 140675752100568, 140675941843392, 1,
              140675941843523, 140675941827456, 140675938384510, 140675941843392, 10, 140674540258608, 140674967992256, 140674967992320, 140675938385779, 140675941843392}}, sa_flags = -1408806528, sa_restorer = 0x7ff1a03b82d8}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2 0x00007ff1ac078195 in __gnu_cxx::__verbose_terminate_handler() () from /opt/couchbase/bin/../lib/libstdc++.so.6
No symbol table info available.
#3 0x000000000054edb2 in backtrace_terminate_handler() ()
No symbol table info available.
#4 0x00007ff1ac075f86 in __cxxabiv1::__terminate(void (*)()) () from /opt/couchbase/bin/../lib/libstdc++.so.6
No symbol table info available.
#5 0x00007ff1ac075fd1 in std::terminate() () from /opt/couchbase/bin/../lib/libstdc++.so.6
No symbol table info available.
#6 0x00007ff1ac076213 in __cxa_throw () from /opt/couchbase/bin/../lib/libstdc++.so.6
No symbol table info available.
#7 0x00007ff1af80f256 in ThrowExceptionPolicy<long>::nonMonotonic(long const&, long const&) () from /opt/couchbase/bin/../lib/libep.so
No symbol table info available.
#8 0x00007ff1af8999b2 in PassiveDurabilityMonitor::State::updateHighPreparedSeqno() () from /opt/couchbase/bin/../lib/libep.so
No symbol table info available.
#9 0x00007ff1af89c1b8 in PassiveDurabilityMonitor::notifyLocalPersistence() () from /opt/couchbase/bin/../lib/libep.so
No symbol table info available.
#10 0x00007ff1af9691f6 in VBucket::notifyPersistenceToDurabilityMonitor() () from /opt/couchbase/bin/../lib/libep.so
No symbol table info available.
#11 0x00007ff1af8a6be8 in EPBucket::flushVBucket(Vbid) () from /opt/couchbase/bin/../lib/libep.so
No symbol table info available.
#12 0x00007ff1af8fe6bc in Flusher::flushVB() () from /opt/couchbase/bin/../lib/libep.so
No symbol table info available.
#13 0x00007ff1af8ff899 in Flusher::step(GlobalTask*) () from /opt/couchbase/bin/../lib/libep.so
No symbol table info available.
#14 0x00007ff1af9025f3 in GlobalTask::execute() () from /opt/couchbase/bin/../lib/libep.so
No symbol table info available.
#15 0x00007ff1af808faf in CB3ExecutorThread::run() () from /opt/couchbase/bin/../lib/libep.so
No symbol table info available.
#16 0x00007ff1ae27c777 in platform_thread_wrap(void*) () from /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0
No symbol table info available.
#17 0x00007ff1ab8b9ea5 in start_thread (arg=0x7ff1717fa700) at pthread_create.c:307
        __res = <optimized out>
        pd = 0x7ff1717fa700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140674968037120, -2338991119659410976, 0, 8392704, 0, 140674968037120, 2335355026998319584, 2335517656178907616}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {
              prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#18 0x00007ff1ab5e28dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
The other core dump, 74a76e88-5cdb-4946-095eaf94-a5e70a07.dmp, is similar to
https://issues.couchbase.com/browse/MB-41235
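
For readers unfamiliar with the pattern visible in frames #5 to #8 of the first backtrace (an exception thrown from ThrowExceptionPolicy<long>::nonMonotonic inside updateHighPreparedSeqno, never caught, ending in std::terminate and abort), here is a minimal sketch of a value that may only move forward and throws when asked to move backwards. This is not the kv_engine source: the names ThrowPolicy, MonotonicSeqno, tryAdvance and demo are hypothetical, and the strictly-increasing rule is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a "throw on violation" policy (illustrative only).
struct ThrowPolicy {
    static void nonMonotonic(int64_t current, int64_t attempted) {
        throw std::logic_error("Monotonic: attempted " +
                               std::to_string(attempted) +
                               " is not greater than current " +
                               std::to_string(current));
    }
};

// A sequence number that is assumed to only ever increase strictly.
template <typename Policy>
class MonotonicSeqno {
public:
    explicit MonotonicSeqno(int64_t initial) : value(initial) {}

    // Rejects any update that does not move the value forward.
    void set(int64_t next) {
        if (next <= value) {
            Policy::nonMonotonic(value, next);
        }
        value = next;
    }

    int64_t get() const { return value; }

private:
    int64_t value;
};

// Catches the violation and reports it as a bool. If the exception were
// allowed to escape a thread with no handler, the C++ runtime would call
// std::terminate, which by default aborts the process (SIGABRT).
inline bool tryAdvance(MonotonicSeqno<ThrowPolicy>& seqno, int64_t next) {
    try {
        seqno.set(next);
        return true;
    } catch (const std::logic_error&) {
        return false;
    }
}

// Small demonstration; call from main() to exercise the sketch.
inline void demo() {
    MonotonicSeqno<ThrowPolicy> hps(10);
    assert(tryAdvance(hps, 11));  // forward move is accepted
    assert(!tryAdvance(hps, 11)); // repeating the same value is rejected
    assert(hps.get() == 11);      // a rejected update leaves the value as is
}
```

The sketch only illustrates why a backwards (or repeated) update shows up as a process abort rather than an error return; it makes no claim about which seqno values were involved in this crash.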