Details
Bug
Resolution: Fixed
Critical
Cheshire-Cat
Enterprise Edition 7.0.0 build 3383 ‧ IPv4 © 2020 Couchbase, Inc.
Untriaged
Centos 64-bit
1
Yes
Description
Script to Repro
./testrunner -i /tmp/win10-bucket-ops.ini rerun=False,quota_percent=95,crash_warning=True -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_graceful_failover_rebalance_out,nodes_init=5,nodes_failover=2,bucket_spec=multi_bucket.buckets_for_rebalance_tests_more_collections,data_load_spec=volume_test_load_with_CRUD_on_collections,data_load_stage=during,quota_percent=80,GROUP=failover_with_collection_crud
Steps to Repro
1. Create a 5-node cluster
2020-10-13 00:00:37,207 | test | INFO | pool-3-thread-6 | [table_view:display:72] Rebalance Overview
------------------------------------
Nodes | Services | Status |
------------------------------------
172.23.98.196 | kv | Cluster node |
172.23.98.195 | None | <--- IN --- |
172.23.120.206 | None | <--- IN --- |
172.23.104.186 | None | <--- IN --- |
172.23.121.10 | None | <--- IN --- |
------------------------------------
2. Create buckets, scopes, collections, and data
2020-10-13 00:12:21,321 | test | INFO | MainThread | [table_view:display:72] Bucket statistics
--------------------------------------------------------------------------
Bucket | Type | Replicas | Durability | TTL | Items | RAM Quota | RAM Used | Disk Used |
--------------------------------------------------------------------------
bucket1 | couchbase | 3 | none | 0 | 3000 | 1048576000 | 253857232 | 359987046 |
bucket2 | ephemeral | 3 | none | 0 | 3000 | 1048576000 | 363499152 | 170 |
default | couchbase | 3 | none | 0 | 500000 | 10485760000 | 718104144 | 583601498 |
--------------------------------------------------------------------------
3. Start data load again with CRUD on collections
4. Do a graceful failover of 2 nodes (172.23.104.186 and 172.23.121.10)
2020-10-13 00:12:27,401 | test | INFO | MainThread | [collections_rebalance:rebalance_operation:157] Starting rebalance operation of type : graceful_failover_rebalance_out
2020-10-13 00:14:27,693 | test | INFO | MainThread | [collections_rebalance:wait_for_failover_or_assert:129] 1 nodes failed over as expected in 0.0490000247955 seconds
2020-10-13 00:16:48,086 | test | INFO | MainThread | [collections_rebalance:wait_for_failover_or_assert:129] 2 nodes failed over as expected in 20.1369998455 seconds
5. Start rebalance out of those 2 nodes. Rebalance fails and minidumps are seen on 172.23.120.206.
2020-10-13 00:19:09,019 | test | ERROR | pool-3-thread-24 | [rest_client:_rebalance_status_and_progress:1479] {u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try again.', u'status': u'none'} - rebalance failed
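For reference, the timing of steps 4 and 5 can be read straight off the test log. The following is a minimal sketch, not part of the test framework: it assumes the `YYYY-MM-DD HH:MM:SS,mmm | ...` line layout shown above, and the helper name `parse_ts` is made up for illustration.

```python
from datetime import datetime

# Test-log lines copied from steps 4 and 5 above.
LOG_LINES = [
    "2020-10-13 00:12:27,401 | test | INFO | MainThread | [collections_rebalance:rebalance_operation:157] Starting rebalance operation of type : graceful_failover_rebalance_out",
    "2020-10-13 00:14:27,693 | test | INFO | MainThread | [collections_rebalance:wait_for_failover_or_assert:129] 1 nodes failed over as expected in 0.0490000247955 seconds",
    "2020-10-13 00:16:48,086 | test | INFO | MainThread | [collections_rebalance:wait_for_failover_or_assert:129] 2 nodes failed over as expected in 20.1369998455 seconds",
    "2020-10-13 00:19:09,019 | test | ERROR | pool-3-thread-24 | [rest_client:_rebalance_status_and_progress:1479] {u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try again.', u'status': u'none'} - rebalance failed",
]


def parse_ts(line):
    """Return the leading timestamp of a test-log line (hypothetical helper)."""
    stamp = line.split(" | ", 1)[0]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")


start = parse_ts(LOG_LINES[0])
# Seconds elapsed since the graceful failover was started, one entry per line.
elapsed = [(parse_ts(line) - start).total_seconds() for line in LOG_LINES]

for seconds, line in zip(elapsed, LOG_LINES):
    message = line.split("] ", 1)[1]
    print("+%8.3fs  %s" % (seconds, message[:60]))
```

The last entry is the point at which the test client reported the rebalance failure; the memcached crash on 172.23.120.206 is logged a few seconds earlier (00:19:04), assuming both logs use the same local time.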
grep CRITICAL on 172.23.120.206
[root@localhost logs]# grep CRITICAL memcached.log.0000*
memcached.log.000041.txt:2020-10-13T00:19:04.665334-07:00 CRITICAL *** Fatal error encountered during exception handling ***
memcached.log.000041.txt:2020-10-13T00:19:04.665384-07:00 CRITICAL Caught unhandled std::exception-derived exception. what(): void Collections::VB::Manifest::DroppedCollections::remove(CollectionID, uint64_t) The collection cannot be found collection:0xcd seqno:1092
memcached.log.000041.txt:2020-10-13T00:19:04.854355-07:00 CRITICAL Breakpad caught a crash (Couchbase version 7.0.0-3383). Writing crash dump to /opt/couchbase/var/lib/couchbase/crash/761d0348-ae89-43ef-b46ea584-ca2d7f1e.dmp before terminating.
memcached.log.000041.txt:2020-10-13T00:19:04.854370-07:00 CRITICAL Stack backtrace of crashed thread:
memcached.log.000041.txt:2020-10-13T00:19:04.854513-07:00 CRITICAL /opt/couchbase/bin/memcached() [0x400000+0x19741d]
memcached.log.000041.txt:2020-10-13T00:19:04.854534-07:00 CRITICAL /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler12GenerateDumpEPNS0_12CrashContextE+0x3ea) [0x400000+0x1ac3aa]
memcached.log.000041.txt:2020-10-13T00:19:04.854544-07:00 CRITICAL /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler13SignalHandlerEiP9siginfo_tPv+0xb8) [0x400000+0x1ac6e8]
memcached.log.000041.txt:2020-10-13T00:19:04.854550-07:00 CRITICAL /lib64/libpthread.so.0() [0x7f39fb9f4000+0xf630]
memcached.log.000041.txt:2020-10-13T00:19:04.854569-07:00 CRITICAL /lib64/libc.so.6(gsignal+0x37) [0x7f39fb626000+0x36387]
memcached.log.000041.txt:2020-10-13T00:19:04.854585-07:00 CRITICAL /lib64/libc.so.6(abort+0x148) [0x7f39fb626000+0x37a78]
memcached.log.000041.txt:2020-10-13T00:19:04.854618-07:00 CRITICAL /opt/couchbase/bin/../lib/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x125) [0x7f39fc129000+0x91195]
memcached.log.000041.txt:2020-10-13T00:19:04.854635-07:00 CRITICAL /opt/couchbase/bin/memcached() [0x400000+0x1a6d72]
memcached.log.000041.txt:2020-10-13T00:19:04.854651-07:00 CRITICAL /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f39fc129000+0x8ef86]
memcached.log.000041.txt:2020-10-13T00:19:04.854670-07:00 CRITICAL /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f39fc129000+0x8efd1]
memcached.log.000041.txt:2020-10-13T00:19:04.854689-07:00 CRITICAL /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f39fc129000+0x8f213]
memcached.log.000041.txt:2020-10-13T00:19:04.854699-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f39ff711000+0x2a29cc]
memcached.log.000041.txt:2020-10-13T00:19:04.854706-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f39ff711000+0x2a53cf]
memcached.log.000041.txt:2020-10-13T00:19:04.854711-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f39ff711000+0x275003]
memcached.log.000041.txt:2020-10-13T00:19:04.854889-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f39ff711000+0x122978]
memcached.log.000041.txt:2020-10-13T00:19:04.854896-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f39ff711000+0x1264fd]
memcached.log.000041.txt:2020-10-13T00:19:04.854903-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f39ff711000+0x1806bc]
memcached.log.000041.txt:2020-10-13T00:19:04.854908-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f39ff711000+0x181899]
memcached.log.000041.txt:2020-10-13T00:19:04.854913-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f39ff711000+0x1845c3]
memcached.log.000041.txt:2020-10-13T00:19:04.854918-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f39ff711000+0x861bf]
memcached.log.000041.txt:2020-10-13T00:19:04.854924-07:00 CRITICAL /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7f39fe1ac000+0x10947]
memcached.log.000041.txt:2020-10-13T00:19:04.854930-07:00 CRITICAL /lib64/libpthread.so.0() [0x7f39fb9f4000+0x7ea5]
memcached.log.000041.txt:2020-10-13T00:19:04.854957-07:00 CRITICAL /lib64/libc.so.6(clone+0x6d) [0x7f39fb626000+0xfe8dd]
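For triage, the grep output above can be reduced to the failing collection ID, the seqno, and per-module frame offsets. This is only a sketch under assumptions: the regular expressions are inferred from the lines in this ticket rather than from any documented memcached log format, and `parse_frames` and `parse_exception` are hypothetical helper names.

```python
import os
import re

# A few lines copied from the grep output above.
SAMPLE = [
    "memcached.log.000041.txt:2020-10-13T00:19:04.665384-07:00 CRITICAL Caught unhandled std::exception-derived exception. what(): void Collections::VB::Manifest::DroppedCollections::remove(CollectionID, uint64_t) The collection cannot be found collection:0xcd seqno:1092",
    "memcached.log.000041.txt:2020-10-13T00:19:04.854585-07:00 CRITICAL /lib64/libc.so.6(abort+0x148) [0x7f39fb626000+0x37a78]",
    "memcached.log.000041.txt:2020-10-13T00:19:04.854699-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f39ff711000+0x2a29cc]",
    "memcached.log.000041.txt:2020-10-13T00:19:04.854706-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f39ff711000+0x2a53cf]",
]

# Assumed frame layout: <module path>(<symbol or empty>) [0x<base>+0x<offset>]
FRAME_RE = re.compile(
    r"CRITICAL\s+(?P<module>/\S+?)\((?P<symbol>[^)]*)\)\s+"
    r"\[0x(?P<base>[0-9a-f]+)\+0x(?P<offset>[0-9a-f]+)\]"
)
# Assumed tail of the what() message: collection:0x<hex id> seqno:<decimal>
WHAT_RE = re.compile(r"collection:0x(?P<cid>[0-9a-f]+)\s+seqno:(?P<seqno>\d+)")


def parse_frames(lines):
    """Return (module name, symbol or None, module-relative offset) per frame."""
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m:
            frames.append((
                os.path.basename(m.group("module")),
                m.group("symbol") or None,
                int(m.group("offset"), 16),
            ))
    return frames


def parse_exception(lines):
    """Return (collection id, seqno) from the first what() line, or None."""
    for line in lines:
        m = WHAT_RE.search(line)
        if m:
            return int(m.group("cid"), 16), int(m.group("seqno"))
    return None


frames = parse_frames(SAMPLE)
exception = parse_exception(SAMPLE)

print("collection id / seqno:", exception)
for module, symbol, offset in frames:
    print("%-12s +0x%x  %s" % (module, offset, symbol or "<no symbol>"))
```

The libep.so frames carry no symbol names in the log, so the offsets are the only handle for matching them against the attached `bt full` output.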
bt full of 761d0348-ae89-43ef-b46ea584-ca2d7f1e.dmp on 172.23.120.206
See bt_full_multi_node_graceful_rebalance_out.txt
cbcollect_info attached. This test passed on 7.0.0-3342.