Details
- Bug
- Resolution: Fixed
- Critical
- Cheshire-Cat
- 7.0.0-4374-enterprise
- Untriaged
- Centos 64-bit
- 1
- Yes
- KV-Engine 2021-Feb
Description
Script to Repro
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/win10-bucket-ops.ini rerun=False,quota_percent=95,crash_warning=True -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_hard_failover_recovery,data_load_stage=during,quota_percent=80,nodes_failover=2,recovery_type=full,rerun=False,nodes_init=5,bucket_spec=multi_bucket.buckets_for_rebalance_tests_more_collections,data_load_spec=volume_test_load_with_CRUD_on_collections'
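As a reading aid for the command above, the `-t` argument packs the test path and its parameters into one comma-separated string. The sketch below splits that string into a test path and a parameter dictionary. The helper name `parse_test_spec` is hypothetical and is not claimed to be how testrunner itself parses its arguments.

```python
def parse_test_spec(spec):
    """Split 'module.Class.test,key=value,...' into (test_path, params).

    Hypothetical helper for reading the repro command; not testrunner code.
    """
    test_path, *pairs = spec.split(",")
    params = {}
    for pair in pairs:
        # Everything before the first '=' is the key, the rest is the value.
        key, _, value = pair.partition("=")
        params[key] = value
    return test_path, params


# The -t value copied from the repro command.
spec = (
    "bucket_collections.collections_rebalance.CollectionsRebalance."
    "test_data_load_collections_with_hard_failover_recovery,"
    "data_load_stage=during,quota_percent=80,nodes_failover=2,"
    "recovery_type=full,rerun=False,nodes_init=5,"
    "bucket_spec=multi_bucket.buckets_for_rebalance_tests_more_collections,"
    "data_load_spec=volume_test_load_with_CRUD_on_collections"
)

test_path, params = parse_test_spec(spec)
print(test_path.rsplit(".", 1)[-1])
for key in ("nodes_init", "nodes_failover", "recovery_type", "data_load_stage"):
    print(f"{key} = {params[key]}")
```

The parameters that matter for the steps below are `nodes_init=5`, `nodes_failover=2`, `recovery_type=full` and `data_load_stage=during`.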
Steps to Repro
1) Create a 5 node cluster
2021-02-03 20:25:50,934 | test | INFO | pool-1-thread-6 | [table_view:display:72] Rebalance Overview
------------------------------------
Nodes | Services | Status |
------------------------------------
172.23.98.196 | kv | Cluster node |
172.23.98.195 | None | <--- IN — |
172.23.121.10 | None | <--- IN — |
172.23.104.186 | None | <--- IN — |
172.23.120.206 | None | <--- IN — |
------------------------------------
2) Create buckets, scopes, collections, and data
-------------------------------------------------------------------------
Bucket | Type | Replicas | Durability | TTL | Items | RAM Quota | RAM Used | Disk Used |
-------------------------------------------------------------------------
bucket1 | couchbase | 3 | none | 0 | 3000 | 1048576000 | 218057960 | 322905355 |
bucket2 | ephemeral | 3 | none | 0 | 3000 | 1048576000 | 329473336 | 170 |
default | couchbase | 3 | none | 0 | 250000 | 5242880000 | 470890808 | 460819585 |
-------------------------------------------------------------------------
3) Hard failover 2 nodes.
2021-02-03 20:29:36,849 | test | INFO | MainThread | [collections_rebalance:rebalance_operation:600] failing over nodes [ip:172.23.104.186 port:8091 ssh_username:root, ip:172.23.120.206 port:8091 ssh_username:root]
2021-02-03 20:29:50,240 | test | INFO | pool-1-thread-23 | [rest_client:monitorRebalance:1438] Rebalance done. Taken 8.05900001526 seconds to complete
2021-02-03 20:29:50,243 | test | INFO | pool-1-thread-23 | [common_lib:sleep:22] Sleep 8.05900001526 seconds. Reason: Wait after rebalance complete
2021-02-03 20:31:58,346 | test | INFO | MainThread | [collections_rebalance:wait_for_failover_or_assert:224] 1 nodes failed over as expected in 0.0409998893738 seconds
2021-02-03 20:32:10,351 | test | INFO | pool-1-thread-8 | [rest_client:monitorRebalance:1438] Rebalance done. Taken 8.07899999619 seconds to complete
2021-02-03 20:32:10,355 | test | INFO | pool-1-thread-8 | [common_lib:sleep:22] Sleep 8.07899999619 seconds. Reason: Wait after rebalance complete
2021-02-03 20:34:18,476 | test | INFO | MainThread | [collections_rebalance:wait_for_failover_or_assert:224] 2 nodes failed over as expected in 0.0379998683929 seconds
4) Do full recovery and rebalance
2021-02-03 20:34:44,459 | test | WARNING | MainThread | [rest_client:get_nodes:1696] 172.23.104.186 - Node not part of cluster inactiveFailed
2021-02-03 20:34:44,459 | test | WARNING | MainThread | [rest_client:get_nodes:1696] 172.23.120.206 - Node not part of cluster inactiveFailed
Rebalance fails and we see a lot of minidumps. The one of interest is shown below.
grep CRITICAL on 172.23.121.10
memcached.log.000015.txt:2021-02-03T20:36:33.577265-08:00 CRITICAL Caught unhandled std::exception-derived exception. what(): decodeManifest: duplicate collection:0xce in stored data
memcached.log.000015.txt:2021-02-03T20:36:33.577964-08:00 CRITICAL *** Fatal error encountered during exception handling ***
memcached.log.000015.txt:2021-02-03T20:36:33.602241-08:00 CRITICAL *** Fatal error encountered during exception handling ***
memcached.log.000015.txt:2021-02-03T20:36:33.618348-08:00 CRITICAL *** Fatal error encountered during exception handling ***
memcached.log.000015.txt:2021-02-03T20:36:33.985955-08:00 CRITICAL Breakpad caught a crash (Couchbase version 7.0.0-4374). Writing crash dump to /opt/couchbase/var/lib/couchbase/crash/92e07269-f018-44cc-04ca8aa4-cdf08df0.dmp before terminating.
memcached.log.000015.txt:2021-02-03T20:36:33.985973-08:00 CRITICAL Stack backtrace of crashed thread:
memcached.log.000015.txt:2021-02-03T20:36:33.986765-08:00 CRITICAL /opt/couchbase/bin/memcached() [0x400000+0x145bbd]
memcached.log.000015.txt:2021-02-03T20:36:33.986791-08:00 CRITICAL /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler12GenerateDumpEPNS0_12CrashContextE+0x3ea) [0x400000+0x15b3fa]
memcached.log.000015.txt:2021-02-03T20:36:33.986814-08:00 CRITICAL /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler13SignalHandlerEiP9siginfo_tPv+0xb8) [0x400000+0x15b738]
memcached.log.000015.txt:2021-02-03T20:36:33.986830-08:00 CRITICAL /lib64/libpthread.so.0() [0x7f23971e1000+0xf630]
memcached.log.000015.txt:2021-02-03T20:36:33.986873-08:00 CRITICAL /lib64/libc.so.6(gsignal+0x37) [0x7f2396e13000+0x36387]
memcached.log.000015.txt:2021-02-03T20:36:33.986915-08:00 CRITICAL /lib64/libc.so.6(abort+0x148) [0x7f2396e13000+0x37a78]
memcached.log.000015.txt:2021-02-03T20:36:33.986972-08:00 CRITICAL /opt/couchbase/bin/../lib/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x125) [0x7f2397916000+0x91195]
memcached.log.000015.txt:2021-02-03T20:36:33.986995-08:00 CRITICAL /opt/couchbase/bin/memcached() [0x400000+0x155632]
memcached.log.000015.txt:2021-02-03T20:36:33.987042-08:00 CRITICAL /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f2397916000+0x8ef86]
memcached.log.000015.txt:2021-02-03T20:36:33.987088-08:00 CRITICAL /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f2397916000+0x8efd1]
memcached.log.000015.txt:2021-02-03T20:36:33.987112-08:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f239b247000+0x16f2f3]
memcached.log.000015.txt:2021-02-03T20:36:33.987133-08:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f239b247000+0x169352]
memcached.log.000015.txt:2021-02-03T20:36:33.987157-08:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f239b247000+0x2e9bd6]
memcached.log.000015.txt:2021-02-03T20:36:33.987186-08:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f239b247000+0x2d20ca]
memcached.log.000015.txt:2021-02-03T20:36:33.987210-08:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f239b247000+0x2eccf9]
memcached.log.000015.txt:2021-02-03T20:36:33.987231-08:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f239b247000+0x167793]
memcached.log.000015.txt:2021-02-03T20:36:33.987297-08:00 CRITICAL /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f2397916000+0xb9dcf]
memcached.log.000015.txt:2021-02-03T20:36:33.987311-08:00 CRITICAL /lib64/libpthread.so.0() [0x7f23971e1000+0x7ea5]
memcached.log.000015.txt:2021-02-03T20:36:33.987365-08:00 CRITICAL /lib64/libc.so.6(clone+0x6d) [0x7f2396e13000+0xfe8dd]
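For triage across several nodes, the grep output above can be condensed mechanically. The following is a minimal sketch, assuming each line has the `<file>:<timestamp> CRITICAL <message>` shape shown in this report. It pulls out the `what():` text and the module plus offset of each backtrace frame. The function name `summarize` and both regular expressions are illustrative assumptions, not part of any Couchbase tool.

```python
import re

# One grep-style line: "<file>:<ISO timestamp> CRITICAL <message>"
LINE_RE = re.compile(r"^(?P<file>[^:]+):(?P<ts>\S+) CRITICAL\s+(?P<msg>.*)$")
# One backtrace frame: "<module>(<symbol+off>) [<base>+<offset>]"
FRAME_RE = re.compile(
    r"^(?P<module>/\S+?)\((?P<symbol>[^)]*)\) "
    r"\[(?P<base>0x[0-9a-f]+)\+(?P<offset>0x[0-9a-f]+)\]$"
)


def summarize(lines):
    """Return (exception text, [(module name, offset), ...]) for CRITICAL lines."""
    what, frames = None, []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        msg = match.group("msg")
        # Keep only the first exception message that appears.
        if what is None and "what():" in msg:
            what = msg.split("what():", 1)[1].strip()
        frame = FRAME_RE.match(msg)
        if frame:
            module = frame.group("module").rsplit("/", 1)[-1]
            frames.append((module, int(frame.group("offset"), 16)))
    return what, frames


# Three lines copied from the log excerpt in this report.
sample = [
    "memcached.log.000015.txt:2021-02-03T20:36:33.577265-08:00 CRITICAL "
    "Caught unhandled std::exception-derived exception. what(): "
    "decodeManifest: duplicate collection:0xce in stored data",
    "memcached.log.000015.txt:2021-02-03T20:36:33.987112-08:00 CRITICAL "
    "/opt/couchbase/bin/../lib/libep.so() [0x7f239b247000+0x16f2f3]",
    "memcached.log.000015.txt:2021-02-03T20:36:33.986873-08:00 CRITICAL "
    "/lib64/libc.so.6(gsignal+0x37) [0x7f2396e13000+0x36387]",
]

what, frames = summarize(sample)
print(what)
for module, offset in frames:
    print(f"{module} +{offset:#x}")
```

Grouping frames by module makes it easier to see that the unsymbolized frames nearest the throw site are in `libep.so`, which is where symbolization against the matching build would be most useful.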
cbcollect_info attached. This was not seen on 7.0.0-4342.
This bug could be related to MB-44097, as it's the same test and the same minidumps as in MB-44097 are also seen here.
Issue Links
- relates to MB-44097 Crash when collection disk size underflows with concurrent flush & compaction (Closed)