Details
- Bug
- Resolution: Fixed
- Critical
- Cheshire-Cat
- 7.0.0-4325
- Untriaged
- Centos 64-bit
- 1
- No
- KV-Engine 2021-Feb
Description
Script to Repro
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/testexec.79048.ini GROUP=failover_with_collection_crud_durability_MAJORITY,rerun=False,upgrade_version=7.0.0-4325 -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_graceful_failover_recovery,nodes_init=5,nodes_failover=1,recovery_type=full,override_spec_params=durability;replicas,durability=MAJORITY,replicas=2,bucket_spec=multi_bucket.buckets_for_rebalance_tests_more_collections,data_load_spec=volume_test_load_with_CRUD_on_collections,data_load_stage=during,quota_percent=80,GROUP=failover_with_collection_crud_durability_MAJORITY'
Steps to Repro
1) Create a 5 node cluster
2021-01-27 21:59:27,921 | test | INFO | pool-4-thread-6 | [table_view:display:72] Rebalance Overview
+---------------+----------+--------------+
| Nodes         | Services | Status       |
+---------------+----------+--------------+
| 172.23.105.52 | kv       | Cluster node |
| 172.23.105.53 | None     | <--- IN ---  |
| 172.23.105.59 | None     | <--- IN ---  |
| 172.23.105.64 | None     | <--- IN ---  |
| 172.23.105.79 | None     | <--- IN ---  |
+---------------+----------+--------------+
2) Create buckets/scopes/collections/data
+---------+-----------+----------+------------+-----+--------+-------------+-----------+-----------+
| Bucket  | Type      | Replicas | Durability | TTL | Items  | RAM Quota   | RAM Used  | Disk Used |
+---------+-----------+----------+------------+-----+--------+-------------+-----------+-----------+
| bucket1 | couchbase | 2        | none       | 0   | 3000   | 1048576000  | 166419944 | 267464363 |
| bucket2 | ephemeral | 2        | none       | 0   | 3000   | 1048576000  | 244948568 | 170       |
| default | couchbase | 2        | none       | 0   | 500000 | 10485760000 | 584606712 | 411196169 |
+---------+-----------+----------+------------+-----+--------+-------------+-----------+-----------+
3) Gracefully fail over node 172.23.105.79
2021-01-27 22:05:54,640 | test | INFO | MainThread | [collections_rebalance:wait_for_failover_or_assert:213] 1 nodes failed over as expected in 0.029000043869 seconds
4) Perform full recovery and rebalance
2021-01-27 22:06:26,605 | test | WARNING | MainThread | [rest_client:get_nodes:1710] 172.23.105.79 - Node not part of cluster inactiveFailed
We see 3 crashes.
On 172.23.105.53
8e6dbba9-2e3a-4123-0a2841b1-32001524.dmp
8e67c9b4-4a55-4b5b-ce42a0bf-7be225d5.dmp
On 172.23.105.59
cdb0d43e-32a6-4d67-eaa412ad-3fe31243.dmp
grep CRITICAL output for 8e6dbba9-2e3a-4123-0a2841b1-32001524.dmp on 172.23.105.53:
[ns_server:info,2021-01-27T21:39:33.390-08:00,babysitter_of_ns_1@cb.local:<0.249.0>:ns_port_server:log:224]memcached<0.249.0>: WARNING: Logging before InitGoogleLogging() is written to STDERR
memcached<0.249.0>: W0127 21:39:33.188218 99170 HazptrDomain.h:671] Using the default inline executor for asynchronous reclamation may be susceptible to deadlock if the current thread happens to hold a resource needed by the deleter of a reclaimable object
[ns_server:info,2021-01-27T21:39:39.493-08:00,babysitter_of_ns_1@cb.local:<0.249.0>:ns_port_server:log:224]memcached<0.249.0>: 2021-01-27T21:39:39.452151-08:00 CRITICAL Breakpad caught a crash (Couchbase version 7.0.0-4325). Writing crash dump to /opt/couchbase/var/lib/couchbase/crash/8e6dbba9-2e3a-4123-0a2841b1-32001524.dmp before terminating.
memcached<0.249.0>: 2021-01-27T21:39:39.452182-08:00 CRITICAL Stack backtrace of crashed thread:
memcached<0.249.0>: 2021-01-27T21:39:39.452412-08:00 CRITICAL /opt/couchbase/bin/memcached() [0x400000+0x145bbd]
memcached<0.249.0>: 2021-01-27T21:39:39.452424-08:00 CRITICAL /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler12GenerateDumpEPNS0_12CrashContextE+0x3ea) [0x400000+0x15b3fa]
memcached<0.249.0>: 2021-01-27T21:39:39.452434-08:00 CRITICAL /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler13SignalHandlerEiP9siginfo_tPv+0xb8) [0x400000+0x15b738]
memcached<0.249.0>: 2021-01-27T21:39:39.452441-08:00 CRITICAL /lib64/libpthread.so.0() [0x7fc7b8fd3000+0xf5d0]
memcached<0.249.0>: 2021-01-27T21:39:39.452454-08:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7fc7bd032000+0x78d66]
memcached<0.249.0>: 2021-01-27T21:39:39.452462-08:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7fc7bd032000+0x77b1a]
memcached<0.249.0>: 2021-01-27T21:39:39.452469-08:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7fc7bd032000+0x80fd8]
memcached<0.249.0>: 2021-01-27T21:39:39.452476-08:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7fc7bd032000+0x8b382]
memcached<0.249.0>: 2021-01-27T21:39:39.452486-08:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7fc7bd032000+0x188fbb]
memcached<0.249.0>: 2021-01-27T21:39:39.452495-08:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7fc7bd032000+0x16dc13]
memcached<0.249.0>: 2021-01-27T21:39:39.452502-08:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7fc7bd032000+0x167c92]
memcached<0.249.0>: 2021-01-27T21:39:39.452513-08:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7fc7bd032000+0x2e71d6]
memcached<0.249.0>: 2021-01-27T21:39:39.452523-08:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7fc7bd032000+0x2cf6ca]
memcached<0.249.0>: 2021-01-27T21:39:39.452532-08:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7fc7bd032000+0x2ea2f9]
memcached<0.249.0>: 2021-01-27T21:39:39.452541-08:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7fc7bd032000+0x1660d3]
memcached<0.249.0>: 2021-01-27T21:39:39.452576-08:00 CRITICAL /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7fc7b9708000+0xb9dcf]
memcached<0.249.0>: 2021-01-27T21:39:39.452582-08:00 CRITICAL /lib64/libpthread.so.0() [0x7fc7b8fd3000+0x7dd5]
memcached<0.249.0>: 2021-01-27T21:39:39.452615-08:00 CRITICAL /lib64/libc.so.6(clone+0x6d) [0x7fc7b8c06000+0xfdead]
See bt_full_all_threads.txt for 8e6dbba9-2e3a-4123-0a2841b1-32001524.dmp on 172.23.105.53. Attaching cbcollect.
This test worked fine on 7.0.0-4291.
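Most frames in the backtrace above are unsymbolized libep.so offsets, so a first triage step is usually to pull out the module, load base and offset of each frame. The following is a minimal sketch of that extraction, assuming the log line layout shown in this report; the names FRAME_RE, parse_frames and frames_for_module are hypothetical helpers introduced here for illustration and are not part of any Couchbase tooling.

```python
import re

# Breakpad-style frame as it appears in the babysitter log, e.g.
#   ... CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7fc7bd032000+0x78d66]
# (pattern is an assumption based on the lines quoted in this report)
FRAME_RE = re.compile(
    r"CRITICAL\s+(?P<module>/\S+?)\((?P<symbol>[^)]*)\)\s+"
    r"\[0x(?P<base>[0-9a-fA-F]+)\+0x(?P<offset>[0-9a-fA-F]+)\]"
)


def parse_frames(log_text):
    """Return one dict per backtrace frame found in log_text, in log order."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match is None:
            continue  # not a frame line (e.g. the "Breakpad caught a crash" banner)
        frames.append({
            "module": match.group("module"),
            "symbol": match.group("symbol") or None,  # None when the log has "()"
            "base": int(match.group("base"), 16),
            "offset": int(match.group("offset"), 16),
        })
    return frames


def frames_for_module(frames, suffix):
    """Keep only frames whose module path ends with suffix (e.g. 'libep.so')."""
    return [f for f in frames if f["module"].endswith(suffix)]
```

Calling frames_for_module(parse_frames(log_text), "libep.so") on the grep output would give the libep.so offsets, which could then be handed to a symbolizer such as addr2line, assuming debug symbols matching build 7.0.0-4325 are available.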
Issue Links
- relates to MB-43919 [Collections] Collections::VB::Manifest::verifyFlatbuffersData (Closed)