Details
Bug
Resolution: Fixed
Critical
Cheshire-Cat
Enterprise Edition 7.0.0 build 2309 ‧ IPv4 © 2020 Couchbase, Inc.
Untriaged
Centos 64-bit
1
No
KV Sprint 2020-June, KV Sprint 2020-July
Description
Script to Repro
./testrunner -i /tmp/win10-bucket-ops.ini sdk_client_pool=True,rerun=False,crash_warning=True,quota_percent=90,get-cbcollect-info=False -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_hard_failover_recovery,nodes_init=5,nodes_failover=2,recovery_type=delta,bucket_spec=multi_bucket.buckets_for_rebalance_tests_more_collections,data_load_spec=volume_test_load_with_CRUD_on_collections,data_load_stage=during,quota_percent=80,GROUP=failover_with_collection_crud
Steps to Repro
1) Create a 5-node cluster (172.23.98.196 plus 4 added nodes)
2020-06-10 02:32:24,578 | test | INFO | pool-14-thread-7 | [table_view:display:72] Rebalance Overview
------------------------------------
Nodes | Services | Status |
------------------------------------
172.23.98.196 | kv | Cluster node |
172.23.98.195 | None | <--- IN — |
172.23.121.10 | None | <--- IN — |
172.23.104.186 | None | <--- IN — |
172.23.120.201 | None | <--- IN — |
------------------------------------
2) Create buckets, scopes, collections and documents
----------------------------------------------------------------+
Bucket | Type | Replicas | TTL | Items | RAM Quota | RAM Used | Disk Used |
----------------------------------------------------------------+
bucket1 | membase | 3 | 0 | 3000 | 1048576000 | 220907152 | 273142632 |
bucket2 | ephemeral | 3 | 0 | 3000 | 1048576000 | 329786832 | 170 |
default | membase | 3 | 0 | 500000 | 10485760000 | 767172112 | 543442137 |
----------------------------------------------------------------+
3) Hard-fail over 2 nodes (172.23.104.186 and 172.23.120.201) and, while the failover is in progress, start a data load that drops and creates collections/scopes
2020-06-10 02:48:04,164 | test | INFO | MainThread | [collections_rebalance:rebalance_operation:112] Starting rebalance operation of type : hard_failover_recovery
2020-06-10 02:50:22,744 | test | INFO | MainThread | [collections_rebalance:wait_for_failover_or_assert:84] 1 nodes failed over as expected in 0.103999853134 seconds
2020-06-10 02:52:36,690 | test | INFO | MainThread | [collections_rebalance:wait_for_failover_or_assert:84] 2 nodes failed over as expected in 0.102999925613 seconds
4) After the failover completes, run the data load again and wait for it to complete.
5) Mark the 2 failed-over nodes (172.23.104.186 and 172.23.120.201) for delta recovery and start a rebalance
2020-06-10 02:55:21,378 | test | WARNING | MainThread | [rest_client:get_nodes:1671] 172.23.104.186 - Node not part of cluster inactiveFailed
2020-06-10 02:55:21,378 | test | WARNING | MainThread | [rest_client:get_nodes:1671] 172.23.120.201 - Node not part of cluster inactiveFailed
2020-06-10 02:55:31,907 | test | INFO | MainThread | [bucket_ready_functions:perform_tasks_from_spec:4433] Performing scope/collection specific operations
2020-06-10 02:55:32,322 | test | INFO | pool-14-thread-5 | [table_view:display:72] Rebalance Overview
------------------------------------
Nodes | Services | Status |
------------------------------------
172.23.98.196 | kv | Cluster node |
172.23.98.195 | kv | Cluster node |
172.23.104.186 | kv | Cluster node |
172.23.120.201 | kv | Cluster node |
172.23.121.10 | kv | Cluster node |
------------------------------------
The rebalance fails and core dumps are seen on 172.23.120.201. The relevant core dump is 94edd15f-8a38-4975-5a86828f-a015955e.dmp; the others are duplicates of MB-39532.
From the log:
memcached<0.114.0>: 2020-06-10T02:55:52.661996-07:00 CRITICAL Caught unhandled std::exception-derived exception. what(): ThrowExceptionUnderflowPolicy current:0 arg:1
memcached<0.114.0>: terminate called after throwing an instance of 'std::underflow_error'
memcached<0.114.0>: what(): ThrowExceptionUnderflowPolicy current:0 arg:1
memcached<0.114.0>: 2020-06-10T02:55:52.787137-07:00 CRITICAL Breakpad caught a crash (Couchbase version 7.0.0-2309). Writing crash dump to /opt/couchbase/var/lib/couchbase/crash/94edd15f-8a38-4975-5a86828f-a015955e.dmp before terminating.
memcached<0.114.0>: 2020-06-10T02:55:52.787146-07:00 CRITICAL Stack backtrace of crashed thread:
memcached<0.114.0>: 2020-06-10T02:55:52.787262-07:00 CRITICAL /opt/couchbase/bin/memcached() [0x400000+0x13b80d]
memcached<0.114.0>: 2020-06-10T02:55:52.787271-07:00 CRITICAL /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler12GenerateDumpEPNS0_12CrashContextE+0x3ea) [0x400000+0x150b1a]
memcached<0.114.0>: 2020-06-10T02:55:52.787276-07:00 CRITICAL /opt/couchbase/bin/memcached(_ZN15google_breakpad16ExceptionHandler13SignalHandlerEiP9siginfo_tPv+0xb8) [0x400000+0x150e58]
memcached<0.114.0>: 2020-06-10T02:55:52.787281-07:00 CRITICAL /lib64/libpthread.so.0() [0x7f07ad9f1000+0xf630]
memcached<0.114.0>: 2020-06-10T02:55:52.787299-07:00 CRITICAL /lib64/libc.so.6(gsignal+0x37) [0x7f07ad623000+0x36387]
memcached<0.114.0>: 2020-06-10T02:55:52.787314-07:00 CRITICAL /lib64/libc.so.6(abort+0x148) [0x7f07ad623000+0x37a78]
memcached<0.114.0>: 2020-06-10T02:55:52.787333-07:00 CRITICAL /opt/couchbase/bin/../lib/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x125) [0x7f07ae126000+0x91195]
memcached<0.114.0>: 2020-06-10T02:55:52.787339-07:00 CRITICAL /opt/couchbase/bin/memcached() [0x400000+0x14c012]
memcached<0.114.0>: 2020-06-10T02:55:52.787353-07:00 CRITICAL /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f07ae126000+0x8ef86]
memcached<0.114.0>: 2020-06-10T02:55:52.787364-07:00 CRITICAL /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f07ae126000+0x8efd1]
memcached<0.114.0>: 2020-06-10T02:55:52.787375-07:00 CRITICAL /opt/couchbase/bin/../lib/libstdc++.so.6() [0x7f07ae126000+0x8f213]
memcached<0.114.0>: 2020-06-10T02:55:52.787381-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f07b186f000+0x58174]
memcached<0.114.0>: 2020-06-10T02:55:52.787387-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f07b186f000+0x22909b]
memcached<0.114.0>: 2020-06-10T02:55:52.787392-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f07b186f000+0x1f23a5]
memcached<0.114.0>: 2020-06-10T02:55:52.787397-07:00 CRITICAL /opt/couchbase/bin/../lib/libcouchstore.so() [0x7f07b160c000+0x119db]
memcached<0.114.0>: 2020-06-10T02:55:52.787399-07:00 CRITICAL /opt/couchbase/bin/../lib/libcouchstore.so() [0x7f07b160c000+0x113b1]
memcached<0.114.0>: 2020-06-10T02:55:52.787401-07:00 CRITICAL /opt/couchbase/bin/../lib/libcouchstore.so() [0x7f07b160c000+0x116d4]
memcached<0.114.0>: 2020-06-10T02:55:52.787403-07:00 CRITICAL /opt/couchbase/bin/../lib/libcouchstore.so() [0x7f07b160c000+0x12349]
memcached<0.114.0>: 2020-06-10T02:55:52.787407-07:00 CRITICAL /opt/couchbase/bin/../lib/libcouchstore.so(couchstore_save_documents_and_callback+0x850) [0x7f07b160c000+0x268c0]
memcached<0.114.0>: 2020-06-10T02:55:52.787412-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f07b186f000+0x2013d8]
memcached<0.114.0>: 2020-06-10T02:55:52.787415-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f07b186f000+0x201e9a]
memcached<0.114.0>: 2020-06-10T02:55:52.787420-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f07b186f000+0x202535]
memcached<0.114.0>: 2020-06-10T02:55:52.787425-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f07b186f000+0xea552]
memcached<0.114.0>: 2020-06-10T02:55:52.787430-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f07b186f000+0xee819]
memcached<0.114.0>: 2020-06-10T02:55:52.787435-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f07b186f000+0x142d7c]
memcached<0.114.0>: 2020-06-10T02:55:52.787438-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f07b186f000+0x143f49]
memcached<0.114.0>: 2020-06-10T02:55:52.787441-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f07b186f000+0x146da3]
memcached<0.114.0>: 2020-06-10T02:55:52.787443-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f07b186f000+0x13d70f]
memcached<0.114.0>: 2020-06-10T02:55:52.787448-07:00 CRITICAL /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7f07b03ab000+0x10777]
memcached<0.114.0>: 2020-06-10T02:55:52.787452-07:00 CRITICAL /lib64/libpthread.so.0() [0x7f07ad9f1000+0x7ea5]
memcached<0.114.0>: 2020-06-10T02:55:52.787475-07:00 CRITICAL /lib64/libc.so.6(clone+0x6d) [0x7f07ad623000+0xfe8dd]
Backtrace
See bt_full.txt for the full backtrace (bt full output). It exceeded the description's character limit.
This appears to have little in common with MB-39573, even though in both cases the exception is thrown by ThrowExceptionUnderflowPolicy.
cbcollect_info attached.
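The what() string "ThrowExceptionUnderflowPolicy current:0 arg:1" suggests that a counter which is not allowed to go negative was asked to subtract 1 while its value was already 0. The backtrace shows this throw originating in libep.so code that was called back from couchstore_save_documents_and_callback. Which counter is involved is not identified here.

The following is a minimal C++ sketch of that failure mode. It assumes a counter parameterised on an underflow policy. The names ThrowOnUnderflowPolicy and NonNegativeCounter are hypothetical and are not taken from the Couchbase source. Only the message format is copied from the log.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical policy: throw when a decrement would take the counter below
// zero. The message format mirrors the one in the crash log above.
struct ThrowOnUnderflowPolicy {
    static void underflow(std::uint64_t current, std::uint64_t arg) {
        std::ostringstream msg;
        msg << "ThrowExceptionUnderflowPolicy current:" << current
            << " arg:" << arg;
        throw std::underflow_error(msg.str());
    }
};

// Hypothetical unsigned counter that delegates underflow handling to a policy.
template <typename UnderflowPolicy>
class NonNegativeCounter {
public:
    explicit NonNegativeCounter(std::uint64_t initial = 0) : value_(initial) {}

    std::uint64_t load() const { return value_; }

    void add(std::uint64_t arg) { value_ += arg; }

    // Subtract arg; if that would go below zero, report current value and arg.
    void sub(std::uint64_t arg) {
        if (arg > value_) {
            UnderflowPolicy::underflow(value_, arg);
            return;  // not reached when the policy throws
        }
        value_ -= arg;
    }

private:
    std::uint64_t value_;
};
```

In this sketch, calling sub(1) on a default-constructed NonNegativeCounter<ThrowOnUnderflowPolicy> throws std::underflow_error. Its what() text is "ThrowExceptionUnderflowPolicy current:0 arg:1", the same text as in the log. If nothing catches the exception, std::terminate runs. That is consistent with the "terminate called after throwing an instance of 'std::underflow_error'" line above.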