Details
Bug
Resolution: Fixed
Major
Cheshire-Cat
7.0.0-3026-enterprise
Untriaged
Centos 64-bit
1
No
Description
Script to repro
./testrunner -i /tmp/win10-bucket-ops.ini rerun=False,crash_warning=True,quota_percent=95 -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_rebalance_out,nodes_init=5,nodes_out=2,update_replica=True,updated_num_replicas=3,bucket_spec=multi_bucket.buckets_all_membase_for_rebalance_tests_more_collections,data_load_stage=during,data_load_spec=volume_test_load_with_CRUD_on_collections,skip_validations=False,override_spec_params=replicas,replicas=0,GROUP=replica_update_with_collection_crud
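The `-t` argument above packs the test path and its parameters into one comma-separated string, which is hard to read at a glance. The following is a minimal sketch for splitting it into a test path and a parameter dictionary. `parse_test_spec` is a hypothetical helper written for this illustration, not testrunner's own parser, and it assumes no parameter value contains a comma.

```python
def parse_test_spec(spec):
    """Split a '-t' value into (test_path, params).

    Hypothetical helper for reading the repro command; it is not part of
    testrunner and assumes values never contain ',' themselves.
    """
    test_path, *pairs = spec.split(",")
    params = {}
    for pair in pairs:
        # partition keeps everything after the first '=' as the value
        key, _, value = pair.partition("=")
        params[key] = value
    return test_path, params


# Shortened copy of the '-t' value from the repro command
spec = (
    "bucket_collections.collections_rebalance.CollectionsRebalance."
    "test_data_load_collections_with_rebalance_out,"
    "nodes_init=5,nodes_out=2,update_replica=True,updated_num_replicas=3,"
    "data_load_stage=during,override_spec_params=replicas,replicas=0"
)
test_path, params = parse_test_spec(spec)
print(test_path.rsplit(".", 1)[-1])
print(params["nodes_init"], params["nodes_out"], params["updated_num_replicas"])
```

Read this way, the command starts from 5 nodes with 0 replicas, removes 2 nodes, and raises the replica count to 3 while the data load is running.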
Steps to Repro
1. Create a 5 node cluster
------------------------------------
Nodes | Services | Status |
------------------------------------
172.23.98.196 | kv | Cluster node |
172.23.98.195 | None | <--- IN --- |
172.23.120.206 | None | <--- IN --- |
172.23.104.186 | None | <--- IN --- |
172.23.121.10 | None | <--- IN --- |
------------------------------------
2. Create Buckets/Scopes/Collections/Data
2020-09-07 15:11:31,572 | test | INFO | MainThread | [table_view:display:72] Bucket statistics
--------------------------------------------------------------------------
Bucket | Type | Replicas | Durability | TTL | Items | RAM Quota | RAM Used | Disk Used |
--------------------------------------------------------------------------
bucket1 | couchbase | 0 | none | 0 | 3000 | 1048576000 | 67044512 | 62719186 |
bucket2 | couchbase | 0 | none | 0 | 3000 | 1048576000 | 67046608 | 93422804 |
default | couchbase | 0 | none | 0 | 500000 | 10485760000 | 157943456 | 127608381 |
--------------------------------------------------------------------------
3. Start CRUD on collections
4. Update bucket replicas to 3 and start rebalance out.
2020-09-07 15:11:37,358 | test | INFO | MainThread | [collections_rebalance:rebalance_operation:178] Updating all the bucket replicas to 3
2020-09-07 15:11:37,358 | test | INFO | MainThread | [collections_rebalance:rebalance_operation:157] Starting rebalance operation of type : rebalance_out
2020-09-07 15:11:37,950 | test | INFO | pool-23-thread-8 | [table_view:display:72] Rebalance Overview
------------------------------------
Nodes | Services | Status |
------------------------------------
172.23.98.196 | kv | Cluster node |
172.23.98.195 | kv | Cluster node |
172.23.104.186 | [u'kv'] | --- OUT ---> |
172.23.120.206 | kv | Cluster node |
172.23.121.10 | [u'kv'] | --- OUT ---> |
------------------------------------
5. Rebalance completes successfully. However, a memcached core dump is seen after the rebalance completes.
grep for CRITICAL in the memcached logs:
[user:info,2020-09-07T14:24:05.844-07:00,ns_1@172.23.121.10:<0.17939.1>:ns_log:crash_consumption_loop:69]Service 'memcached' exited with status 139. Restarting. Messages:
2020-09-07T14:24:05.666235-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f9fa50b5000+0xba2af]
2020-09-07T14:24:05.666254-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f9fa50b5000+0xe55e8]
2020-09-07T14:24:05.666273-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f9fa50b5000+0xcac0b]
2020-09-07T14:24:05.666294-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f9fa50b5000+0x1b0751]
2020-09-07T14:24:05.666314-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f9fa50b5000+0x192d51]
2020-09-07T14:24:05.666334-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f9fa50b5000+0x177fe3]
2020-09-07T14:24:05.666358-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f9fa50b5000+0x7fb3f]
2020-09-07T14:24:05.666372-07:00 CRITICAL /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7f9fa3b90000+0x10777]
2020-09-07T14:24:05.666386-07:00 CRITICAL /lib64/libpthread.so.0() [0x7f9fa11d6000+0x7ea5]
2020-09-07T14:24:05.666441-07:00 CRITICAL /lib64/libc.so.6(clone+0x6d) [0x7f9fa0e08000+0xfe8dd]
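Each CRITICAL frame above is logged as `[base+offset]`, while gdb prints absolute addresses. The sketch below adds the two so the log frames can be matched against the gdb backtrace. `resolve_frame` and the regular expression are assumptions made for this illustration, based only on the line format visible in this log.

```python
import re

# Matches e.g. "CRITICAL /path/libep.so() [0x7f9fa50b5000+0xba2af]"
# (pattern inferred from the log lines in this report, not from a spec)
FRAME_RE = re.compile(
    r"CRITICAL\s+(?P<module>[^\s(]+)\(.*?\)\s+"
    r"\[0x(?P<base>[0-9a-f]+)\+0x(?P<offset>[0-9a-f]+)\]"
)


def resolve_frame(line):
    """Return (module, offset, absolute_address) or None if not a frame line."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    base = int(m.group("base"), 16)
    offset = int(m.group("offset"), 16)
    return m.group("module"), offset, base + offset


line = (
    "2020-09-07T14:24:05.666235-07:00 CRITICAL "
    "/opt/couchbase/bin/../lib/libep.so() [0x7f9fa50b5000+0xba2af]"
)
module, offset, absolute = resolve_frame(line)
print(module, hex(offset), hex(absolute))
# prints: /opt/couchbase/bin/../lib/libep.so 0xba2af 0x7f9fa516f2af
```

The result 0x7f9fa516f2af is the address gdb shows for frame #1, and the second log frame (offset 0xe55e8) resolves to 0x7f9fa519a5e8, the address of frame #5 (`DcpProducer::notifyBackfillManager`). With matching debug symbols, the offset could likely also be passed to a tool such as `addr2line -e libep.so`.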
172.23.121.10 : Stack Trace of first crash: c25ef48b-2cfb-49ea-4fb7fe87-b0dc087f.dmp
(gdb) bt full
#0 0x00007f9fa11dfd00 in pthread_mutex_lock () from /lib64/libpthread.so.0
No symbol table info available.
#1 0x00007f9fa516f2af in __gthread_mutex_lock (__mutex=0x58) at /usr/local/include/c++/7.3.0/x86_64-pc-linux-gnu/bits/gthr-default.h:748
No locals.
#2 lock (this=0x58) at /usr/local/include/c++/7.3.0/bits/std_mutex.h:103
No locals.
#3 lock_guard (__m=..., this=<synthetic pointer>) at /usr/local/include/c++/7.3.0/bits/std_mutex.h:162
No locals.
#4 BackfillManager::wakeUpTask (this=0x0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/dcp/backfill-manager.cc:419
lh = {_M_device = @0x58}
#5 0x00007f9fa519a5e8 in DcpProducer::notifyBackfillManager (this=<optimized out>) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/dcp/producer.cc:1399
No locals.
#6 0x00007f9fa517fc0b in DcpConnMap::notifyBackfillManagerTasks (this=<optimized out>) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/dcp/dcpconnmap.cc:465
producer = <optimized out>
handle = {<folly::LockedPtrBase<folly::Synchronized<ConnStore::CookieToConnMapHandle, folly::SharedMutexImpl<false, void, std::atomic, false, false> >, folly::SharedMutexImpl<false, void, std::atomic, false, false>, folly::LockPolicyExclusive>> = {parent_ = 0x7f9f96462000}, static AllowsConcurrentAccess = false}
#7 0x00007f9fa5265751 in PagingVisitor::complete (this=0x7f9f615345c0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/paging_visitor.cc:328
elapsed_time = <optimized out>
inverse = false
#8 0x00007f9fa5247d51 in VBCBAdaptor::run (this=0x7f9f95abb7f0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/kv_bucket.cc:2382
id = 1024
#9 0x00007f9fa522cfe3 in GlobalTask::execute (this=0x7f9f95abb7f0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/globaltask.cc:73
guard = {previous = 0x0}
#10 0x00007f9fa5134b3f in CB3ExecutorThread::run (this=0x7f9f9f9f36c0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/kv_engine/engines/ep/src/cb3_executorthread.cc:174
curTaskDescr = {static npos = 18446744073709551615, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
_M_p = 0x7f9f968f7260 <Address 0x7f9f968f7260 out of bounds>}, _M_string_length = 30, {_M_local_buf = "\036\000\000\000\000\000\000\000pressor", _M_allocated_capacity = 30}}
woketime = <optimized out>
scheduleOverhead = <optimized out>
again = <optimized out>
runtime = <optimized out>
q = <optimized out>
tick = 101 'e'
guard = {engine = 0x0}
#11 0x00007f9fa3ba0777 in run (this=0x7f9f96ff9dd0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/platform/src/cb_pthreads.cc:58
No locals.
#12 platform_thread_wrap (arg=0x7f9f96ff9dd0) at /home/couchbase/jenkins/workspace/couchbase-server-unix/platform/src/cb_pthreads.cc:71
context = {_M_t = {
_M_t = {<std::_Tuple_impl<0, CouchbaseThread*, std::default_delete<CouchbaseThread> >> = {<std::_Tuple_impl<1, std::default_delete<CouchbaseThread> >> = {<std::_Head_base<1, std::default_delete<CouchbaseThread>, true>> = {<std::default_delete<CouchbaseThread>> = {<No data fields>}, <No data fields>}, <No data fields>}, <std::_Head_base<0, CouchbaseThread*, false>> = {_M_head_impl = 0x7f9f96ff9dd0}, <No data fields>}, <No data fields>}}}
#13 0x00007f9fa11ddea5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#14 0x00007f9fa0f068dd in clone () from /lib64/libc.so.6
No symbol table info available.
(gdb)
cbcollect_info attached
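In the gdb backtrace, frame #4 (`BackfillManager::wakeUpTask`) runs with `this=0x0`, and the mutex it locks sits at address 0x58. This suggests a member access at offset 0x58 through a null `BackfillManager` pointer. The sketch below scans a `bt full` dump for frames whose `this` argument is null, which can help when triaging similar dumps. `null_this_frames` and its regular expression are hypothetical, written against the frame format seen in this report only.

```python
import re

# "#4 Func (args)" or "#5 0x... in Func (args)"; format inferred from this dump
GDB_FRAME_RE = re.compile(
    r"^#(?P<num>\d+)\s+(?:0x[0-9a-f]+ in )?(?P<func>[\w:~<>]+)\s+\((?P<args>[^)]*)\)"
)


def null_this_frames(backtrace):
    """Return [(frame_number, function)] for frames called with this=0x0."""
    hits = []
    for line in backtrace.splitlines():
        m = GDB_FRAME_RE.match(line.strip())
        if m and re.search(r"\bthis=0x0\b", m.group("args")):
            hits.append((int(m.group("num")), m.group("func")))
    return hits


# Abbreviated frames copied from the backtrace (source paths shortened)
sample = """\
#2 lock (this=0x58) at std_mutex.h:103
#3 lock_guard (__m=..., this=<synthetic pointer>) at std_mutex.h:162
#4 BackfillManager::wakeUpTask (this=0x0) at backfill-manager.cc:419
#5 0x00007f9fa519a5e8 in DcpProducer::notifyBackfillManager (this=<optimized out>) at producer.cc:1399
#7 0x00007f9fa5265751 in PagingVisitor::complete (this=0x7f9f615345c0) at paging_visitor.cc:328
"""
print(null_this_frames(sample))
# prints: [(4, 'BackfillManager::wakeUpTask')]
```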
Issue Links
- is duplicated by MB-42275 [Collections] - Memcached minidumps seen during rebalance in - out + CRUD on collections (Closed)