Details
Type: Bug
Resolution: Fixed
Priority: Critical
Cheshire-Cat
6.6.2-9588 -> 7.0.0-5141
Untriaged
Centos 64-bit
1
No
Description
Scripts to Repro
1. Ran the 6.6.2 longevity test for 3 days.
./sequoia -client 172.23.96.162:2375 -provider file:centos_third_cluster.yml -test tests/integration/test_allFeatures_madhatter_durability.yml -scope tests/integration/scope_Xattrs_Madhatter.yml -scale 3 -repeat 0 -log_level 0 -version 6.6.2-9588 -skip_setup=false -skip_test=false -skip_teardown=true -skip_cleanup=false -continue=false -collect_on_error=false -stop_on_error=false -duration=604800 -show_topology=true
2. The cluster had 27 nodes at the end of the test.
3. Added six 7.0.0 nodes (172.23.105.102, 172.23.105.62, 172.23.106.232, 172.23.106.239, 172.23.106.37, 172.23.106.246) and rebalanced them in, then removed six 6.6.2 nodes (172.23.110.75, 172.23.110.76, 172.23.105.61, 172.23.106.191, 172.23.106.209, 172.23.106.70) and rebalanced them out.
4. Failed over 6 nodes using graceful failover, then performed recovery and rebalance.
5. Swap-rebalanced 6 nodes (2 data, 2 index, 1 eventing, 1 analytics), as shown below.
ns_1@172.23.105.102 11:42:57 PM 11 May, 2021
Starting rebalance, KeepNodes = ['ns_1@172.23.104.15','ns_1@172.23.104.214',
'ns_1@172.23.104.232','ns_1@172.23.104.244',
'ns_1@172.23.104.245','ns_1@172.23.105.102',
'ns_1@172.23.105.109','ns_1@172.23.105.112',
'ns_1@172.23.105.118','ns_1@172.23.105.164',
'ns_1@172.23.105.61','ns_1@172.23.105.62',
'ns_1@172.23.105.90','ns_1@172.23.105.93',
'ns_1@172.23.106.117','ns_1@172.23.106.191',
'ns_1@172.23.106.207','ns_1@172.23.106.209',
'ns_1@172.23.106.232','ns_1@172.23.106.239',
'ns_1@172.23.106.246','ns_1@172.23.106.32',
'ns_1@172.23.106.37','ns_1@172.23.106.70',
'ns_1@172.23.110.75','ns_1@172.23.110.76'], EjectNodes = ['ns_1@172.23.106.54',
'ns_1@172.23.105.210',
'ns_1@172.23.105.25',
'ns_1@172.23.105.86',
'ns_1@172.23.105.206',
'ns_1@172.23.106.225'], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 7e7071f79333e252943a2259497d743d
The above rebalance failed as shown below. This is related to MB-46246.
ns_1@172.23.105.102 12:09:57 AM 12 May, 2021
Rebalance exited with reason {service_rebalance_failed,eventing,
{agent_died,<31276.23862.7>,
{lost_connection,
{'ns_1@172.23.106.70',shutdown}}}}.
Rebalance Operation Id = 7e7071f79333e252943a2259497d743d
I then retried the failed rebalance.
ns_1@172.23.105.102 12:25:53 AM 12 May, 2021
Starting rebalance, KeepNodes = ['ns_1@172.23.104.15','ns_1@172.23.104.214',
'ns_1@172.23.104.232','ns_1@172.23.104.244',
'ns_1@172.23.104.245','ns_1@172.23.105.102',
'ns_1@172.23.105.109','ns_1@172.23.105.112',
'ns_1@172.23.105.118','ns_1@172.23.105.164',
'ns_1@172.23.105.61','ns_1@172.23.105.62',
'ns_1@172.23.105.90','ns_1@172.23.105.93',
'ns_1@172.23.106.117','ns_1@172.23.106.191',
'ns_1@172.23.106.207','ns_1@172.23.106.209',
'ns_1@172.23.106.232','ns_1@172.23.106.239',
'ns_1@172.23.106.246','ns_1@172.23.106.32',
'ns_1@172.23.106.37','ns_1@172.23.106.70',
'ns_1@172.23.110.75','ns_1@172.23.110.76'], EjectNodes = ['ns_1@172.23.106.54',
'ns_1@172.23.105.210',
'ns_1@172.23.105.25',
'ns_1@172.23.105.86',
'ns_1@172.23.105.206',
'ns_1@172.23.106.225'], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = e5d19839baa473b6d0c1155448d81eeb
This rebalance hung in the indexing service for more than 6 hours, stuck at 53.69318181818181% progress.
172.23.105.102
[root@localhost logs]# date ;/opt/couchbase/bin/couchbase-cli rebalance-status -c 172.23.105.102 --username Administrator --password password
Wed May 12 06:56:54 PDT 2021
{
    "status": "running",
    "msg": "Rebalance is running",
    "details": {
        "progress": 53.69318181818181,
        "refresh": 0.25,
        "totalBuckets": 10,
        "curBucket": 10,
        "curBucketName": "default",
        "docsRemaining": 0
    }
}
[root@localhost logs]#
Cbcollect_info attached.
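The hang above was detected by polling `couchbase-cli rebalance-status` by hand and noticing that "progress" stayed at the same value while "status" remained "running". The following is a minimal Python sketch of that check. It takes timestamped snapshots of the rebalance-status JSON and reports a stall once progress has not changed for a given interval. The helper name `is_rebalance_stalled` and the one-hour default threshold are hypothetical choices made for illustration. They are not part of couchbase-cli or of the test framework used here.

```python
import json
from typing import Iterable, Optional, Tuple


def is_rebalance_stalled(samples: Iterable[Tuple[float, str]],
                         min_stall_seconds: float = 3600.0) -> bool:
    """Hypothetical helper: return True if the rebalance stayed "running"
    with an unchanged "progress" value for at least min_stall_seconds.

    samples: (unix_time, json_text) pairs in chronological order. Each
    json_text is assumed to have the shape printed by
    `couchbase-cli rebalance-status` in this report.
    """
    stall_start: Optional[float] = None  # when the current unchanged run began
    last_progress: Optional[float] = None
    for ts, text in samples:
        status = json.loads(text)
        if status.get("status") != "running":
            # Any status other than "running": rebalance is not in progress,
            # so reset the tracking state.
            stall_start, last_progress = None, None
            continue
        progress = status.get("details", {}).get("progress")
        if stall_start is None or progress != last_progress:
            # First running sample, or progress moved: start a new window.
            stall_start, last_progress = ts, progress
        elif ts - stall_start >= min_stall_seconds:
            # Progress has been identical for the whole threshold window.
            return True
    return False
```

Comparing progress values exactly is deliberate. In the stuck state the reported value (53.69318181818181) repeats verbatim, whereas any real movement changes it.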
Issue Links
- relates to MB-46301 [Upgrade] - Online upgrade using failover + recovery + rebalance hangs in indexing rebalance (Closed)