Details
- Bug
- Resolution: Fixed
- Critical
- Cheshire-Cat
- 6.6.2-9588 -> 7.0.0-5226
- Untriaged
- Centos 64-bit
- 1
- Yes
Description
Steps to Repro
1. Run the following longevity test on 6.6.2 for 3-4 days:
./sequoia -client 172.23.96.162:2375 -provider file:centos_third_cluster.yml -test tests/integration/test_allFeatures_madhatter_durability.yml -scope tests/integration/scope_Xattrs_Madhatter.yml -scale 3 -repeat 0 -log_level 0 -version 6.6.2-9588 -skip_setup=false -skip_test=false -skip_teardown=true -skip_cleanup=false -continue=false -collect_on_error=false -stop_on_error=false -duration=604800 -show_topology=true
2. The cluster has 27 nodes, all on 6.6.2.
3. Add 6 nodes (1 of each service, on 7.0.0-5226), remove 6 nodes (on 6.6.2), and do a swap rebalance to upgrade the cluster.
4. Fail over 6 nodes (1 of each service, on 6.6.2), upgrade them, then do a recovery and rebalance.
5. Continued with these steps for the rest of the nodes in the cluster, but one of the rebalances failed as shown below.
ns_1@172.23.106.70 7:18:13 AM 26 May, 2021
Starting rebalance, KeepNodes = ['ns_1@172.23.104.15','ns_1@172.23.104.214',
                                 'ns_1@172.23.104.232','ns_1@172.23.104.244',
                                 'ns_1@172.23.104.245','ns_1@172.23.105.102',
                                 'ns_1@172.23.105.109','ns_1@172.23.105.112',
                                 'ns_1@172.23.105.118','ns_1@172.23.105.206',
                                 'ns_1@172.23.105.210','ns_1@172.23.105.25',
                                 'ns_1@172.23.105.29','ns_1@172.23.105.61',
                                 'ns_1@172.23.105.86','ns_1@172.23.105.90',
                                 'ns_1@172.23.106.117','ns_1@172.23.106.191',
                                 'ns_1@172.23.106.207','ns_1@172.23.106.225',
                                 'ns_1@172.23.106.232','ns_1@172.23.106.239',
                                 'ns_1@172.23.106.246','ns_1@172.23.106.37',
                                 'ns_1@172.23.106.54','ns_1@172.23.106.70',
                                 'ns_1@172.23.110.75'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 57cca96fe563d50d27549ba664c85dfe

ns_1@172.23.106.70 7:53:28 AM 26 May, 2021
Rebalance exited with reason {service_rebalance_failed,eventing,
                              {worker_died,
                               {'EXIT',<0.15454.774>,
                                {rebalance_failed,
                                 {service_error,
                                  <<"eventing rebalance hasn't made progress for past 1200 secs">>}}}}}.
Rebalance Operation Id = 57cca96fe563d50d27549ba664c85dfe
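For triage, the two log entries above can be reduced to a few numbers. The following is a minimal sketch, not part of any Couchbase tooling: the function names, the regular expressions, and the timestamp format string are assumptions based only on the text pasted in this report. It extracts the failing service and the stall threshold from the failure message and computes how long the rebalance ran before it exited.

```python
import re
from datetime import datetime

# Timestamps and failure text copied from the log entries in this report.
START_TS = "7:18:13 AM 26 May, 2021"
FAIL_TS = "7:53:28 AM 26 May, 2021"
FAILURE = (
    "Rebalance exited with reason {service_rebalance_failed,eventing, "
    "{worker_died, {'EXIT',<0.15454.774>, {rebalance_failed, {service_error, "
    "<<\"eventing rebalance hasn't made progress for past 1200 secs\">>}}}}}."
)

# Assumed format of the UI log timestamps (inferred from the two samples only).
TS_FORMAT = "%I:%M:%S %p %d %B, %Y"


def elapsed_seconds(start, end, fmt=TS_FORMAT):
    """Return the number of seconds between two UI log timestamps."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


def parse_failure(text):
    """Extract the failing service and the stall threshold, if present."""
    service = re.search(r"service_rebalance_failed,\s*(\w+)", text)
    stall = re.search(r"hasn't made progress for past (\d+) secs", text)
    return {
        "service": service.group(1) if service else None,
        "stall_secs": int(stall.group(1)) if stall else None,
    }


info = parse_failure(FAILURE)
ran_for = elapsed_seconds(START_TS, FAIL_TS)
print(info["service"], info["stall_secs"], ran_for)
```

With these inputs the rebalance ran for 35 minutes 15 seconds (2115 s), of which the final 1200 s are the no-progress window named in the error. If the failure was raised as soon as that window elapsed (an assumption, since the report does not say so), eventing last reported progress roughly 15 minutes into the rebalance.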
Will attach cbcollect logs shortly.
This was not seen on upgrade from 6.6.2-9588 -> 7.0.0-5141.
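The batching described in steps 3 to 5 can be sketched as a simple plan. This is an illustration only: `plan_batches` and `upgrade_plan` are hypothetical helpers, and the report does not say how the final nodes were grouped, so the smaller last batch is an assumption that follows from dividing 27 nodes into groups of 6.

```python
def plan_batches(total_nodes, batch_size):
    """Split a node count into batches of at most batch_size nodes."""
    if total_nodes < 0 or batch_size <= 0:
        raise ValueError("total_nodes must be >= 0 and batch_size must be > 0")
    full, rest = divmod(total_nodes, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


def upgrade_plan(total_nodes, batch_size):
    """Pair each batch with the method used in the report's steps.

    The first batch is upgraded by swap rebalance (step 3); the later
    batches use failover, upgrade, recovery, and rebalance (step 4).
    """
    plan = []
    for index, size in enumerate(plan_batches(total_nodes, batch_size)):
        method = "swap rebalance" if index == 0 else "failover + recovery + rebalance"
        plan.append((method, size))
    return plan


# 27 nodes on 6.6.2, upgraded 6 at a time (1 node of each service per batch).
for method, size in upgrade_plan(27, 6):
    print(f"{size} nodes via {method}")
```

Under these assumptions the upgrade takes five rebalances in total: one swap rebalance followed by four failover-based rounds, the last of which covers the remaining 3 nodes.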
Attachments
Issue Links
- is cloned by: MB-46887 [BP of MB-46564] [System test]Online upgrade using graceful failover + full recovery + rebalance fails in eventing with "service_rebalance_failed,eventing, {worker_died," (Closed)