Details
-
Bug
-
Resolution: Fixed
-
Critical
-
7.0.2
-
6.6.3-9808 -> 7.0.2-6668
-
Untriaged
-
Centos 64-bit
-
1
-
No
Description
Steps to Repro
1. Run the following longevity script on 6.6.3 for 5 days.
./sequoia -client 172.23.104.254:2375 -provider file:centos_second_cluster.yml -test tests/integration/test_allFeatures_madhatter_durability.yml -scope tests/integration/scope_Xattrs_Madhatter.yml -scale 3 -repeat 0 -log_level 0 -version 6.6.3-9808 -skip_setup=true -skip_test=false -skip_teardown=true -skip_cleanup=false -continue=false -collect_on_error=false -stop_on_error=false -duration=604800 -show_topology=true
At this point the cluster should have 27 nodes (9 KV, 6 index, 3 analytics, 3 FTS, 3 eventing, and 3 N1QL).
2. Create 10k metakv tombstones. This has been part of our testing since MB-44838 was fixed. We used to create around 25k in total for CC; here we have reduced it to around 12k.
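#!/bin/bash
# bash is required for the {0..10000} brace expansion.
# Each key is created and then deleted so that a metakv tombstone is left behind.
for i in {0..10000}
do
  curl -X PUT -u Administrator:password "http://localhost:8091/_metakv/key{$i}" -d 'value=foo1'
  curl -X DELETE -v -u Administrator:password "http://localhost:8091/_metakv/key{$i}"
done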
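As a rough sketch, the tombstone loop in step 2 could be parameterized so the count is easy to change between runs (25k versus 12k, for example) and so the commands can be previewed before touching a cluster. The names `COUNT`, `HOST`, `AUTH`, `DRY_RUN`, `run`, `make_tombstone`, and `make_tombstones` are illustrative and not part of the original script; only the `/_metakv/<key>` PUT and DELETE calls come from the steps above.

```shell
#!/bin/bash
# Sketch only: all variable and function names here are assumptions.
COUNT="${COUNT:-10000}"                  # number of tombstones to create
HOST="${HOST:-localhost:8091}"           # cluster node to talk to
AUTH="${AUTH:-Administrator:password}"   # credentials as used in the steps above
DRY_RUN="${DRY_RUN:-1}"                  # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create and then delete one metakv key, which leaves a tombstone behind.
make_tombstone() {
  run curl -s -X PUT -u "$AUTH" "http://$HOST/_metakv/key$1" -d 'value=foo1'
  run curl -s -X DELETE -u "$AUTH" "http://$HOST/_metakv/key$1"
}

# Loop from 0 to COUNT-1, so exactly COUNT tombstones are produced.
make_tombstones() {
  i=0
  while [ "$i" -lt "$COUNT" ]; do
    make_tombstone "$i"
    i=$((i + 1))
  done
}
```

With the default `DRY_RUN=1`, calling `make_tombstones` only prints the curl commands; setting `DRY_RUN=0` runs them against `HOST`.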
3. Swap rebalance 6 nodes, 1 of each service, with 7.0.2 nodes. The rebalance completes successfully.
4. Fail over 6 nodes (6.6.3 nodes), 1 of each service (KV is a graceful failover), upgrade these nodes to 7.0.2, recover all 6 nodes (KV is delta recovery), and rebalance.
ns_1@172.23.106.136 1:12:01 AM 13 Sep, 2021
Starting rebalance, KeepNodes = ['ns_1@172.23.106.134','ns_1@172.23.106.136',
'ns_1@172.23.106.137','ns_1@172.23.106.138',
'ns_1@172.23.120.58','ns_1@172.23.120.73',
'ns_1@172.23.120.74','ns_1@172.23.120.75',
'ns_1@172.23.120.77','ns_1@172.23.120.81',
'ns_1@172.23.120.86','ns_1@172.23.121.118',
'ns_1@172.23.121.77','ns_1@172.23.123.24',
'ns_1@172.23.123.25','ns_1@172.23.123.26',
'ns_1@172.23.123.31','ns_1@172.23.123.32',
'ns_1@172.23.123.33','ns_1@172.23.96.122',
'ns_1@172.23.96.14','ns_1@172.23.96.243',
'ns_1@172.23.97.105','ns_1@172.23.97.148',
'ns_1@172.23.97.149','ns_1@172.23.97.150',
'ns_1@172.23.97.151'], EjectNodes = [], Failed over and being ejected nodes = [], Delta recovery nodes = ['ns_1@172.23.96.14'], Delta recovery buckets = all; Operation Id = 8fa9cee395483fda91678362bea50af3
The above rebalance fails, as shown in rebalance_report_20210913T082158.json. The rebalance failure is humongous, and I believe it is a duplicate of MB-46805. If it is not, we should file a new issue.
cbcollect_info is attached. This is the first time we are running this system test upgrade on 7.0.2, so there is no baseline and no last working build.
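For reference, the KV part of step 4 (graceful failover, then delta recovery, then rebalance) might be driven over the cluster REST API along the following lines. This is a hedged sketch, not the procedure the test framework actually uses: the helper names, `HOST`, `AUTH`, and `DRY_RUN` are assumptions, the upgrade of the node to 7.0.2 happens out of band between the failover and the recovery, and the non-KV nodes are not covered.

```shell
#!/bin/bash
# Sketch only: helper and variable names are assumptions, not from the test suite.
HOST="${HOST:-localhost:8091}"
AUTH="${AUTH:-Administrator:password}"
DRY_RUN="${DRY_RUN:-1}"                  # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Gracefully fail over one KV node, identified by its otpNode name.
graceful_failover() {
  run curl -s -u "$AUTH" -X POST "http://$HOST/controller/startGracefulFailover" \
    --data-urlencode "otpNode=$1"
}

# After the node has been upgraded, mark it for delta recovery.
set_delta_recovery() {
  run curl -s -u "$AUTH" -X POST "http://$HOST/controller/setRecoveryType" \
    --data-urlencode "otpNode=$1" --data-urlencode "recoveryType=delta"
}

# Start a rebalance; $1 is the comma-separated list of all otpNode names to keep.
start_rebalance() {
  run curl -s -u "$AUTH" -X POST "http://$HOST/controller/rebalance" \
    --data-urlencode "knownNodes=$1" --data-urlencode "ejectedNodes="
}
```

For the delta recovery node named in the log above, a dry run would be `graceful_failover 'ns_1@172.23.96.14'` followed, after the upgrade, by `set_delta_recovery 'ns_1@172.23.96.14'` and `start_rebalance` with the full KeepNodes list.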