Details
Bug
Resolution: Fixed
Blocker
7.2.0
7.1.4-3601 -> 7.2.0-5263
Untriaged
Centos 64-bit
0
No
Description
Steps to Repro
1. Run the neo longevity test for 3 days on 7.1.4-3601:
./sequoia -client 172.23.104.254:2375 -provider file:centos_third_cluster.yml -test tests/integration/neo/test_neo.yml -scope tests/integration/neo/scope_neo_magma.yml -scale 3 -repeat 0 -log_level 0 -version 7.1.4-3601 -skip_setup=false -skip_test=false -skip_teardown=true -skip_cleanup=false -continue=false -collect_on_error=false -stop_on_error=false -duration=604800 -show_topology=true
2. Upgrade this cluster to 7.2.0-5263 using a mix of online and offline upgrade strategies.
3. After the upgrade, update the bucket properties of all the buckets using the following REST API call:
[root@s20507w12r2 ~]# curl localhost:8091/pools/default/buckets/default -u Administrator:password -X POST -d historyRetentionBytes=2147483648
4. Attempt a swap rebalance. It fails every time, as shown below.
172.23.108.148 3:24:43 AM 24 Mar, 2023
Starting rebalance, KeepNodes = ['ns_1@172.23.104.176','ns_1@172.23.105.0',
'ns_1@172.23.105.38','ns_1@172.23.105.39',
'ns_1@172.23.105.91','ns_1@172.23.106.32',
'ns_1@172.23.106.37','ns_1@172.23.106.54',
'ns_1@172.23.107.142','ns_1@172.23.107.236',
'ns_1@172.23.107.25','ns_1@172.23.108.129',
'ns_1@172.23.108.132','ns_1@172.23.108.134',
'ns_1@172.23.108.136','ns_1@172.23.108.138',
'ns_1@172.23.108.139','ns_1@172.23.108.140',
'ns_1@172.23.108.141','ns_1@172.23.108.143',
'ns_1@172.23.108.144','ns_1@172.23.108.145',
'ns_1@172.23.108.146','ns_1@172.23.108.34',
'ns_1@172.23.108.61','ns_1@172.23.97.176'], EjectNodes = ['ns_1@172.23.104.216',
'ns_1@172.23.105.134',
'ns_1@172.23.104.249',
'ns_1@172.23.108.148',
'ns_1@172.23.105.210'], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 25cf49fe6e25e67940dfe0f5fa01d38f
172.23.108.148 3:25:02 AM 24 Mar, 2023
Worker <0.21332.3277> (for action {move,
{1014,
['ns_1@172.23.108.148',
'ns_1@172.23.104.176'],
['ns_1@172.23.107.236',
'ns_1@172.23.104.176'],
[]}}) exited with reason {unexpected_exit,
{'EXIT',
<0.13246.3277>,
{{dcp_wait_for_data_move_failed,
"ITEM",
1014,
'ns_1@172.23.108.148',
['ns_1@172.23.107.236',
'ns_1@172.23.104.176'],
{error,
no_stats_for_this_vbucket}},
[{ns_single_vbucket_mover,
'-wait_dcp_data_move/5-fun-0-',
5,
[{file,
"src/ns_single_vbucket_mover.erl"},
{line,
451}]},
{proc_lib,
init_p,3,
[{file,
"proc_lib.erl"},
{line,
211}]}]}}}
172.23.108.148 3:25:02 AM 24 Mar, 2023
Rebalance exited with reason {mover_crashed,
{unexpected_exit,
{'EXIT',<0.13246.3277>,
{{dcp_wait_for_data_move_failed,"ITEM",1014,
'ns_1@172.23.108.148',
['ns_1@172.23.107.236',
'ns_1@172.23.104.176'],
{error,no_stats_for_this_vbucket}},
[{ns_single_vbucket_mover,
'-wait_dcp_data_move/5-fun-0-',5,
[{file,"src/ns_single_vbucket_mover.erl"},
{line,451}]},
{proc_lib,init_p,3,
[{file,"proc_lib.erl"},{line,211}]}]}}}}.
Rebalance Operation Id = 25cf49fe6e25e67940dfe0f5fa01d38f
Retried the failed rebalance around 10 times; it fails every time.
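To check whether each retry fails on the same vBucket, the failure tuples can be pulled out of the rebalance log text. The following is a minimal sketch, not part of the original report. It assumes the positional fields of the `dcp_wait_for_data_move_failed` tuple are bucket name, vBucket id, source node, destination nodes and error, which is an interpretation of the log output above rather than something the report states. The function names are hypothetical.

```python
import re
from collections import Counter

# Field meanings are assumed from their position in the logged tuple.
FAILURE_RE = re.compile(
    r"""
    dcp_wait_for_data_move_failed \s*,\s*
    "(?P<bucket>[^"]*)"           \s*,\s*   # assumed: bucket name
    (?P<vbucket>\d+)              \s*,\s*   # vBucket id
    '(?P<source>[^']+)'           \s*,\s*   # assumed: node the data moves from
    \[(?P<destinations>[^\]]*)\]  \s*,\s*   # assumed: nodes receiving the data
    \{\s*error\s*,\s*(?P<reason>\w+)\s*\}   # error atom
    """,
    re.VERBOSE,
)


def parse_data_move_failures(log_text):
    """Return one dict per dcp_wait_for_data_move_failed tuple in log_text."""
    failures = []
    for match in FAILURE_RE.finditer(log_text):
        failures.append({
            "bucket": match.group("bucket"),
            "vbucket": int(match.group("vbucket")),
            "source": match.group("source"),
            "destinations": re.findall(r"'([^']+)'", match.group("destinations")),
            "reason": match.group("reason"),
        })
    return failures


def summarize_failures(failures):
    """Count failures per (bucket, vbucket, reason) combination."""
    return Counter((f["bucket"], f["vbucket"], f["reason"]) for f in failures)
```

If every retry produces the same key in the summary, the failure is tied to one vBucket rather than being spread across the cluster.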
cbcollect_info is attached. This is the first time we are running a system test upgrade from 7.1.4 -> 7.2.0.
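For reference, the bucket update in step 3 can be scripted so the same setting is applied to every bucket. This is a minimal sketch that only builds the requests. The endpoint, port, credentials and the `historyRetentionBytes` value come from the curl command in step 3; the helper name and the bucket list are hypothetical (only `default` appears in the command, and `ITEM` appears in the rebalance error).

```python
import base64
import urllib.parse
import urllib.request

# 2147483648 bytes, the value used in step 3 (2 GiB).
HISTORY_RETENTION_BYTES = 2 * 1024 ** 3


def build_history_retention_request(host, bucket, retention_bytes, user, password):
    """Build (but do not send) the POST that sets historyRetentionBytes on a bucket."""
    bucket_path = urllib.parse.quote(bucket, safe="")
    url = f"http://{host}:8091/pools/default/buckets/{bucket_path}"
    body = urllib.parse.urlencode({"historyRetentionBytes": retention_bytes}).encode("ascii")
    request = urllib.request.Request(url, data=body, method="POST")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    return request


if __name__ == "__main__":
    # Hypothetical bucket list; replace with the buckets on the cluster under test.
    for name in ["default", "ITEM"]:
        req = build_history_retention_request(
            "localhost", name, HISTORY_RETENTION_BYTES, "Administrator", "password"
        )
        # Pass req to urllib.request.urlopen(req) to actually apply the change.
        print(req.get_method(), req.full_url, req.data.decode("ascii"))
```

Keeping request construction separate from sending makes it easy to review exactly what will be posted before touching a live cluster.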