Details
-
Bug
-
Resolution: Fixed
-
Major
-
7.6.2
-
Enterprise Edition 7.6.2 build 3619
-
Untriaged
-
-
0
-
Unknown
Description
Steps
1. Create a 6-node cluster:
172.23.136.104 - data
172.23.136.106 - data
172.23.136.109 - data
172.23.136.110 - query
172.23.136.114 - index
172.23.136.115 - data
2. Call the diag/eval API to set a test condition that delays auto-failover by 1 minute (60000 ms):
curl -k https://Administrator:password@localhost:18091/diag/eval -X POST -d 'testconditions:set(failover_start, {delay,60000 })'
3. Set the auto-failover timeout to 60 seconds and max nodes to 2.
4. Bring down .104. Auto-failover starts for .104:
[user:info,2024-05-13T22:48:31.455-07:00,ns_1@172.23.136.110:<0.17382.6>:failover:orchestrate:172]Starting failing over ['ns_1@172.23.136.104']
5. Because auto-failover is delayed, bring down a second node, .109, about 30 seconds into the ongoing failover:
[user:warn,2024-05-13T22:49:02.053-07:00,ns_1@172.23.136.106:ns_node_disco<0.7214.0>:ns_node_disco:handle_info:169]Node 'ns_1@172.23.136.106' saw that node 'ns_1@172.23.136.109' went down. Details: [{nodedown_reason,
shutdown}]
6. Auto-failover for .104 fails, possibly because it cannot activate replicas on .109, which is now down:
[user:error,2024-05-13T22:49:31.524-07:00,ns_1@172.23.136.110:<0.8883.0>:ns_orchestrator:log_rebalance_completion:1661]Failover exited with reason {failover_failed,"gamesim-sample",
"Failed to get failover info for bucket \"gamesim-sample\": ['ns_1@172.23.136.109']"}.
Rebalance Operation Id = 14aa0cd61ddc898532fcb445e44e14fc
Expected: the next failover, covering both .104 and .109, should happen about 30 seconds after this failure, because the timeout is 60 seconds and the ticks for .109 should have kept counting while the first auto-failover was delayed.
Actual: the next failover took about 60 more seconds.
[user:info,2024-05-13T22:50:33.094-07:00,ns_1@172.23.136.110:<0.25124.6>:failover:orchestrate:184]Failed over ['ns_1@172.23.136.104','ns_1@172.23.136.109']: ok
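The gap between expected and observed timing can be checked directly from the four user-log timestamps quoted in this report. The sketch below only does that arithmetic; the event names are labels made up for illustration and do not come from ns_server. Note that the last timestamp is the "Failed over" completion line, so it is an upper bound on when the second failover started.

```python
from datetime import datetime

# Timestamps copied from the user-log lines quoted in this report.
# The dictionary keys are illustrative labels, not ns_server identifiers.
EVENTS = {
    "failover_104_started": "2024-05-13T22:48:31.455-07:00",
    "node_109_down": "2024-05-13T22:49:02.053-07:00",
    "failover_104_failed": "2024-05-13T22:49:31.524-07:00",
    "failover_both_done": "2024-05-13T22:50:33.094-07:00",
}


def seconds_between(start, end):
    """Return the elapsed seconds between two named events."""
    t0 = datetime.fromisoformat(EVENTS[start])
    t1 = datetime.fromisoformat(EVENTS[end])
    return round((t1 - t0).total_seconds(), 3)


if __name__ == "__main__":
    # .109 had already been down this long when the first failover failed.
    print("down -> first failover failed:", seconds_between("node_109_down", "failover_104_failed"))
    # Time from the failure until both nodes were reported as failed over.
    print("failed -> both failed over:", seconds_between("failover_104_failed", "failover_both_done"))
    # Total time .109 was down before it was failed over.
    print("down -> both failed over:", seconds_between("node_109_down", "failover_both_done"))
```

With a 60-second timeout counted from the moment .109 went down, the second failover would be expected roughly 30 seconds after the failure; the log shows about 61.6 seconds instead, or about 91 seconds after .109 went down.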
[ns_server:info,2024-05-13T22:50:33.095-07:00,ns_1@172.23.136.110:leader_quorum_nodes_manager<0.8852.0>:leader_quorum_nodes_manager:handle_set_quorum_nodes:121]Updating quorum nodes.
Old quorum nodes: ['ns_1@172.23.136.110','ns_1@172.23.136.104',
'ns_1@172.23.136.114','ns_1@172.23.136.115',
'ns_1@172.23.136.106','ns_1@172.23.136.109']
New quorum nodes: ['ns_1@172.23.136.110','ns_1@172.23.136.114',
'ns_1@172.23.136.115','ns_1@172.23.136.106']
[ns_server:error,2024-05-13T22:50:33.105-07:00,ns_1@172.23.136.110:leader_quorum_nodes_manager<0.8852.0>:ns_config_rep:synchronize_remote:356]Failed to synchronize config to some nodes:
[{'ns_1@172.23.136.109',
{exit,
Attachments
For Gerrit Dashboard: MB-61881
# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
209968,2 | MB-61881: Add failover_end testcondition | trinity | ns_server | Status: MERGED | +2 | +1 |
210397,1 | Merge branch 'trinity' | master | ns_server | Status: MERGED | +2 | +1 |