Details
Bug
Resolution: Fixed
Major
7.1.0
Untriaged
1
Unknown
Description
Build : 7.1.0-1716
Test : -test tests/integration/neo/test_neo_couchstore_milestone3.yml -scope tests/integration/neo/scope_couchstore.yml
Scale : 3
Iteration : 1st
In the longevity test, there is a new step to perform a hard failover on a query node using couchbase-cli.
This step was triggered at 2021-11-16T04:34:41 on the test client:
[2021-11-16T04:34:41-08:00, sequoiatools/couchbase-cli:7.1:95cf4d] failover -c 172.23.97.74:8091 --server-failover 172.23.97.150:8091 -u Administrator -p password --hard
It returned an error:
ERROR: Request to host `http://172.23.97.74:8091/controller/failOver` timed out after 60 seconds
From the debug.log on 172.23.97.74, the failover started at 2021-11-16T04:34:42.049
[ns_server:debug,2021-11-16T04:34:42.049-08:00,ns_1@172.23.97.74:<0.26242.0>:failover:start:35]Starting failover with Nodes = ['ns_1@172.23.97.150'], Options = #{allow_unsafe => false, auto => false}
The failover completed successfully at 2021-11-16T04:35:42.465
[user:info,2021-11-16T04:35:42.465-08:00,ns_1@172.23.97.74:<0.26242.0>:ns_orchestrator:log_rebalance_completion:1457]Failover completed successfully.
Rebalance Operation Id = 2aee0e369480cde19e868b304aee1e87
The failover took 60.416 s, which exceeds the CLI's 60 s request timeout by 416 ms, hence the CLI returned the error.
So either the CLI needs a larger timeout or, if the failover is required to complete within 60 s, it should be investigated why it took longer.
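To illustrate the first option, here is a minimal sketch of issuing the same failover request with a caller-chosen timeout instead of the fixed 60 s. It is not how couchbase-cli is implemented. The endpoint path is taken from the error message above; the `otpNode` form field, the function names, and the 180 s default are assumptions made for illustration.

```python
import base64
import urllib.parse
import urllib.request


def build_failover_request(cluster, otp_node, user, password):
    """Build a POST to /controller/failOver for the given node.

    `otpNode` as the form field name is an assumption about the REST API.
    """
    url = f"http://{cluster}/controller/failOver"
    body = urllib.parse.urlencode({"otpNode": otp_node}).encode()
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("Authorization", f"Basic {token}")
    return req


def hard_failover(cluster, otp_node, user, password, timeout_s=180.0):
    """Send the request and wait up to timeout_s (hypothetical default)."""
    req = build_failover_request(cluster, otp_node, user, password)
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return resp.status


# Example call (not executed here, it needs a live cluster):
# hard_failover("172.23.97.74:8091", "ns_1@172.23.97.150",
#               "Administrator", "password", timeout_s=180.0)
```

With a timeout above the observed failover duration, the client would wait for the orchestrator's response rather than report a failure for an operation that actually succeeded.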
There was one more occurrence of this later in the test.
Failover of query node started at 2021-11-16T04:43:50
[ns_server:debug,2021-11-16T04:43:50.147-08:00,ns_1@172.23.97.74:<0.26242.0>:failover:start:35]Starting failover with Nodes = ['ns_1@172.23.97.150'], Options = #{allow_unsafe => false, auto => false}
Failover completed successfully at 2021-11-16T04:45:20
[user:info,2021-11-16T04:45:20.762-08:00,ns_1@172.23.97.74:<0.26242.0>:ns_orchestrator:log_rebalance_completion:1457]Failover completed successfully.
Rebalance Operation Id = 20cbfac842260a7d135525915ca9ce09
This time the failover took 90.615 s, and the CLI again failed with the same timeout error.
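The durations quoted in this report can be re-derived from the log timestamps. The sketch below does that for both occurrences; the helper name `elapsed_seconds` is made up for illustration, and the 60 s limit comes from the CLI error message.

```python
from datetime import datetime

CLI_TIMEOUT_S = 60  # from "timed out after 60 seconds"


def elapsed_seconds(start, end):
    """Seconds between two ISO 8601 timestamps copied from the logs."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds()


# (failover start, failover completion) pairs from debug.log on 172.23.97.74
occurrences = [
    ("2021-11-16T04:34:42.049-08:00", "2021-11-16T04:35:42.465-08:00"),
    ("2021-11-16T04:43:50.147-08:00", "2021-11-16T04:45:20.762-08:00"),
]

durations = [elapsed_seconds(start, end) for start, end in occurrences]

for duration in durations:
    overrun = duration - CLI_TIMEOUT_S
    print(f"failover took {duration:.3f} s, {overrun:.3f} s over the CLI timeout")
```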