Details
Bug
Resolution: Unresolved
Critical
7.6.0
Enterprise Edition 7.6.0 build 18141
Untriaged
Linux x86_64
0
Unknown
Description
QE Test
./sequoia -client 172.23.104.254:2375 -provider file:centos_third_cluster.yml -test tests/fts/cheshire-cat/test_fts_clusterops_coll_crud_magma.yml -scope tests/fts/cheshire-cat/scope_fts_magma.yml -scale 3 -repeat 0 -log_level 0 -version 7.6.0-18141 -skip_setup=false -skip_test=false -skip_teardown=true -skip_cleanup=false -continue=false -collect_on_error=false -stop_on_error=false -duration=604800 -show_topology=true
Day - 1
Cycle - 1
Scale - 3
Test Step
Rebalance out a single FTS node from the cluster.
2023-12-08T07:44:56.547-08:00, ns_orchestrator:0:info:message(ns_1@172.23.107.25) - Starting rebalance, KeepNodes = ['ns_1@172.23.104.216','ns_1@172.23.107.236',
'ns_1@172.23.107.25','ns_1@172.23.108.134',
'ns_1@172.23.108.136','ns_1@172.23.108.138',
'ns_1@172.23.108.139','ns_1@172.23.108.141',
'ns_1@172.23.108.143','ns_1@172.23.108.146',
'ns_1@172.23.108.148'], EjectNodes = ['ns_1@172.23.108.145'], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 02c25254d0cddf2d5aac8b7a1dec8922
Observation
The output of the pools/default/rebalanceProgress endpoint reports progress 1 (complete) for every FTS node in the cluster, including the node being ejected (172.23.108.145), yet the overall status is still "running".
curl -u Administrator:password http://172.23.108.139:8091/pools/default/rebalanceProgress | jq
{
  "status": "running",
  "ns_1@172.23.108.141": {
    "progress": 1
  },
  "ns_1@172.23.108.143": {
    "progress": 1
  },
  "ns_1@172.23.108.134": {
    "progress": 1
  },
  "ns_1@172.23.108.145": {
    "progress": 1
  },
  "ns_1@172.23.107.25": {
    "progress": 0
  },
  "ns_1@172.23.108.136": {
    "progress": 1
  },
  "ns_1@172.23.104.216": {
    "progress": 1
  },
  "ns_1@172.23.108.146": {
    "progress": 1
  },
  "ns_1@172.23.107.236": {
    "progress": 0
  },
  "ns_1@172.23.108.148": {
    "progress": 1
  },
  "ns_1@172.23.108.138": {
    "progress": 1
  },
  "ns_1@172.23.108.139": {
    "progress": 1
  }
}
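As a rough aid for reading this output, the following Python sketch lists the nodes that still report progress below 1 while the overall status is "running". The function name unfinished_nodes is hypothetical, and the sample is a shortened copy of the response above, not a full capture.

```python
import json

def unfinished_nodes(progress_doc):
    """Return the nodes reporting progress < 1 while status is 'running'.

    Hypothetical helper: it assumes the response shape shown in this
    report (a "status" key plus one {"progress": N} object per node).
    """
    if progress_doc.get("status") != "running":
        return []
    return sorted(
        node
        for node, info in progress_doc.items()
        if isinstance(info, dict) and info.get("progress", 0) < 1
    )

# Shortened copy of the rebalanceProgress output quoted in this report.
sample = json.loads("""
{
  "status": "running",
  "ns_1@172.23.108.145": {"progress": 1},
  "ns_1@172.23.108.148": {"progress": 1},
  "ns_1@172.23.107.25": {"progress": 0},
  "ns_1@172.23.107.236": {"progress": 0}
}
""")

print(unfinished_nodes(sample))
```

In the sample, only 172.23.107.25 and 172.23.107.236 are returned; neither appears in the Search Nodes list of this report.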
The output of pools/default/tasks shows totalProgress for the FTS (search) rebalance as 100, but completedTime is still false, which indicates that the rebalance has not finished.
"search": {
  "totalProgress": 100,
  "perNodeProgress": {
    "ns_1@172.23.108.143": 1,
    "ns_1@172.23.108.145": 1,
    "ns_1@172.23.108.136": 1,
    "ns_1@172.23.104.216": 1,
    "ns_1@172.23.108.148": 1,
    "ns_1@172.23.108.138": 1
  },
  "startTime": "2023-12-08T07:45:02.023-08:00",
  "completedTime": false,
  "timeTaken": 136409396
}
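The condition described here (progress at 100 with no completion time) can be expressed as a small check. This is a minimal sketch, assuming the per-service entry has the fields shown in the fragment above; the name looks_stuck is hypothetical and where the "search" entry sits inside the full tasks response is not shown in this report.

```python
def looks_stuck(service_entry):
    """True when a service reports 100% progress but no completion time.

    Hypothetical helper based only on the "search" fragment quoted in
    this report: totalProgress is a number and completedTime is either
    false or a timestamp string.
    """
    return (
        service_entry.get("totalProgress") == 100
        and not service_entry.get("completedTime")
    )

# The "search" entry as quoted in this report (perNodeProgress omitted).
search_entry = {
    "totalProgress": 100,
    "startTime": "2023-12-08T07:45:02.023-08:00",
    "completedTime": False,
    "timeTaken": 136409396,
}

print(looks_stuck(search_entry))
```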
The UI also shows the FTS rebalance as still ongoing.
Screenshot 2023-12-10 at 11.19.27 AM.png
From the fts.log file on 172.23.108.148 (the rebalance orchestrator node) we can see that rebalance progress has been reported as 100% for more than 25 hours, from 2023-12-08T23:33 to 2023-12-10T00:44 in the excerpts below.
grep "progress: 1." ns_server.fts.log | head -5
2023-12-08T23:33:15.728-08:00 [INFO] ctl/manager: revNum: 88793, progress: 1.000000
2023-12-08T23:33:16.363-08:00 [INFO] ctl/manager: revNum: 88795, progress: 1.000000
2023-12-08T23:33:26.429-08:00 [INFO] ctl/manager: revNum: 88797, progress: 1.000000
2023-12-08T23:33:35.738-08:00 [INFO] ctl/manager: revNum: 88799, progress: 1.000000
2023-12-08T23:33:36.917-08:00 [INFO] ctl/manager: revNum: 88801, progress: 1.000000
grep "progress: 1." ns_server.fts.log | tail -5
2023-12-10T00:43:52.306-08:00 [INFO] ctl/manager: revNum: 115949, progress: 1.000000
2023-12-10T00:43:56.614-08:00 [INFO] ctl/manager: revNum: 115951, progress: 1.000000
2023-12-10T00:44:06.111-08:00 [INFO] ctl/manager: revNum: 115953, progress: 1.000000
2023-12-10T00:44:12.454-08:00 [INFO] ctl/manager: revNum: 115955, progress: 1.000000
2023-12-10T00:44:16.318-08:00 [INFO] ctl/manager: revNum: 115957, progress: 1.000000
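To put a number on how long progress has stayed at 1.000000, the span between the earliest and latest matching log lines can be computed. The sketch below is an illustration only: the function stuck_duration and the regular expression are assumptions modelled on the log lines quoted above, and the sample holds just the first and last of those lines.

```python
import re
from datetime import datetime

# Matches the "progress: 1.000000" lines quoted in this report and
# captures the leading ISO 8601 timestamp.
LINE_RE = re.compile(
    r"^(\S+) \[INFO\] ctl/manager: revNum: \d+, progress: 1\.000000"
)

def stuck_duration(lines):
    """Return the time between the earliest and latest matching lines.

    Hypothetical helper; returns None when fewer than two lines match.
    """
    stamps = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            stamps.append(datetime.fromisoformat(match.group(1)))
    if len(stamps) < 2:
        return None
    return max(stamps) - min(stamps)

# First and last matching lines from the grep output in this report.
sample_lines = [
    "2023-12-08T23:33:15.728-08:00 [INFO] ctl/manager: revNum: 88793, progress: 1.000000",
    "2023-12-10T00:44:16.318-08:00 [INFO] ctl/manager: revNum: 115957, progress: 1.000000",
]

duration = stuck_duration(sample_lines)
print(duration, "=", round(duration.total_seconds() / 3600, 2), "hours")
```

The result covers only the lines present in the log file being read, so it is a lower bound if earlier entries were rotated out.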
Note
This test is running on a toy build that contains the vector search changes. Upcoming runs will be triggered on regular 7.6 builds, now that the code is available in the mainstream Trinity builds.
Search Nodes
- 172.23.104.216
- 172.23.108.136
- 172.23.108.138
- 172.23.108.143
- 172.23.108.145
- 172.23.108.148