Details
-
Task
-
Resolution: Done
-
Major
-
None
-
Cheshire-Cat
-
1
Description
In some of the rebalance-out tests for FTS, the time taken is around 14 seconds. The tests have 3 FTS nodes and 1 index with 6 partitions and a single index replica. It appears that rebalance out just activates the replicas on the remaining nodes and then completes. However, some of the replica partitions are no longer present at that point. Should rebalance out wait until the replicas are rebuilt before reporting that it has completed?
Example:
kv - data
fts1 - partition 1 and 2, replica partition 3 and 4
fts2 - partition 3 and 4, replica partition 5 and 6
fts3 - partition 5 and 6, replica partition 1 and 2
Rebalancing out fts3 would activate replica partitions 5 and 6 on fts2 and report the rebalance as completed. Even though all index partitions are available, the replicas for partitions 1 and 2 are not yet rebuilt. Should rebalance only report completed when the index replicas are fully rebuilt? It seems one could have fts3 rebalanced out, then have fts2 fail immediately after, and the index would now be missing partitions.
Maybe that is already happening. Here are the logs. I am not sure how to tell whether the replicas have been rebuilt; the rebalance just seems very fast compared to other rebalance-out tests:
https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-atlas_rebalance-244/172.23.99.211.zip
https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-atlas_rebalance-244/172.23.99.38.zip
https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-atlas_rebalance-244/172.23.99.39.zip
https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-atlas_rebalance-244/172.23.99.40.zip
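To make the fts1/fts2/fts3 example concrete, here is a minimal Python sketch of the layout. It is a toy model only: the `rebalance_out` and `coverage` helpers are hypothetical names written for this illustration, not part of FTS or any Couchbase API, and the model assumes that removing a node promotes any surviving replica and rebuilds nothing.

```python
# Toy model of the example layout; NOT the actual FTS planner.
# Assumption: removing a node promotes surviving replicas and rebuilds nothing.
ALL_PARTITIONS = {1, 2, 3, 4, 5, 6}

layout = {
    "fts1": {"active": {1, 2}, "replica": {3, 4}},
    "fts2": {"active": {3, 4}, "replica": {5, 6}},
    "fts3": {"active": {5, 6}, "replica": {1, 2}},
}


def rebalance_out(layout, node):
    """Return a new layout with `node` removed and its active partitions
    promoted from replicas on the remaining nodes, where one exists."""
    remaining = {
        name: {"active": set(p["active"]), "replica": set(p["replica"])}
        for name, p in layout.items()
        if name != node
    }
    for pid in layout[node]["active"]:
        for parts in remaining.values():
            if pid in parts["replica"]:
                parts["replica"].remove(pid)
                parts["active"].add(pid)
                break
    return remaining


def coverage(layout):
    """Return (partitions with no active copy, partitions with no replica)."""
    active = set().union(*(p["active"] for p in layout.values()))
    replica = set().union(*(p["replica"] for p in layout.values()))
    return ALL_PARTITIONS - active, ALL_PARTITIONS - replica


after_out = rebalance_out(layout, "fts3")
missing_active, missing_replica = coverage(after_out)
print(sorted(missing_active), sorted(missing_replica))

# Model fts2 failing immediately afterwards with the same promotion rule.
after_fail = rebalance_out(after_out, "fts2")
print(sorted(coverage(after_fail)[0]))
```

In this model, every partition still has an active copy after fts3 is removed, but partitions 1 and 2 have lost their replica, and partitions 5 and 6 have none either because their only replica was promoted. If fts2 then goes away before anything is rebuilt, partitions 5 and 6 have no copy left, which is the exposure the description asks about.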
Issue Links
- relates to
-
DOC-8473 [FTS] Rebalance operation involving FTS nodes does not wait for indexes' replicas partitions to be built completely
- Closed