Rebalance => The Search Service maintains a cluster-wide set of index definitions and metadata, which allows indexes and index replicas to be redistributed during a rebalance. During a rebalance operation, the Search Service redistributes the index partitions across the available Search Service nodes to achieve a balanced partition-to-node assignment.
The newly assigned index partitions are built afresh over a DCP feed on the new nodes. Once the new partitions have been built up to the latest sequence numbers, they are promoted to serve live traffic and the older partitions are deleted from the system. Live traffic is never functionally affected. Nevertheless, a performance impact from a concurrent rebalance on a live cluster can't be fully ruled out.
Because rebalance is a resource-consuming cluster management operation, it is recommended to perform it during off-peak hours.
How to speed up the rebalance/Tips for faster rebalance =>
By default, the Search Service moves or builds index partitions one at a time per node during a rebalance. This can significantly increase the overall time taken by the rebalance operation.
One way to speed up the rebalance is to allow a configurable number of partitions to be moved in parallel.
The configurable option maxConcurrentPartitionMovesPerNode adds this concurrency to the way partitions are moved or built during a rebalance operation.
If this parameter is overridden as a runtime cluster option (maxConcurrentPartitionMovesPerNode set to N), up to N partitions can be built concurrently on each node, and the rebalance should complete faster.
How to configure this `maxConcurrentPartitionMovesPerNode` in a cluster in CC?
Use the update endpoint for manager options.
curl -XPUT -uAdministrator:asdasd http://<nodeIP>:8094/api/managerOptions -d '{"maxConcurrentPartitionMovesPerNode":"5"}'
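As a minimal sketch, the request body for the call above could be generated by a small shell helper that rejects anything other than a positive integer before the value reaches the cluster. The function name build_moves_payload is hypothetical and not part of Couchbase; the string-valued JSON shape simply mirrors the curl example above.

```shell
# Hypothetical helper (not part of Couchbase): builds the JSON body for
# PUT /api/managerOptions, mirroring the string-valued form used above.
build_moves_payload() {
  n="$1"
  # Accept only positive integers without leading zeros, e.g. 1, 5, 12.
  case "$n" in
    ''|*[!0-9]*|0*)
      echo "expected a positive integer, got '$n'" >&2
      return 1
      ;;
  esac
  printf '{"maxConcurrentPartitionMovesPerNode":"%s"}\n' "$n"
}
```

The output can then be passed to curl in place of the literal body, for example with -d "$(build_moves_payload 5)".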
How to check the current value for `maxConcurrentPartitionMovesPerNode` in a cluster?
curl -XGET -uAdministrator:asdasd http://<nodeIP>:8094/api/manager
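The response to this call is a JSON document, and its exact layout is not described here. As a hedged sketch that does not depend on the nesting, the helper below searches the response text for the option and prints its value. The function name extract_moves_per_node is hypothetical, and the sketch assumes the value is stored as a quoted string, as in the update call.

```shell
# Hypothetical helper (not part of Couchbase): reads JSON text on stdin and
# prints the value of maxConcurrentPartitionMovesPerNode, or nothing if the
# option is absent. Assumes the value is a quoted string, e.g. "5".
extract_moves_per_node() {
  grep -o '"maxConcurrentPartitionMovesPerNode"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 \
    | sed 's/.*:[[:space:]]*"\([^"]*\)"$/\1/'
}

# Example usage against a live node (placeholders as in the command above):
#   curl -s -XGET -u<user>:<password> http://<nodeIP>:8094/api/manager | extract_moves_per_node
```

If the helper prints nothing, the option has most likely not been overridden on that cluster, or the value is stored in a form this text search does not cover.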
Keep in mind that building multiple partitions in parallel needs more RAM and therefore calls for a higher RAM quota.
Because rebalance operations consume resources, it is advisable to plan them during off-peak hours.
Failovers => During failover of Search Service nodes there is no partition movement, so the failover rebalance is instantaneous. The Search Service promotes the replica index partitions to primary so that they serve live cluster traffic immediately.
Failover followed by a recovery rebalance can be used for short maintenance windows, such as applying patches or upgrading software or hardware. During a strict recovery rebalance operation (no extra nodes added or removed), the index partitions residing on the recovered node are reused, which keeps the recovery rebalance quick.
The usual failover-recovery steps are:
- Fail over the node that needs quick software or hardware maintenance.
- With replica partitions in place, live traffic is served seamlessly.
- Perform the software/hardware maintenance operation.
- Perform the recovery rebalance operation.
- The cluster is back to its normal, pre-failover safe state.