Details
-
Bug
-
Resolution: Fixed
-
Critical
-
7.2.2
-
Untriaged
-
0
-
Unknown
Description
A recent change (MB-57334) in the rebalance code path added logic to skip rebalance early when the FTS topology has not changed.
We realised that there are certain situations where we don't want to skip rebalance, even if FTS topology remains unchanged.
Examples:
- Missing partitions due to previous failover operations (addressed in MB-58450).
- Server group change for FTS nodes.
To avoid skipping rebalance when the server group changes for FTS node(s), we will have to track NodeDefs across rebalances.
As of now, we are only tracking NodeUUIDs across rebalances.
Keeping track of the NodeUUIDs/NodeDefs at the time of the last successful rebalance enables us to reconcile the latest NodeDefs against prevNodeDefs (from the last rebalance), and based on that decide whether to skip rebalance or not.
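A minimal Go sketch of that reconciliation, for illustration only. The `nodeDef` type, its `ServerGroup` field, and `needsRebalance` are hypothetical names; the real cbgt NodeDefs carry more fields, and which of them encodes the server group is an assumption here.

```go
package main

import "fmt"

// nodeDef is a hypothetical, trimmed-down stand-in for a cbgt NodeDef.
// Only the fields needed for this comparison are kept.
type nodeDef struct {
	UUID        string
	ServerGroup string // assumption: the server group is available per node
}

// needsRebalance reports whether the current node definitions differ from
// the snapshot taken at the last successful rebalance, either in node
// membership or in the server group of any node.
func needsRebalance(prev, cur map[string]nodeDef) bool {
	if len(prev) != len(cur) {
		return true // nodes were added or removed
	}
	for uuid, c := range cur {
		p, ok := prev[uuid]
		if !ok {
			return true // same count, but a different set of UUIDs
		}
		if p.ServerGroup != c.ServerGroup {
			return true // topology unchanged, server group changed
		}
	}
	return false
}

func main() {
	prev := map[string]nodeDef{
		"n1": {UUID: "n1", ServerGroup: "group1"},
		"n2": {UUID: "n2", ServerGroup: "group2"},
	}
	cur := map[string]nodeDef{
		"n1": {UUID: "n1", ServerGroup: "group1"},
		"n2": {UUID: "n2", ServerGroup: "group1"}, // n2 moved groups
	}
	fmt.Println(needsRebalance(prev, prev)) // false
	fmt.Println(needsRebalance(prev, cur))  // true
}
```

Comparing only NodeUUIDs would return false for the second case, which is the situation where rebalance should not be skipped.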
Previous attempt to solve this: https://review.couchbase.org/c/cbgt/+/197432
Summary of comments on the PR:
- On all nodes, we can track prevNodeDefs (a NodeDefs snapshot taken after the last successful rebalance).
- After every successful rebalance, we need to update prevNodeDefs on all the nodes. Only the orchestrator node knows that the rebalance completed successfully, so it needs some mechanism to let the other nodes know as well.
- One way to achieve this is via metakv: after a successful rebalance, the orchestrator can notify the other nodes (via metakv) that they need to update their local copy of prevNodeDefs.
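The flow in the comments above could be sketched roughly as follows. This does not use the real metakv API: `kvStore`, `memStore`, `prevNodeDefsKey`, `publishSnapshot`, and `loadSnapshot` are all hypothetical, with an in-memory map standing in for the shared store, and the snapshot is reduced to a UUID-to-server-group map. A real implementation would likely react to a change notification rather than read on demand.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// kvStore is a hypothetical stand-in for a shared metadata store such as
// metakv. It is NOT the real metakv interface.
type kvStore interface {
	Set(key string, val []byte)
	Get(key string) ([]byte, bool)
}

// memStore is an in-memory kvStore used only to make the sketch runnable.
type memStore struct {
	mu sync.Mutex
	m  map[string][]byte
}

func newMemStore() *memStore { return &memStore{m: map[string][]byte{}} }

func (s *memStore) Set(key string, val []byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[key] = val
}

func (s *memStore) Get(key string) ([]byte, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.m[key]
	return v, ok
}

// prevNodeDefsKey is a made-up key name for illustration.
const prevNodeDefsKey = "/hypothetical/prevNodeDefs"

// publishSnapshot is what the orchestrator would call once a rebalance has
// completed successfully. defs maps node UUID to server group.
func publishSnapshot(s kvStore, defs map[string]string) error {
	b, err := json.Marshal(defs)
	if err != nil {
		return err
	}
	s.Set(prevNodeDefsKey, b)
	return nil
}

// loadSnapshot is what each node would call to refresh its local copy of
// prevNodeDefs. The bool is false when no snapshot has been published yet.
func loadSnapshot(s kvStore) (map[string]string, bool, error) {
	b, ok := s.Get(prevNodeDefsKey)
	if !ok {
		return nil, false, nil
	}
	defs := map[string]string{}
	if err := json.Unmarshal(b, &defs); err != nil {
		return nil, false, err
	}
	return defs, true, nil
}

func main() {
	store := newMemStore()

	// Orchestrator side: record the NodeDefs after a successful rebalance.
	snapshot := map[string]string{"n1": "group1", "n2": "group2"}
	if err := publishSnapshot(store, snapshot); err != nil {
		panic(err)
	}

	// Any other node: pick up the snapshot as its new prevNodeDefs.
	prev, ok, err := loadSnapshot(store)
	fmt.Println(prev, ok, err)
}
```

Writing the snapshot only from the orchestrator, and only after success, keeps a failed or aborted rebalance from overwriting the last known-good prevNodeDefs.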
We also want to verify that the current mechanism to keep track of prevNodeUUIDs is correct.
Issue Links
- relates to: MB-61043 Partition layout skew after failover(s) + rebalance; must not skip following rebalance ops in case of a skew (Closed)