Details
- Bug
- Resolution: Fixed
- Critical
- 7.6.0, 7.2.0
- 0
- No
Description
Enabling the replica to catch up to the expected seq no. (specifically the sourceSeq no.) would help keep the system in a more consistent state following a rebalance operation.
This will help ensure that subsequent rebalance operations do not experience incorrect behaviour, such as rollbacks due to partially built partitions. A summary of the situation from CBSE-16450:
- During the rebalance operations, file transfer failed for some of the partition movements (due to an operational race with the merge operations running at the time). Those partitions fell back to being rebuilt from scratch from KV, which caused a large backfill and slowed the partitions in becoming completely available at their new location in the cluster.
- During the following failover + rebalance operations, rollbacks occurred on new replicas that were being built via file transfer, using the partitions still under reconstruction (described above) as a reference. The requested sequence number after file transfer trailed the purged sequence number on KV, which triggered the rollbacks and doubled the IOPS needed to rebuild them.
Issue Links
- relates to: DOC-12135 Document MB-61310 FTS rebalance must wait for seq no. catchup on replicas (In Progress)