Details
- Improvement
- Resolution: Unresolved
- Major
- None
- None
Description
KV should prioritise backfills for replication streams over those for other components, to avoid arbitrary delays to replication and rebalance.
With the introduction of collections, some situations (see MB-48693, MB-48532) can lead to other components (e.g., FTS) creating a very large number of streams, and therefore a very large number of concurrent backfills (~70k in MB-48693).
KV caps the number of running backfills, with a hard limit of 4096; any additional backfills will be queued.
This means that streams for other components (e.g., FTS) can contend with backfills for replication, or delay their start. This has a significant impact on rebalance time, to the point that rebalance may appear "stuck".
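As a rough illustration of the scale involved, the sketch below assumes strictly first-come-first-served admission of queued backfills. That ordering is an assumption made for the example, not a statement of how KV actually orders its queue, and the function name is hypothetical. It counts how many other backfills would have to finish before a newly requested replication backfill could start, using the 4096 limit and the approximate ~70k figure quoted above.

```python
def completions_before_start(backfills_ahead: int, max_running: int) -> int:
    """Number of backfills that must complete before a new one may start.

    Assumes strict FIFO admission (an assumption for this sketch):
    `backfills_ahead` backfills were requested earlier, and at most
    `max_running` backfills may run at the same time.
    """
    # If fewer than `max_running` backfills are ahead, a slot is free now.
    return max(0, backfills_ahead - max_running + 1)


# ~70k backfills from another component requested ahead of one replication backfill.
print(completions_before_start(backfills_ahead=70_000, max_running=4096))
```

Under these assumptions the replication backfill waits for tens of thousands of unrelated backfills to finish, however small its own backfill is.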
Additionally, outside of rebalance, this could also affect durability. If a replication stream is cursor dropped, it may be a significant amount of time before the subsequent backfill can complete. During this time, the associated replica would fall behind, and if that replica is required to reach majority, sync writes for that vbucket will time out.
To avoid this, KV should prioritise backfills for replication streams/producers over those for other components. Excessive backfills would still cause additional load for KV, but would no longer be able to delay serving a replication backfill for an arbitrary amount of time.
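One possible shape for this is sketched below: a capped scheduler whose pending queue is ordered by priority class first and arrival order second. This is a toy model and not the KV implementation. The class name, the two priority constants, and the stream names are all hypothetical, and the cap is set to 2 only to keep the example short.

```python
import heapq
import itertools

# Hypothetical priority classes: a lower value is served first.
REPLICATION, OTHER = 0, 1


class BackfillScheduler:
    """Toy scheduler: at most `max_running` backfills run at once.

    Queued backfills are started in (priority, arrival order) order, so a
    replication backfill never waits behind queued non-replication ones.
    """

    def __init__(self, max_running):
        self.max_running = max_running
        self.running = set()
        self._pending = []              # heap of (priority, seq, name)
        self._seq = itertools.count()   # tie-breaker that preserves FIFO order

    def schedule(self, name, priority):
        heapq.heappush(self._pending, (priority, next(self._seq), name))
        self._fill_slots()

    def complete(self, name):
        self.running.remove(name)
        self._fill_slots()

    def pending(self):
        # Names of queued backfills in the order they would be started.
        return [name for _, _, name in sorted(self._pending)]

    def _fill_slots(self):
        # Start queued backfills while there is spare capacity.
        while self._pending and len(self.running) < self.max_running:
            _, _, name = heapq.heappop(self._pending)
            self.running.add(name)


sched = BackfillScheduler(max_running=2)
for name in ("fts-1", "fts-2", "fts-3"):
    sched.schedule(name, OTHER)
sched.schedule("replica-1", REPLICATION)  # arrives last, but has higher priority

sched.complete("fts-1")                   # one slot frees up
print(sorted(sched.running))              # replica-1 takes the slot ahead of fts-3
print(sched.pending())
```

Note that this sketch does not preempt backfills that are already running: a replication backfill still waits for one running backfill to finish, but no longer for the whole queue ahead of it. The non-replication backfills still consume resources, consistent with the point above about additional load.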
Issue Links
- is duplicated by
  - MB-57304 KV rebalance can hang if other DCP clients consume all backfill slots (Closed)
- relates to
  - MB-45028 BFM backfill allocation unfair when we hit running backfill limit (Open)
  - MB-57304 KV rebalance can hang if other DCP clients consume all backfill slots (Closed)
  - MB-49702 Magma rolling back to zero despite rollback seqno being within the 10min/10% window (Closed)
  - MB-35782 Delta recovery should not create replications to all vbuckets being recovered immediately (Reopened)
  - MB-48693 KV Rebalance stuck at 82% (Closed)