KV rebalance can hang if other DCP clients consume all backfill slots

Description

As seen in the rebalance test described in , a KV rebalance on a cluster running KV plus other services (FTS and GSI in the above instance) can hang if KV requires backfills and all of the available backfill slots (default 4096) are consumed by other services.

For example, it was observed that 4231 streams were attempting to backfill:

Which were made up of:

However, given that only 4096 backfills are possible at once, a number of streams were pending (waiting for a slot to become available before they could start):

While we can also look at reducing the number of concurrent streams other services create, ideally we want a solution in which KV is "defensive": irrespective of what other services request, it can always make rebalance progress.
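
The starvation described above can be illustrated with a toy model of a single global slot pool. This is only a sketch, not kv_engine code: the function name, the first-come-first-served policy, and the split of the 4231 streams into 4096 from other services and 135 from rebalance are all assumptions made for illustration (the actual breakdown is not reproduced in this ticket).

```python
def allocate_global(requests, max_slots=4096):
    """Toy model: grant backfill slots first come, first served.

    `requests` is a list of owner labels, one per stream, in arrival order.
    Returns (active, pending) lists of owner labels.
    """
    active, pending = [], []
    for owner in requests:
        if len(active) < max_slots:
            active.append(owner)
        else:
            pending.append(owner)  # waits until a slot is freed
    return active, pending


# Hypothetical arrival order: other services first, rebalance streams last.
requests = ["other"] * 4096 + ["rebalance"] * 135
active, pending = allocate_global(requests)

# No rebalance stream holds a slot, so rebalance cannot progress
# until another service's backfill completes.
print(len(active), len(pending), active.count("rebalance"))
```

Under these assumptions every slot is held by another service, which matches the hang reported here when those backfills are long-running.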

 

Issue

Data Service rebalance duration was significantly impacted when other DCP clients created a large number of streams that needed to be read from disk, because there was no prioritization between rebalance and other DCP clients.

Resolution

The number of backfills each DCP client can perform concurrently has been limited to allow fairer allocation of resources.
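
A per-client limit of this kind can be sketched as follows. This is a simplified model under stated assumptions, not the kv_engine implementation: the function name, the connection labels ("fts", "gsi", "rebalance") and the stream counts are hypothetical, and the cap of 64 is taken from the later commit message in the activity log ("Increase DCP backfills per conn to 64"). The model only covers initial admission; in a real system pending backfills would start as earlier ones finish.

```python
from collections import Counter


def allocate_with_per_conn_cap(requests, max_slots=4096, per_conn_cap=64):
    """Toy model: admit a backfill only if both the global pool and the
    requesting connection's own quota have room.

    `requests` is a list of connection labels, one per stream, in arrival order.
    Returns (active, pending) lists of connection labels.
    """
    active, pending = [], []
    per_conn = Counter()  # active backfills per connection
    for conn in requests:
        if len(active) < max_slots and per_conn[conn] < per_conn_cap:
            active.append(conn)
            per_conn[conn] += 1
        else:
            pending.append(conn)
    return active, pending


# Hypothetical workload: two other services request 2048 streams each,
# then a rebalance connection requests 32.
requests = ["fts"] * 2048 + ["gsi"] * 2048 + ["rebalance"] * 32
active, pending = allocate_with_per_conn_cap(requests)
granted = Counter(active)

print(granted["fts"], granted["gsi"], granted["rebalance"], len(pending))
```

In this model no single connection can hold more than its cap, so the rebalance connection is admitted even though it arrives last.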

Environment

None

Link to Log File, atop/blg, CBCollectInfo, Core dump

None

Release Notes Description

None

Attachments

1

Activity

CB robot September 20, 2023 at 5:45 AM

Build capella-analytics-1.0.0-1025 contains kv_engine commit bf224e9 with commit message:
: Increase DCP backfills per conn to 64

CB robot September 19, 2023 at 10:43 AM

Build couchbase-server-8.0.0-1410 contains kv_engine commit bf224e9 with commit message:
: Increase DCP backfills per conn to 64

CB robot September 19, 2023 at 10:32 AM

Build couchbase-server-7.6.0-1521 contains kv_engine commit bf224e9 with commit message:
: Increase DCP backfills per conn to 64

CB robot September 1, 2023 at 1:29 PM

Build couchbase-server-8.0.0-1392 contains kv_engine commit 77167f6 with commit message:
: Remove leftover debug from dcpdrain

CB robot September 1, 2023 at 8:37 AM

Build couchbase-server-7.6.0-1446 contains kv_engine commit 77167f6 with commit message:
: Remove leftover debug from dcpdrain

Fixed

Details

Is this a Regression?

No

Triage

Untriaged

Issue Impact

external

Created June 7, 2023 at 3:56 PM
Updated March 21, 2025 at 2:50 AM
Resolved June 15, 2023 at 2:41 PM