KV swap rebalance hangs indefinitely because DcpConsumer misses the wakeup needed to transmit a StreamRequest

Description

  1. Create a cluster with 3 KV nodes and 2 GSI+N1QL nodes.

  2. Create a Magma bucket with 10 collections.

  3. Create 100,000,000 items (CircularKey).

  4. Scale OUT by 1 KV node and 1 GSI node, one at a time, while loading docs.

  5. Scale OUT by 1 KV node and 1 GSI node, one at a time, while loading docs.

  6. Scale IN by 1 KV node and 1 GSI node, one at a time, while loading docs.

  7. Scale IN by 1 KV node and 1 GSI node, one at a time, while loading docs.

  8. Scale disk UP while loading docs.

  9. Scale disk DOWN while loading docs.

  10. Scale compute UP while loading docs (vertical scaling, which leads to a swap rebalance of each node).

  11. Scale compute DOWN while loading docs (vertical scaling, which leads to a swap rebalance of each node).

  12. Scale disk + compute UP while loading docs (vertical scaling, which leads to a swap rebalance of each node).

  13. The KV node swap rebalance hangs indefinitely.

QE test

Environment

7.6.0-1690

Link to Log File, atop/blg, CBCollectInfo, Core dump

kv_rebl_hung → http://supportal.couchbase.com/snapshot/bbf72f339ff5758d98368d6daf2a2afb::0

Release Notes Description

None

Attachments

2

Activity

Ritesh Agarwal February 28, 2024 at 1:34 AM

Working fine on Enterprise Edition 7.6.0 build 2164

CB robot November 3, 2023 at 12:01 AM

Build couchbase-server-8.0.0-1454 contains kv_engine commit 4e1aab4 with commit message:
: Re-notify DCP conns if timeslice exceeded & more data

CB robot November 2, 2023 at 5:44 PM

Build capella-analytics-1.0.0-1072 contains kv_engine commit 4e1aab4 with commit message:
: Re-notify DCP conns if timeslice exceeded & more data

CB robot November 2, 2023 at 5:22 PM

Build couchbase-server-7.6.0-1743 contains kv_engine commit 4e1aab4 with commit message:
: Re-notify DCP conns if timeslice exceeded & more data
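The commit message above describes the fix in one line: re-notify a DCP connection if its timeslice was exceeded and it still has data to send. The sketch below illustrates that lost-wakeup pattern under a simplified model. `YieldReason`, `shouldRenotifyBuggy` and `shouldRenotifyFixed` are hypothetical names for illustration only, not kv_engine's actual code. In the buggy variant, only the event-count limit triggers a re-notify. A connection that yields because its time budget ran out, while a StreamRequest is still queued, is therefore never scheduled again.

```cpp
// Hypothetical sketch: all names are illustrative, not kv_engine's real API.
#include <cassert>
#include <cstddef>

// Why a connection's processing step stopped.
enum class YieldReason {
    Drained,          // nothing left queued for this connection
    EventLimitHit,    // processed the maximum number of events this step
    TimeSliceExpired  // ran past the time budget for this step
};

// Buggy decision: only the event-count limit leads to a re-notify, so a
// connection that yields on the time budget with data still queued is
// never woken again (a lost wakeup).
bool shouldRenotifyBuggy(YieldReason reason, bool moreData) {
    return moreData && reason == YieldReason::EventLimitHit;
}

// Fixed decision: re-notify whenever either limit was exceeded and there
// is still more data to send.
bool shouldRenotifyFixed(YieldReason reason, bool moreData) {
    // A drained connection cannot also report more data.
    assert(!(reason == YieldReason::Drained && moreData));
    return moreData && reason != YieldReason::Drained;
}
```

The fixed check deliberately ignores *which* limit fired. Once a step yields with work remaining, something must reschedule the connection, because nothing else will.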

Trond Norbye November 2, 2023 at 10:55 AM

That seems like the problem. As you say, it doesn't account for the timed timeslice. (We should probably look into replacing the "numEvent" timeslice with just using the clock, as that's probably fairer.)
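Replacing the "numEvent" timeslice with a purely clock-based one, as suggested above, might look roughly like the sketch below. This is an assumption-laden illustration: `runTimeSlice` and `StepOutcome` are hypothetical names, not kv_engine code. The step processes queued items until a `std::chrono::steady_clock` deadline passes. It always handles at least one item so the connection makes progress, and it reports whether data remains so the caller can re-notify.

```cpp
// Hypothetical sketch of a clock-based timeslice; names are illustrative.
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>

struct StepOutcome {
    std::size_t processed; // items handled during this step
    bool moreData;         // true if items remain; caller must re-notify
};

// Process queued items until the time budget is used up. At least one item
// is processed per step (when available) so progress is guaranteed even
// with a zero budget.
template <typename Item, typename Fn>
StepOutcome runTimeSlice(std::deque<Item>& queue,
                         std::chrono::steady_clock::duration budget,
                         Fn&& process) {
    assert(budget >= std::chrono::steady_clock::duration::zero());
    const auto deadline = std::chrono::steady_clock::now() + budget;
    std::size_t processed = 0;
    while (!queue.empty()) {
        process(queue.front());
        queue.pop_front();
        ++processed;
        if (std::chrono::steady_clock::now() >= deadline) {
            break; // timeslice exhausted; yield to other connections
        }
    }
    return {processed, !queue.empty()};
}
```

With only one limit there is only one yield path, so the re-notify decision reduces to checking `moreData`. That removes the class of bug where one limit is handled and the other is not.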

Fixed

Details


Is this a Regression?

Unknown

Triage

Untriaged



Created October 30, 2023 at 1:13 AM
Updated February 28, 2024 at 1:35 AM
Resolved November 3, 2023 at 10:04 AM