[BP to 7.1.4] - Rebalance has been hung on a dataplane for more than 1 hour.
Activity

Varun Velamuri February 10, 2023 at 12:17 PM
Verified this issue based on the comments in: https://couchbasecloud.atlassian.net/browse/MB-54347?focusedCommentId=843314&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel
Closing this issue from the dev side.

CB robot January 27, 2023 at 3:24 PM
Build couchbase-server-7.1.4-3570 contains indexing commit cc9df15 with commit message:
Notify flush observer before cleaning up keyspace

Varun Velamuri January 27, 2023 at 10:59 AM
To reproduce the issue, I followed the steps in https://couchbasecloud.atlassian.net/browse/MB-54328?focusedCommentId=838321&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel: created 3 index instances on a bucket, dropped one index instance while flush was paused, and deleted the bucket.
Before the fix:
handleKeyspaceNotFound skipped cleaning up indexes
StorageSnapDone cleaned up all index instances
This led to the lifecycle manager getting stuck: its incoming channels started to queue up requests
After the fix:
StorageSnapDone got called, but the dropped index was skipped
No incoming requests queued up in the lifecycle manager's channels
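The ordering problem addressed by commit cc9df15 ("Notify flush observer before cleaning up keyspace") can be illustrated with a minimal Go sketch. Everything here (indexer, keyspaceState, cleanupBuggy, cleanupFixed, waitForFlush, the bucket name) is hypothetical and is not the actual indexer code. It only models the assumption that a waiter blocked on a flush notification hangs if keyspace cleanup discards the state without signalling it first, and is released if the observer is notified before cleanup.

```go
package main

import (
	"fmt"
	"time"
)

// keyspaceState is a hypothetical stand-in for per-keyspace bookkeeping;
// it is not the real indexer type.
type keyspaceState struct {
	flushDone chan struct{} // closed to notify the flush observer
}

// indexer is a hypothetical, simplified model of the indexer.
type indexer struct {
	keyspaces map[string]*keyspaceState
}

func newIndexer(name string) *indexer {
	return &indexer{keyspaces: map[string]*keyspaceState{
		name: {flushDone: make(chan struct{})},
	}}
}

// cleanupBuggy drops the keyspace state without notifying the observer,
// so anything waiting on flushDone is never released.
func (ix *indexer) cleanupBuggy(name string) {
	delete(ix.keyspaces, name)
}

// cleanupFixed notifies the flush observer first, then drops the state.
func (ix *indexer) cleanupFixed(name string) {
	if ks, ok := ix.keyspaces[name]; ok {
		close(ks.flushDone)
	}
	delete(ix.keyspaces, name)
}

// waitForFlush models the lifecycle manager waiting on the flush observer.
// It returns true if notified before the timeout, false if it would hang.
func waitForFlush(done <-chan struct{}, timeout time.Duration) bool {
	select {
	case <-done:
		return true
	case <-time.After(timeout):
		return false
	}
}

// simulate deletes the bucket's keyspace and reports whether the waiter
// was released.
func simulate(fixed bool) bool {
	const bucket = "bucket1" // hypothetical bucket name
	ix := newIndexer(bucket)
	done := ix.keyspaces[bucket].flushDone // observer registered before deletion
	if fixed {
		ix.cleanupFixed(bucket)
	} else {
		ix.cleanupBuggy(bucket)
	}
	return waitForFlush(done, 50*time.Millisecond)
}

func main() {
	fmt.Println("buggy cleanup, observer notified:", simulate(false))
	fmt.Println("fixed cleanup, observer notified:", simulate(true))
}
```

Running it prints `false` for the buggy ordering (the waiter times out, standing in for the stuck lifecycle manager) and `true` for the fixed ordering. The timeout exists only so the sketch terminates; in the hung scenario there is no such bound.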
Details
Assignee: Hemant Rajput
Reporter: Varun Velamuri
Is this a Regression?: Unknown
Triage: Untriaged
Story Points: 1
Priority: Critical
Description
It looks like many operations were triggered at the same time:
Sample indexes were being built for a database, and that database was deleted before the build completed.
Within the same timeframe, a new bucket was created and its width was changed to 2, which triggered the rebalance that is now hung.
Note: the new bucket creation request came in on the CP and was redirected to this dataplane.