[BP 7.0.2] - XDCR - backfill_request_handler could hang forever

Description

The backfill request handler's cooldown mechanism is flawed: it can cause the run() routine to enter the persist case when there are no operations to persist.

The handler can then get stuck forever waiting for an operation that will never come, and all backfill operations become unresponsive, such as handling VB done events or raising future backfills.

 

A typical symptom is a backfill pipeline that hangs and does not go away (potentially with changes_left staying at 0).

The stack trace would show a number of goroutines in HandleVBTaskDone() (one per VB) and one goroutine stuck at this location:

https://github.com/couchbase/goxdcr/blob/26d8add3a1c760f1c0c99569a4582e7b7c09c689/backfill_manager/backfill_request_handler.go#L296

Components

Affects versions

Fix versions

Labels

Environment

None

Link to Log File, atop/blg, CBCollectInfo, Core dump

None

Release Notes Description

None

Activity

CB robot August 9, 2021 at 8:42 PM

Build couchbase-server-7.0.1-5995 contains goxdcr commit a0f3465 with commit message:
- backfill request handler could hang due to incorrectly implemented persist timer

Fixed

Details

Assignee

Reporter

Is this a Regression?

No

Triage

Untriaged

Created August 6, 2021 at 12:23 AM
Updated October 14, 2021 at 1:28 PM
Resolved August 9, 2021 at 8:44 PM