[BP-7.1.3] - XDCR metakv callbacks race when a remote cluster reference is added or changed

Description

The metakv listener allows multiple callbacks to run concurrently. This is normally harmless because each callback finishes quickly. However, if a callback takes a long time (e.g. because of a slow DNS lookup), changes made in metakv can be applied out of order.

This happens because a spawned callback stalls, and in the meantime metakv is modified again. When the stalled callback resumes, it overwrites the newer values with stale ones.

This race manifests as the following error:

Error writing to metakv: revision number does not match

Components

Affects versions

Fix versions

Labels

Environment

None

Link to Log File, atop/blg, CBCollectInfo, Core dump

None

Release Notes Description

None

Activity

CB robot November 9, 2022 at 12:33 AM

Build couchbase-server-7.2.0-5016 contains goxdcr commit d40b52b with commit message:
: Fix racing of metakv callbacks for remote clusters

CB robot November 9, 2022 at 12:33 AM

Build couchbase-server-7.2.0-5016 contains goxdcr commit 5284a08 with commit message:
: fix to add missing unlock in error path

CB robot November 8, 2022 at 10:37 PM

Build couchbase-server-7.1.3-3473 contains goxdcr commit d40b52b with commit message:
: Fix racing of metakv callbacks for remote clusters

CB robot November 8, 2022 at 10:37 PM

Build couchbase-server-7.1.3-3473 contains goxdcr commit 5284a08 with commit message:
: fix to add missing unlock in error path

Fixed

Details

Assignee

Reporter

Is this a Regression?

Unknown

Triage

Untriaged

Story Points

Priority

Created November 4, 2022 at 8:58 PM
Updated November 9, 2022 at 12:33 AM
Resolved November 8, 2022 at 10:35 PM