[BP 7.2.5] - XDCR - Negative changes_left for a paused replication when goxdcr is killed and respawned

Description

Consider the 2 types of kv_vb_map in use to calculate the stats for a paused replication in UpdateStats(...):

A. cur_kv_vb_map, calculated as:

B. sourceVBMap, calculated as:

Say there are N KV nodes in the source cluster and, for the sake of simplicity, assume that each of the N nodes has T docs (total_docs) and has processed P docs (docs_processed).

The difference between the maps is that:

(A) contains all N nodes, so stats calculated from it are aggregated across the whole cluster

AND

(B) contains only 1 node (the current node), so stats calculated from it cover the current node only.
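The difference between the two maps can be sketched in Go as below. This is only an illustration, not the goxdcr code: the `kvVBMap` alias, the `filterToNode` and `countVBs` helpers, and the node addresses are all hypothetical names chosen for the example.

```go
package main

import "fmt"

// kvVBMap maps a KV node address to the vBuckets it owns.
// Hypothetical alias for illustration; not taken from goxdcr.
type kvVBMap map[string][]uint16

// filterToNode returns a map holding only the entry for the given node,
// which is the shape described for (B).
func filterToNode(cluster kvVBMap, node string) kvVBMap {
	local := kvVBMap{}
	if vbs, ok := cluster[node]; ok {
		local[node] = vbs
	}
	return local
}

// countVBs returns the total number of vBuckets covered by a map.
func countVBs(m kvVBMap) int {
	n := 0
	for _, vbs := range m {
		n += len(vbs)
	}
	return n
}

func main() {
	// (A): cluster-wide view with every KV node (made-up addresses).
	clusterMap := kvVBMap{
		"kv1:11210": {0, 1},
		"kv2:11210": {2, 3},
		"kv3:11210": {4, 5},
	}
	// (B): view restricted to the current node.
	localMap := filterToNode(clusterMap, "kv1:11210")

	fmt.Println(countVBs(clusterMap), countVBs(localMap))
}
```

Any stat summed over the vBuckets of the first map covers three nodes, while the same stat summed over the second map covers one.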

And when we hit the following codepath, we use (A)

constructStatsForReplication calculates the following:

1. total_docs: highSeqNo (obtained from KV) for all the nodes in (B) = 1*T. Example: for a 3 KV node setup, because of this bug we get:

2. docs_processed: seqno from checkpoints of all the VBs of nodes in (A) = N*P

3. changes_left = total_docs - docs_processed = 1*T - N*P, which can go negative.
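The arithmetic in the three steps above can be checked with a small Go sketch. The `changesLeft` helper and the sample values N = 3, T = 1000, P = 800 are assumptions made up for this illustration; they are not taken from goxdcr or from the ticket.

```go
package main

import "fmt"

// changesLeft applies total_docs - docs_processed, where total_docs is
// summed over totalNodes nodes and docs_processed over processedNodes nodes.
// Hypothetical helper for illustration only.
func changesLeft(totalNodes, processedNodes, t, p int64) int64 {
	return totalNodes*t - processedNodes*p
}

func main() {
	// Assumed sample values: 3 KV nodes, 1000 docs each, 800 processed each.
	const n, t, p = 3, 1000, 800

	// Mixed maps: total_docs from (B), docs_processed from (A).
	fmt.Println(changesLeft(1, n, t, p)) // 1*1000 - 3*800 = -1400

	// Consistent maps: both stats from (B).
	fmt.Println(changesLeft(1, 1, t, p)) // 1000 - 800 = 200
}
```

With these values the mixed calculation is negative whenever N*P exceeds T, while the node-local calculation stays at T - P.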

The fix is for this whole path to use (B), so that the stats are consistent and each overview stat is calculated for the current node only.

Additionally, in this same code path we always read the overview_stats from the main pipeline, but sometimes end up storing them in the backfill pipeline's stats store:

This may need revisiting as well.

Components

Affects versions

Fix versions

7.2.5

Labels

Environment

None

Link to Log File, atop/blg, CBCollectInfo, Core dump

None

Release Notes Description

None

Linked issues

is a backport of

MB-60448

XDCR - Negative changes_left for a paused replication when goxdcr is killed and respawned

Activity

Sumukh Bhat June 4, 2024 at 4:11 AM

Release notes:
When the goxdcr process is killed while a replication is paused, changes_left no longer goes negative after the process respawns.

Beth Favini June 3, 2024 at 5:46 PM

We are preparing the 7.2.5 release notes. If this fix is customer-facing, please add the releasenote label to the issue so it will be picked up by our filter.

Ayush Nayyar March 31, 2024 at 6:44 AM

Verified on 7.2.5-7571.

CB robot February 2, 2024 at 10:09 AM

Build couchbase-server-7.2.5-7532 contains goxdcr commit 592c5af with commit message:
: Use the node's kvVbMap to calculate docs_processed when the process restarts and we have a paused replication

Fixed

Details
Assignee
Ayush Nayyar
Reporter
Sumukh Bhat
Is this a Regression?
Unknown
Triage
Untriaged
Story Points
0
Priority
Blocker
Created January 31, 2024 at 6:16 PM

Updated September 17, 2024 at 4:49 PM

Resolved February 2, 2024 at 6:03 AM
