[BP 7.2.5] - XDCR - Negative changes_left for a paused replication when goxdcr is killed and respawned

Description

Consider the 2 types of kv_vb_map in use to calculate the stats for a paused replication in UpdateStats(...):

A. cur_kv_vb_map, calculated as: 

B. sourceVBMap, calculated as: 

Say there are N KV nodes in the source cluster and, for the sake of simplicity, assume that each of the N nodes has T total_docs and has processed (docs_processed) P docs.

The difference between the maps is that:

(A) contains all the N nodes and stats calculated using this will be stats aggregated across the cluster level

AND

(B) contains only 1 node (the current node) in its map i.e. the stats calculated using this will be the stats for itself only.

And when we hit the following codepath, we use (A)

constructStatsForReplication calculates the following:

1. total_docs: highSeqNo (obtained from KV) for all the nodes in (B) = 1*T. Example, for a 3 KV node setup, because of this bug we get: 

2. docs_processed: seqno from checkpoints of all the VBs of nodes in (A) = N*P

3. changes_left = total_docs - docs_processed = 1*T - N*P, which can go negative.
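The arithmetic above can be reproduced with a small standalone sketch. This is not the goxdcr code: the types `kvVbMap` and `vbStats`, the helpers, and all numbers are hypothetical and chosen only to illustrate how mixing a node-only map with a cluster-wide map drives changes_left negative. It assumes 3 KV nodes with 2 vbuckets each, a highSeqno of 500 and a checkpointed seqno of 400 per vbucket, so T = 1000 and P = 800 per node.

```go
package main

import "fmt"

// kvVbMap maps a KV node address to the vbuckets it owns.
// Hypothetical simplification of the maps called (A) and (B) in this ticket.
type kvVbMap map[string][]uint16

// vbStats holds illustrative per-vbucket numbers (not real goxdcr fields).
type vbStats struct {
	highSeqno       int64 // high sequence number as reported by KV
	checkpointSeqno int64 // seqno recorded in the last checkpoint
}

// sumHighSeqno adds up highSeqno over every vbucket listed in m (total_docs).
func sumHighSeqno(m kvVbMap, stats map[uint16]vbStats) int64 {
	var total int64
	for _, vbs := range m {
		for _, vb := range vbs {
			total += stats[vb].highSeqno
		}
	}
	return total
}

// sumCheckpointSeqno adds up checkpointSeqno over every vbucket in m (docs_processed).
func sumCheckpointSeqno(m kvVbMap, stats map[uint16]vbStats) int64 {
	var total int64
	for _, vbs := range m {
		for _, vb := range vbs {
			total += stats[vb].checkpointSeqno
		}
	}
	return total
}

// changesLeft computes total_docs - docs_processed, taking each term from
// the map passed for it, so the effect of mixing maps is visible.
func changesLeft(totalDocsMap, docsProcessedMap kvVbMap, stats map[uint16]vbStats) int64 {
	return sumHighSeqno(totalDocsMap, stats) - sumCheckpointSeqno(docsProcessedMap, stats)
}

// buildExample returns a cluster-wide map (like A), a node-only map (like B)
// and per-vbucket stats for the assumed 3-node setup.
func buildExample() (kvVbMap, kvVbMap, map[uint16]vbStats) {
	clusterMap := kvVbMap{
		"node0": {0, 1},
		"node1": {2, 3},
		"node2": {4, 5},
	}
	nodeMap := kvVbMap{"node0": {0, 1}}
	stats := make(map[uint16]vbStats)
	for vb := uint16(0); vb < 6; vb++ {
		stats[vb] = vbStats{highSeqno: 500, checkpointSeqno: 400}
	}
	return clusterMap, nodeMap, stats
}

func main() {
	clusterMap, nodeMap, stats := buildExample()

	// Mixed maps: total_docs from (B), docs_processed from (A) = 1*T - N*P.
	fmt.Println("mixed maps:", changesLeft(nodeMap, clusterMap, stats)) // -1400

	// Consistent maps: both terms from (B) = T - P.
	fmt.Println("node-only maps:", changesLeft(nodeMap, nodeMap, stats)) // 200
}
```

With these assumed numbers the mixed calculation gives 1*1000 - 3*800 = -1400, while using the node-only map for both terms gives 1000 - 800 = 200.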

This has to be fixed so that the whole code path uses (B); the calculation is then consistent and each overview stat is computed for that node only.


Additionally, in this same code path we always read the overview_stats from the main pipeline, but sometimes end up storing them in the backfill pipeline's stats store:

This may need revisiting as well.


Activity


Sumukh Bhat June 4, 2024 at 4:11 AM

Release notes:
When the goxdcr process is killed while a replication is paused, changes_left no longer goes negative.

Beth Favini June 3, 2024 at 5:46 PM

We are preparing the 7.2.5 release notes. If this fix is customer-facing, please add the releasenote label to the issue so it will be picked up by our filter.

Ayush Nayyar March 31, 2024 at 6:44 AM

Verified on 7.2.5-7571.

CB robot February 2, 2024 at 10:09 AM

Build couchbase-server-7.2.5-7532 contains goxdcr commit 592c5af with commit message:
: Use the node's kvVbMap to calculate docs_processed when the process restarts and we have a paused replication

Fixed

Details

Is this a Regression?

Unknown

Triage

Untriaged

Created January 31, 2024 at 6:16 PM
Updated September 17, 2024 at 4:49 PM
Resolved February 2, 2024 at 6:03 AM