DCP producer not backfilling data after previously reaching maxBackfills limit

Description

A couple of our regression tests recently started failing intermittently because some documents that should be streamed via DCP within the test timeout (60 seconds) are missing.

The tests use cluster_run with 3 nodes as follows:
Node n0: data + analytics
Node n1: data
Node n2: analytics

At around 2020-06-14T19:52:40.348-07:00, the beer-sample bucket is created and loaded. After that, Analytics starts DCP streams to the bucket.
When I checked the logs, I noticed that we receive all the documents from one of the data nodes but none from the other.
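
One way to confirm which data node is silent is to group the received documents by the node that owns their vbucket. The following is a minimal sketch, assuming a hypothetical `vb_owner` map (vbucket id to node name) and a list of vbucket ids seen by the client. Neither name comes from the DCP client or the logs, and the even 1024-vbucket split in the usage lines is illustrative only.

```python
from collections import Counter


def docs_per_node(received_vbs, vb_owner):
    """Count received documents per owning data node.

    received_vbs: iterable of vbucket ids, one entry per document received.
    vb_owner: dict mapping vbucket id -> node name (hypothetical layout).
    """
    # Seed every known node with zero so silent nodes still show up.
    counts = Counter({node: 0 for node in set(vb_owner.values())})
    for vb in received_vbs:
        counts[vb_owner[vb]] += 1
    return dict(counts)


def silent_nodes(received_vbs, vb_owner):
    """Return the sorted names of nodes from which no document was received."""
    per_node = docs_per_node(received_vbs, vb_owner)
    return sorted(node for node, count in per_node.items() if count == 0)


# Illustrative layout only: 1024 vbuckets split evenly across two data nodes.
vb_owner = {vb: ("n0" if vb < 512 else "n1") for vb in range(1024)}
print(silent_nodes([1, 1, 7, 300], vb_owner))  # ['n1']
```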

Following the memcached logs for a vbucket from each node, we can see the following:

Node n0 (vb:1):

Node n1 (vb:516), where the backfill snapshot is never sent:

We haven't made any major changes to the DCP streaming client, and the tests used to pass consistently within the configured 60-second timeout.

cbcollect_info from the nodes is attached.
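
The issue title and the fix commit message in the activity log ("Account for Backfills in initializingQ on destruction") point to a slot-accounting problem around the backfill limit. The following is a toy model of that class of bug, not the kv_engine implementation: every class, method, and parameter name is hypothetical, and whether it matches the actual root cause is an assumption.

```python
class ToyBackfillTracker:
    """Toy model of a shared backfill limit; not kv_engine code."""

    def __init__(self, max_backfills):
        self.max_backfills = max_backfills
        self.running = 0

    def try_start(self):
        """Reserve a slot; return False when the limit is reached."""
        if self.running >= self.max_backfills:
            return False
        self.running += 1
        return True

    def finish(self):
        """Release a slot when a backfill completes or is discarded."""
        assert self.running > 0
        self.running -= 1


class ToyManager:
    """Holds backfills that reserved a slot but have not started running."""

    def __init__(self, tracker, release_queued_on_close):
        self.tracker = tracker
        self.release_queued_on_close = release_queued_on_close
        self.initializing = []

    def schedule(self, vb):
        """Queue a backfill for vb if the tracker has a free slot."""
        if self.tracker.try_start():
            self.initializing.append(vb)
            return True
        return False

    def close(self):
        """Tear down the manager; optionally give queued slots back."""
        if self.release_queued_on_close:
            for _ in self.initializing:
                self.tracker.finish()
        self.initializing.clear()


def can_backfill_after_close(release_queued_on_close):
    tracker = ToyBackfillTracker(max_backfills=2)
    first = ToyManager(tracker, release_queued_on_close)
    first.schedule(1)
    first.schedule(2)  # the limit is now reached
    first.close()
    second = ToyManager(tracker, release_queued_on_close=True)
    return second.schedule(516)


print(can_backfill_after_close(False))  # False: slots leaked, limit stays hit
print(can_backfill_after_close(True))   # True: slots returned on close
```

In this model, a manager that is torn down without returning the slots held by its queued backfills leaves the shared counter at the limit, so later backfill requests are refused indefinitely.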

Environment

None

Link to Log File, atop/blg, CBCollectInfo, Core dump

None

Release Notes Description

None

Attachments

1

Activity

Ashwin Govindarajulu June 24, 2020 at 3:02 PM

Closing based on unit_test run result.

CB robot June 23, 2020 at 4:38 PM

Build couchbase-server-7.0.0-2439 contains kv_engine commit 148e1ac with commit message:
[1/3]: Decouple BackfillManager and EvpEngine

CB robot June 23, 2020 at 4:38 PM

Build couchbase-server-7.0.0-2439 contains kv_engine commit da816fc with commit message:
[2/3]: Introduce BackfillTrackingIface

CB robot June 23, 2020 at 4:38 PM

Build couchbase-server-7.0.0-2439 contains kv_engine commit a4372ad with commit message:
[3/3]: Account for Backfills in initializingQ on destruction

CB robot June 22, 2020 at 4:07 PM

Build couchbase-server-6.6.0-7832 contains kv_engine commit da816fc with commit message:
[2/3]: Introduce BackfillTrackingIface

Fixed

Details

Is this a Regression?

Yes

Triage

Triaged

Created June 19, 2020 at 2:07 PM
Updated November 25, 2020 at 2:57 PM
Resolved June 22, 2020 at 4:10 PM