DCP backfill doesn't read expected data

Description

During a backup, the DCP connection hangs. Below are the logs from the connection for one of the hung vbuckets.

Wed Feb 4 18:03:36.334143 CST 3: (boss_event_1) DCP (Producer) eq_dcpq:cbbackup-qeYHjBNEGfOVoQKb - (vb 99) stream created with start seqno 0 and end seqno 381517
Wed Feb 4 18:03:36.433578 CST 3: (boss_event_1) DCP (Producer) eq_dcpq:cbbackup-qeYHjBNEGfOVoQKb - (vb 99) Sending disk snapshot with start seqno 0 and end seqno 381517
Wed Feb 4 18:03:36.436566 CST 3: (boss_event_1) DCP (Producer) eq_dcpq:cbbackup-qeYHjBNEGfOVoQKb - (vb 99) Backfill complete, 0 items read from disk, last seqno read: 381507
Wed Feb 4 18:03:36.436577 CST 3: (boss_event_1) Backfill task (1 to 381517) finished for vb 99 disk seqno 381517 memory seqno 381517

As the logs show, the backfill task should have read up to sequence number 381517, but it only read up to 381507. This is the root cause, and the behavior is unexpected. Note that the checkpoint manager does have a cursor for the DCP connection, but that cursor is not expected to read any items: its cursor_seqno (381518) is already past the stream's end seqno (381517).
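
To look for other vbuckets in the same state, one could scan the memcached log for backfills that stop short of their stream's end seqno. The following is a minimal sketch, assuming the log lines follow exactly the format shown above; the regexes and the `find_short_backfills` helper are hypothetical and not part of any Couchbase tool.

```python
import re

# Regexes fitted to the sample log lines in this ticket (an assumption,
# not a documented log format).
STREAM_CREATED = re.compile(
    r"DCP \(Producer\) (?P<conn>\S+) - \(vb (?P<vb>\d+)\) "
    r"stream created with start seqno (?P<start>\d+) and end seqno (?P<end>\d+)"
)
BACKFILL_DONE = re.compile(
    r"DCP \(Producer\) (?P<conn>\S+) - \(vb (?P<vb>\d+)\) "
    r"Backfill complete, (?P<items>\d+) items read from disk, "
    r"last seqno read: (?P<last>\d+)"
)


def find_short_backfills(lines):
    """Return (conn, vb, end_seqno, last_read) for every backfill whose
    last seqno read is below the end seqno its stream was created with."""
    end_seqnos = {}  # (conn, vb) -> end seqno requested at stream creation
    short = []
    for line in lines:
        m = STREAM_CREATED.search(line)
        if m:
            end_seqnos[(m["conn"], int(m["vb"]))] = int(m["end"])
            continue
        m = BACKFILL_DONE.search(line)
        if m:
            key = (m["conn"], int(m["vb"]))
            last = int(m["last"])
            end = end_seqnos.get(key)
            if end is not None and last < end:
                short.append((key[0], key[1], end, last))
    return short


SAMPLE_LOG = [
    "Wed Feb 4 18:03:36.334143 CST 3: (boss_event_1) DCP (Producer) "
    "eq_dcpq:cbbackup-qeYHjBNEGfOVoQKb - (vb 99) stream created with "
    "start seqno 0 and end seqno 381517",
    "Wed Feb 4 18:03:36.436566 CST 3: (boss_event_1) DCP (Producer) "
    "eq_dcpq:cbbackup-qeYHjBNEGfOVoQKb - (vb 99) Backfill complete, "
    "0 items read from disk, last seqno read: 381507",
]

if __name__ == "__main__":
    print(find_short_backfills(SAMPLE_LOG))
    # [('eq_dcpq:cbbackup-qeYHjBNEGfOVoQKb', 99, 381517, 381507)]
```

Matching the stream-creation line first and keying on (connection, vbucket) keeps the check correct when several backup or replication streams are open on the same node.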

vb_99:eq_dcpq:cbbackup-qeYHjBNEGfOVoQKb:cursor_checkpoint_id: 29498
vb_99:eq_dcpq:cbbackup-qeYHjBNEGfOVoQKb:cursor_seqno: 381518
vb_99:eq_dcpq:replication:ns_1@node01.domain.com->ns_1@node03.domain.com:boss_event_1:cursor_checkpoint_id: 29498
vb_99:eq_dcpq:replication:ns_1@node01.domain.com->ns_1@node03.domain.com:boss_event_1:cursor_seqno: 381517
vb_99:last_closed_checkpoint_id: 29497
vb_99:num_checkpoint_items: 1
vb_99:num_checkpoints: 1
vb_99:num_items_for_persistence: 0
vb_99:num_open_checkpoint_items: 0
vb_99:num_tap_cursors: 2
vb_99:open_checkpoint_id: 29498
vb_99:persisted_checkpoint_id: 29497
vb_99:state: active
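
The checkpoint stats above can also be checked programmatically. This sketch parses the `vb_N:stat: value` lines and, under the assumption that `cursor_seqno` is the first seqno a cursor will deliver from memory, computes which seqnos between the end of the backfill and the stream's end seqno would be sent by neither the backfill nor the checkpoint cursor. `parse_checkpoint_stats` and `undelivered_seqnos` are hypothetical helpers for illustration only.

```python
def parse_checkpoint_stats(text):
    """Parse 'vb_N:stat_name: value' lines into {vb: {stat_name: value}}.

    Stat names can contain colons (cursor stats embed the connection name),
    so the value is split off at the last ': ' and the vb prefix at the
    first ':'.
    """
    stats = {}
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        key, _, value = line.rpartition(": ")
        vb_part, _, name = key.partition(":")
        vb = int(vb_part[len("vb_"):])
        stats.setdefault(vb, {})[name] = value.strip()
    return stats


def undelivered_seqnos(stats, vb, conn, end_seqno, last_backfill_seqno):
    """Hypothetical model of the gap: seqnos in (last_backfill_seqno, end_seqno]
    that lie below the cursor, assuming cursor_seqno is the first seqno the
    cursor will hand out from memory."""
    cursor = int(stats[vb][f"{conn}:cursor_seqno"])
    first_missing = last_backfill_seqno + 1
    # The cursor covers seqnos >= cursor; anything in the gap below it,
    # capped at the stream's end seqno, is never sent on this stream.
    last_missing = min(end_seqno, cursor - 1)
    return range(first_missing, last_missing + 1)


SAMPLE = """\
vb_99:eq_dcpq:cbbackup-qeYHjBNEGfOVoQKb:cursor_checkpoint_id: 29498
vb_99:eq_dcpq:cbbackup-qeYHjBNEGfOVoQKb:cursor_seqno: 381518
vb_99:eq_dcpq:replication:ns_1@node01.domain.com->ns_1@node03.domain.com:boss_event_1:cursor_seqno: 381517
vb_99:open_checkpoint_id: 29498
vb_99:state: active
"""

if __name__ == "__main__":
    stats = parse_checkpoint_stats(SAMPLE)
    gap = undelivered_seqnos(stats, 99, "eq_dcpq:cbbackup-qeYHjBNEGfOVoQKb",
                             end_seqno=381517, last_backfill_seqno=381507)
    print(gap.start, gap.stop - 1, len(gap))  # 381508 381517 10
```

For vb 99, with the backfill stopping at 381507 and the cursor at 381518, this model yields the ten seqnos 381508 through 381517, which would be consistent with a stream that waits for its end seqno and never reaches it.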

Environment

None

Link to Log File, atop/blg, CBCollectInfo, Core dump

None

Release Notes Description

None

Activity

Manu Dhundi March 5, 2015 at 6:23 PM

This is part of 3.0.2 MP release.

Cihan Biyikoglu March 5, 2015 at 4:07 PM

Hi folks, do we have a patch built for this, or are we asking customers to wait until 3.0.3?

Eric Cooper February 25, 2015 at 12:16 AM

I will be creating a CBQE ticket to formally add the test to the DCP test suite.

Eric Cooper February 25, 2015 at 12:15 AM

Reproduced the problem with 3.0.2-1619 and verified that it works properly with 3.0.2-1636.

Chris Hillery February 24, 2015 at 9:39 AM

It took a few tries, but the CentOS 6 build is now available as well.

Fixed

Details

Is this a Regression?

Yes

Triage

Triaged

Created February 10, 2015 at 11:53 PM
Updated November 22, 2024 at 2:56 PM
Resolved February 24, 2015 at 5:50 PM