Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: 4.6.3, 5.0.0
Affects Version/s: 3.1.6, 4.6.0
Component/s: couchbase-bucket
Labels: None

Triage: Untriaged
Link to Log File, atop/blg, CBCollectInfo, Core dump:

Source cluster:
http://supportal.couchbase.com/snapshot/b820146417122b6ca9712c6b7eec86c2%3A%3A1
s3://cb-customers-secure/merrickxdcrsource/collectinfo-2017-05-20t002029-ns_1@172.23.98.210.zip
s3://cb-customers-secure/merrickxdcrsource/collectinfo-2017-05-20t002029-ns_1@172.23.98.216.zip
s3://cb-customers-secure/merrickxdcrsource/collectinfo-2017-05-20t002029-ns_1@172.23.98.217.zip
Target cluster:
http://supportal.couchbase.com/snapshot/ae7af46c6397f51e563d05fad1a0684b%3A%3A0
s3://cb-customers-secure/merrickxdcrtarget/collectinfo-2017-05-20t002042-ns_1@172.23.98.223.zip
s3://cb-customers-secure/merrickxdcrtarget/collectinfo-2017-05-20t002042-ns_1@172.23.98.224.zip
s3://cb-customers-secure/merrickxdcrtarget/collectinfo-2017-05-20t002042-ns_1@172.23.98.225.zip

Is this a Regression?: No
Sprint: KV Spock Beta

Description

The issue was seen with XDCR + graceful failover.

The issue was seen on node 172.23.98.224. All missing items belong to a single vbucket, vb 998.

2017-05-19T17:15:36.209243-07:00 NOTICE (Test1) DCP (Producer) eq_dcpq:replication:ns_1@172.23.98.224->ns_1@172.23.98.225:Test1 - (vb 998) Creating takeover stream with start seqno 848 and end seqno 18446744073709551615

2017-05-19T17:15:36.690043-07:00 NOTICE (Test1) DCP (Producer) eq_dcpq:replication:ns_1@172.23.98.224->ns_1@172.23.98.225:Test1 - (vb 998) Vbucket marked as dead, last sent seqno: 848, high seqno: 865

2017-05-19T17:15:36.691534-07:00 NOTICE (Test1) DCP (Producer) eq_dcpq:replication:ns_1@172.23.98.224->ns_1@172.23.98.225:Test1 - (vb 998) Stream closing, sent until seqno 848 remaining items 0, reason: The stream ended due to all items being streamed

2017-05-19T17:15:36.692372-07:00 NOTICE (Test1) DCP (Producer) eq_dcpq:replication:ns_1@172.23.98.224->ns_1@172.23.98.225:Test1 - (vb 998) Stream closed, 0 items sent from backfill phase, 0 items sent from memory phase, 848 was last seqno sent

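The takeover stream is closed after sending only up to seqno 848, although the vbucket's high seqno is 865 at the time it is marked dead. Seqnos 849 through 865 are therefore never streamed to the new active node, and this is what causes the data loss.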
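The check applied by eye to the log above can be sketched as a small script. This is only an illustrative sketch, not part of any Couchbase tooling: the function name `find_unsent_ranges` and the regular expression are assumptions, written against the exact wording of the "Vbucket marked as dead" line quoted in this ticket, and other server versions may word that line differently.

```python
import re

# Assumed pattern, modelled on the "Vbucket marked as dead" line quoted above.
DEAD_RE = re.compile(
    r"\(vb (?P<vb>\d+)\) Vbucket marked as dead, "
    r"last sent seqno: (?P<sent>\d+), high seqno: (?P<high>\d+)"
)


def find_unsent_ranges(lines):
    """Map vbucket id -> (first_unsent_seqno, high_seqno) for every stream
    whose vbucket was marked dead before the high seqno had been sent."""
    gaps = {}
    for line in lines:
        match = DEAD_RE.search(line)
        if match is None:
            continue
        sent = int(match["sent"])
        high = int(match["high"])
        if sent < high:
            gaps[int(match["vb"])] = (sent + 1, high)
    return gaps


# The log line from this ticket, copied verbatim.
LOG = [
    "2017-05-19T17:15:36.690043-07:00 NOTICE (Test1) DCP (Producer) "
    "eq_dcpq:replication:ns_1@172.23.98.224->ns_1@172.23.98.225:Test1 - "
    "(vb 998) Vbucket marked as dead, last sent seqno: 848, high seqno: 865",
]

for vb, (first, last) in find_unsent_ranges(LOG).items():
    print(f"vb {vb}: seqnos {first}-{last} were never sent")
```

Run against the memcached logs of a node that went through graceful failover, a non-empty result would point at vbuckets affected in the same way as vb 998 here.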

Gerrit Reviews


For Gerrit Dashboard: MB-24817
#	Subject	Branch	Project	Status	CR	V
79426,2	MB-24817: Upon DCP stream creation, log end_seqno more accurately	master	kv_engine	MERGED	+2	+1
79500,3	MB-24817, WIP: Check thread sanitizer	watson	ep-engine	ABANDONED	0	-1
79532,4	MB-24817: During takeover, hold stream lock until vb is set to dead	watson	ep-engine	MERGED	+2	+1
79681,1	MB-24817: During takeover, hold stream lock until vb is set to dead	master	kv_engine	ABANDONED	-2	+1
79783,3	Merge remote-tracking branch 'couchbase/watson_ep'	master	kv_engine	MERGED	+2	+1