Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: 5.5.2, 6.0.0
Affects Version/s: 5.5.1
Component/s: couchbase-bucket, XDCR
Security Level: Public
Labels: None
Triage: Untriaged
Is this a Regression?: Unknown

Description

XDCR from Couchbase Server 4.5.1 (other source versions not yet tested) to Couchbase Server 5.5.x appears to corrupt deleted documents. This leads to an inability to rebalance and to potential data loss (if replication streams have to reconnect and there is a failover).

Steps To Reproduce

1. Set up a single-node 4.5.1 cluster and a single-node 5.5.1 cluster.
2. Create a bucket on each cluster.
3. Set up XDCR from the 4.5.1 cluster to the 5.5.1 cluster on this bucket.
4. Create a document.
5. Delete that document.

After step 5, inspect the document on the source and target clusters:

Source (4.5.1)

Doc seq: 4
     id: test1
     rev: 4
     content_meta: 3
     size (on disk): 0
     cas: 1535966803550732288, expiry: 1535966802, flags: 0, datatype: 0, conflict_resolution_mode: 0
     doc deleted
     could not read document body: document not found

Target (5.5.1)

Doc seq: 4
     id: test1
     rev: 4
     content_meta: 131
     size (on disk): 15
     cas: 1535966803550732288, expiry: 1535966803, flags: 0, datatype: 0x00 (raw)
     doc deleted
     size: 5
     data: (snappy)

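The two dumps differ in content_meta: 3 on the source and 131 on the target. The sketch below decodes both values, assuming couchstore's content_meta encoding as recalled from its headers (low 7 bits = JSON-detection mode, 0x80 bit = Snappy-compressed body). The constant names and values are assumptions to verify against the couchstore source, not something this issue states.

```python
from typing import Tuple

# Assumed couchstore content_meta encoding (verify against couchstore's
# headers): the low 7 bits hold the JSON-detection mode and the 0x80 bit
# marks a Snappy-compressed body.
COUCH_DOC_IS_COMPRESSED = 0x80
JSON_MODES = {
    0: "is_json",
    1: "invalid_json",
    2: "invalid_json_key",
    3: "non_json_mode",
}


def decode_content_meta(content_meta: int) -> Tuple[str, bool]:
    """Split a content_meta byte into (json_mode, is_compressed)."""
    compressed = bool(content_meta & COUCH_DOC_IS_COMPRESSED)
    mode = JSON_MODES.get(content_meta & 0x7F, "unknown")
    return mode, compressed


for side, meta in (("source 4.5.1", 3), ("target 5.5.1", 131)):
    mode, compressed = decode_content_meta(meta)
    print(f"{side}: content_meta={meta} -> {mode}, compressed={compressed}")
```

Under that assumption, 131 (0x83) decodes to the same non-JSON mode as the source plus the compressed bit, which is consistent with the target's "data: (snappy)" line.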
Attached is a pcap showing the DelWithMeta requests sent by XDCR.

This appears to be caused by 5.x not correctly handling the format of the DelWithMeta packet sent by 4.5.1.
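One of the fixes listed under Gerrit Reviews is titled "Account for nmeta in deleteWithMeta". A plausible reading, not confirmed by the text of this issue, is that a DelWithMeta body can end with an extended-metadata blob of nmeta bytes, and a receiver that does not subtract nmeta treats those trailing bytes as the document value. The sketch below assumes a simplified extras layout (24 bytes: flags, expiry, rev seqno, cas; or 26 bytes with a trailing u16 nmeta); `del_with_meta_value_len` is a hypothetical helper, not a kv_engine function.

```python
import struct


def del_with_meta_value_len(extras: bytes, keylen: int, bodylen: int,
                            honour_nmeta: bool = True) -> int:
    """Return how many body bytes a receiver would treat as the value.

    Hypothetical layout: 24-byte extras (flags u32, expiry u32,
    rev_seqno u64, cas u64) optionally followed by a u16 nmeta, giving
    26 bytes. The nmeta bytes of extended metadata follow the value.
    """
    if len(extras) not in (24, 26):
        raise ValueError(f"unsupported extras length: {len(extras)}")
    nmeta = struct.unpack_from(">H", extras, 24)[0] if len(extras) == 26 else 0
    value_len = bodylen - len(extras) - keylen
    if honour_nmeta:
        # Extended metadata is not part of the document value.
        value_len -= nmeta
    return value_len


# A deletion of "test1" with an empty value and a hypothetical 5-byte
# extended-metadata blob (the size is illustrative only).
key = b"test1"
extended_meta = bytes(5)
extras = struct.pack(">IIQQH", 0, 1535966803, 4, 1535966803550732288,
                     len(extended_meta))
bodylen = len(extras) + len(key) + len(extended_meta)  # no value bytes

print(del_with_meta_value_len(extras, len(key), bodylen))         # 0
print(del_with_meta_value_len(extras, len(key), bodylen, False))  # 5
```

If nmeta is ignored, a deletion that should have an empty value ends up with a non-empty one, which would then be stored (and possibly compressed) like any other value.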

In theory this issue should have no impact (as the documents are deleted), but in practice it completely breaks rebalance in Couchbase Server 5.5.x onwards.
This is because the value on disk is now Snappy-compressed (instead of being empty), so when the document is read from disk its datatype is set to SNAPPY (0x2).
As a result, all subsequent rebalances and internal replications that backfill from disk fail for that document with the following error:

2018-08-31T15:29:54.141273Z WARNING 185: Invalid format specified for DCP_DELETION - 4 - closing connection packet:mcbp::header: magic:0x80, opcode:0x58, keylen:23, extlen:21, datatype:0x2, specific:806, bodylen:51, opaque:0x21, rawextras:0000007c7e00000060f15b895e190

The error occurs because SNAPPY is not accepted as a valid datatype for a DCP_DELETION (a compressed deletion value was assumed never to happen).
Once a bucket contains documents corrupted in this way, you cannot rebalance, and you also risk data loss on failover if your replication streams do not stay completely in memory.
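The log line prints the fields of the 24-byte memcached binary request header. The sketch below packs those printed values into a header, decodes it, and applies two hypothetical datatype checks: a strict one that only allows raw or XATTR deletions, and a lenient one that also allows SNAPPY, in the spirit of the fix titled "Don't reject snappy|raw DCP deletes". The exact validation rules in kv_engine are not given in this issue; `deletion_datatype_ok` is illustrative only, and the cas is a placeholder because the log does not print it.

```python
import struct

# Datatype bits from the memcached binary protocol.
DATATYPE_JSON = 0x01
DATATYPE_SNAPPY = 0x02
DATATYPE_XATTR = 0x04

OPCODE_DCP_DELETION = 0x58

# 24-byte request header: magic, opcode, keylen, extlen, datatype,
# vbucket, bodylen, opaque, cas (all big-endian).
HEADER = struct.Struct(">BBHBBHIIQ")


def parse_header(raw: bytes) -> dict:
    """Decode the fixed 24-byte header into named fields."""
    fields = HEADER.unpack_from(raw, 0)
    names = ("magic", "opcode", "keylen", "extlen", "datatype",
             "vbucket", "bodylen", "opaque", "cas")
    return dict(zip(names, fields))


def deletion_datatype_ok(datatype: int, allow_snappy: bool) -> bool:
    """Hypothetical check: a deletion may be raw or carry XATTRs; SNAPPY is
    accepted only when allow_snappy is True. JSON is never accepted here."""
    allowed = DATATYPE_XATTR | (DATATYPE_SNAPPY if allow_snappy else 0)
    return (datatype & ~allowed) == 0


# Header fields copied from the log line; cas is a placeholder (0).
raw = HEADER.pack(0x80, OPCODE_DCP_DELETION, 23, 21, DATATYPE_SNAPPY,
                  806, 51, 0x21, 0)
hdr = parse_header(raw)
print(hdr["opcode"] == OPCODE_DCP_DELETION, hdr["vbucket"], hdr["datatype"])
print(deletion_datatype_ok(hdr["datatype"], allow_snappy=False))  # False
print(deletion_datatype_ok(hdr["datatype"], allow_snappy=True))   # True
```

With the strict check, the backfilled deletion carrying datatype 0x2 is rejected and the connection would be closed, matching the symptom in the log.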

Attachments

xdcr.pcap (178 kB, 03/Sep/18 2:55 AM)

Issue Links

Links to: Google Doc describing impact + workarounds

Gerrit Reviews

For Gerrit Dashboard: MB-31141
Change,Patchset	Subject	Branch	Project	Status	Code-Review	Verified
100657,1	Merge branch 'master' of ssh://review.couchbase.org:29418/testrunner into MB-31141	alice	testrunner	NEW	0	0
100658,1	Merge branch 'alice' of ssh://review.couchbase.org:29418/testrunner into MB-31141	alice	testrunner	NEW	0	0
100659,1	Merge branch 'alice' of ssh://review.couchbase.org:29418/testrunner into MB-31141	vulcan	testrunner	NEW	0	0
100669,1	Merge branch 'alice' of ssh://review.couchbase.org:29418/testrunner into MB-31141	alice	testrunner	NEW	0	0
99152,3	MB-31141: Account for nmeta in deleteWithMeta	vulcan	kv_engine	MERGED	+2	+1
99414,4	MB-31141: Don't reject snappy|raw DCP deletes	vulcan	kv_engine	MERGED	+2	+1
99482,2	MB-31141: Merge couchbase/vulcan to couchbase/alice	alice	kv_engine	MERGED	+2	+1
100348,1	Merge couchbase/vulcan into master	master	kv_engine	ABANDONED	0	0
100349,2	Merge couchbase/alice into couchbase/master	master	kv_engine	ABANDONED	0	-1
100353,1	Merge remote-tracking branch 'couchbase/alice'	master	kv_engine	MERGED	+2	+1
100423,1	Merge branch 'master' of https://github.com/couchbase/testrunner into MB-31141	master	testrunner	ABANDONED	0	0