Details

Type: Bug
Resolution: Duplicate
Priority: Blocker
Fix Version/s: 3.0
Affects Version/s: 2.5.1, 3.0
Component/s: couchbase-bucket, XDCR
Security Level: Public
Labels: None
Triage: Untriaged
Operating System: CentOS 64-bit
Is this a Regression?: No
Sprint: June 30 - July 18

Description

Build
--------
3.0.0-432

Scenario
--------------
1. Set up two 2-node clusters with 1 default bucket on each cluster and bi-directional replication between them. Start loading 10000 docs.
2. Pause replication on either side, then delete the destination bucket and recreate it. Do not recreate the replication from the destination to the source cluster. No workload on the destination cluster.
3. Resume replication from the source cluster and wait for replication to end.

Item count on source = 10000, on dest = 9990. No XDCR activity was seen for 10 minutes and longer.
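
For anyone triaging a similar count mismatch, the gap can be narrowed down by diffing the key sets of the two buckets. The following is only a minimal sketch: the function name and the `doc-N` key names are hypothetical, and how the key lists are obtained from each cluster is left out.

```python
def missing_on_destination(source_keys, dest_keys):
    """Return the keys present on the source but absent on the destination."""
    return sorted(set(source_keys) - set(dest_keys))


# Hypothetical data mirroring the counts in this report:
# 10000 keys on the source, 10 of which never reached the destination.
source = [f"doc-{i}" for i in range(10000)]
dest = [f"doc-{i}" for i in range(10000) if i % 1000 != 999]

missing = missing_on_destination(source, dest)
print(len(source), len(dest), len(missing))
```

The resulting list of missing keys can then be compared against per-vbucket information, as was done for the attached missing_items_vb_info.txt.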

Reproducible
------------------
Consistently reproducible with:
./testrunner -i bixdcr.ini -t xdcr.pauseResumeXDCR.PauseResumeTest.replication_with_pause_and_resume,items=10000,delete_bucket=destination,replication_type=xmem,pause=source

Observations
------------------
1. This issue is only seen in scenarios where there is no workload on the source after resume and the source cluster is only replicating what it already has in memory. When there is still workload after resume, all docs are replicated.
2. Not sure if this can be seen with plain XDCR without pause/resume. Will try and update the issue.
3. Data replication to the destination is spiky and slow (screenshot attached). Although there is no data load on the source, outbound mutations are seen in spurts (like spikes). These wake up the replicators in spurts (which is justified), and that is reflected in incoming_ops on the destination cluster. So it doesn't appear to be a problem with the replicators themselves. My question here: is it OK to see spiky outbound mutations when there is no data load?

Attaching cbcollect logs.
.186, .187 --> source cluster
.188, .189 --> destination cluster

Attachments

cbcollect_MB10457.tar (15.75 MB, 10/Apr/14 5:20 PM)
diff.txt (289 kB, 10/Apr/14 5:20 PM)
missing_items_vb_info.txt (328 kB, 10/Apr/14 5:20 PM)
Screen Shot 2014-03-13 at 11.58.43 AM.png (409 kB, 13/Mar/14 2:29 PM)
Screen Shot 2014-04-10 at 5.04.01 PM.png (557 kB, 10/Apr/14 5:27 PM)
Screen Shot 2014-04-10 at 5.05.43 PM.png (792 kB, 10/Apr/14 5:27 PM)

Issue Links

relates to

MB-10179 XDCR checkpointing: data loss at destination in cases of destination bucket delete-recreate/flush/failover may be undetected for long periods of time

Closed

People

Assignee: Aruna Piravi (Inactive)

Reporter: Aruna Piravi (Inactive)

Votes: 0

Watchers: 5

Dates

Created: 13/Mar/14 2:29 PM

Updated: 17/Jun/14 12:23 PM

Resolved: 02/May/14 12:27 PM

Gerrit Reviews

There are no open Gerrit changes.

There is 1 closed Gerrit change:

MB-10457: added diagnostics for tracing xdcr data loss in ticket: Gerrit Review:

XDCR: Some docs not replicated after deletion and recreation of destination bucket
