Details

Type: Bug
Resolution: Duplicate
Priority: Critical
Fix Version/s: 4.0.0
Affects Version/s: 4.0.0
Component/s: XDCR
Security Level: Public
Labels:
- performance
Environment:
CentOS 6.x, 4 cores, 15 GB RAM per node

Triage:
Untriaged
Link to Log File, atop/blg, CBCollectInfo, Core dump:

C1:
https://s3.amazonaws.com/cb-customers/couchbase/collectinfo-2015-04-09T140126-ns_1%40172.23.105.44.zip
https://s3.amazonaws.com/cb-customers/couchbase/collectinfo-2015-04-09T140126-ns_1%40172.23.105.45.zip
https://s3.amazonaws.com/cb-customers/couchbase/collectinfo-2015-04-09T140126-ns_1%40172.23.105.48.zip
https://s3.amazonaws.com/cb-customers/couchbase/collectinfo-2015-04-09T140126-ns_1%40172.23.105.49.zip
https://s3.amazonaws.com/cb-customers/couchbase/collectinfo-2015-04-09T140126-ns_1%40172.23.105.50.zip
https://s3.amazonaws.com/cb-customers/couchbase/collectinfo-2015-04-09T140126-ns_1%40172.23.105.51.zip

C2:

Epic Link:
XDCR next release
Is this a Regression?:
Unknown

Description

Build

4.0.0-1767

Clusters
-----------
C1 : http://172.23.105.44:8091/
C2: http://172.23.105.54:8091/
The clusters are available for investigation.

What we do in the XDCR system test
----------------------------------
1. Load both C1 [8 nodes] and C2 [8 nodes] until vb_active_resident_items_ratio is < 50 on standardbucket and < 70 on standardbucket1.
2. Create XDCR:
C1.standardbucket <--> C2.standardbucket , no filter
C1.standardbucket1 --> C1.standardbucket1 , no filter
No replication on the sasl bucket.
3. Access phase with 98% gets, 2% sets, running for 3 hours.
4. Rebalance out 1 node at cluster1 with workload.
5. Rebalance in the same node with workload.
6. Fail over one node with workload. Rebalance failed here; I will file a separate bug for that. This run of the system test stopped here.
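
For anyone reproducing the setup, here is a minimal sketch of how the replications in step 2 could be requested through the Couchbase REST endpoint /controller/createReplication. The remote cluster reference names "C1" and "C2" are assumptions (the report does not say what the references were called), the third entry is copied exactly as written in the report, and nothing is sent over the network: the sketch only builds the form bodies.

```python
from urllib.parse import urlencode

# Replication topology as written in step 2 of the report.
# Each tuple: (source cluster, source bucket, target cluster reference, target bucket).
# The reference names "C1" and "C2" are assumed for illustration.
REPLICATIONS = [
    ("C1", "standardbucket", "C2", "standardbucket"),    # one direction of <-->
    ("C2", "standardbucket", "C1", "standardbucket"),    # the other direction
    ("C1", "standardbucket1", "C1", "standardbucket1"),  # target copied as written in the report
]


def create_replication_form(from_bucket, to_cluster, to_bucket):
    """Build the form body for POST /controller/createReplication (no filter)."""
    return urlencode({
        "fromBucket": from_bucket,
        "toCluster": to_cluster,
        "toBucket": to_bucket,
        "replicationType": "continuous",
    })


for src_cluster, src_bucket, dst_cluster, dst_bucket in REPLICATIONS:
    body = create_replication_form(src_bucket, dst_cluster, dst_bucket)
    print(f"on {src_cluster}: POST /controller/createReplication  {body}")
```

The bidirectional replication on standardbucket is modelled as two one-way replications, one created on each cluster.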

Normally the system test does not end here: we run the same set of operations on C2 with workload, followed by a warmup of both clusters. Note that compaction is not disabled and, given the high workload, runs frequently.

The problem is that replication is extremely slow.
See bucket: standardbucket
eviction_policy : value only
active_resident_ratio : 48.4%
memory allotted : 5 GB per node, currently 7 nodes so ~35 GB; high watermark not reached.
current mem usage : 22.3 GB
doc receival rate : 21K
mutation replication rate : 38.1
Attached are three sets of screenshots of the slow replication and one of bucket memory usage.
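
To put the figures above side by side, here is a small sketch of the arithmetic. All inputs are copied from the numbers reported for standardbucket; the assumption that both rates use the same unit (items per second) is mine, and the function names are illustrative only.

```python
# Figures copied from the report for bucket "standardbucket".
QUOTA_PER_NODE_GB = 5             # memory allotted per node
NODES = 7                         # nodes currently in the cluster
MEM_USED_GB = 22.3                # current memory usage
DOC_RECEIVAL_RATE = 21_000        # "21K"; unit assumed to be items per second
MUTATION_REPLICATION_RATE = 38.1  # unit assumed to match the receival rate


def bucket_quota_gb(per_node_gb, nodes):
    """Total bucket quota across the cluster."""
    return per_node_gb * nodes


def usage_fraction(used_gb, quota_gb):
    """Fraction of the bucket quota currently in use."""
    return used_gb / quota_gb


def rate_gap(receival_rate, replication_rate):
    """How many times faster docs are received than mutations are replicated."""
    return receival_rate / replication_rate


quota = bucket_quota_gb(QUOTA_PER_NODE_GB, NODES)
usage = usage_fraction(MEM_USED_GB, quota)
gap = rate_gap(DOC_RECEIVAL_RATE, MUTATION_REPLICATION_RATE)

print(f"bucket quota: {quota} GB")
print(f"memory in use: {usage:.1%} of quota")
print(f"receival rate is about {gap:.0f}x the replication rate")
```

With these inputs the bucket uses roughly 64% of its 35 GB quota, and docs are received about 550 times faster than mutations are replicated.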

Attaching logs from C1 and C2. I also see multiple repeated goxdcr crashes and will create a separate issue for them. I also see 98 DCP connections for XDCR and will log another MB for that. Thanks.

Attachments

Screen Shot 2015-04-09 at 10.20.37 AM.png
418 kB
09/Apr/15 7:37 AM
Screen Shot 2015-04-09 at 10.25.10 AM.png
393 kB
09/Apr/15 7:37 AM
Screen Shot 2015-04-09 at 10.27.23 AM.png
408 kB
09/Apr/15 7:37 AM
Screen Shot 2015-04-09 at 10.27.28 AM.png
482 kB
09/Apr/15 7:37 AM

Issue Links

duplicates

MB-14389 [GoXDCR-System test] Multiple repeated goxdcr crashes noted

Closed

Activity

People

Assignee: Xiaomei Zhang (Inactive)

Reporter: Aruna Piravi (Inactive)

Votes: 0

Watchers: 3

Dates

Created: 09/Apr/15 7:37 AM

Updated: 23/Sep/16 2:49 PM

Resolved: 09/Apr/15 10:43 PM

Gerrit Reviews

There are no open Gerrit changes

[GoXDCR-System test] Replication is extremely slow (doc receival rate is in 10000s and replication rate is in 10s/100s)
