
Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: 7.1.0
Affects Version/s: 7.1.0
Component/s: XDCR
Labels:
- magma
- xdcr
Environment:
7.1.0-1521

Triage:
Untriaged
Link to Log File, atop/blg, CBCollectInfo, Core dump:

Src:
http://supportal.couchbase.com/snapshot/1d65f77bc8df53bb2533771684b3c78d::0
s3://cb-customers-secure/src_cluster/2021-10-19/collectinfo-2021-10-19t180046-ns_1@172.23.121.116.zip
s3://cb-customers-secure/src_cluster/2021-10-19/collectinfo-2021-10-19t180046-ns_1@172.23.120.170.zip
s3://cb-customers-secure/src_cluster/2021-10-19/collectinfo-2021-10-19t180046-ns_1@172.23.121.124.zip
s3://cb-customers-secure/src_cluster/2021-10-19/collectinfo-2021-10-19t180046-ns_1@172.23.121.126.zip
s3://cb-customers-secure/src_cluster/2021-10-19/collectinfo-2021-10-19t180046-ns_1@172.23.121.127.zip
s3://cb-customers-secure/src_cluster/2021-10-19/collectinfo-2021-10-19t180046-ns_1@172.23.121.128.zip
s3://cb-customers-secure/src_cluster/2021-10-19/collectinfo-2021-10-19t180046-ns_1@172.23.121.129.zip
s3://cb-customers-secure/src_cluster/2021-10-19/collectinfo-2021-10-19t180046-ns_1@172.23.121.130.zip
s3://cb-customers-secure/src_cluster/2021-10-19/collectinfo-2021-10-19t180046-ns_1@172.23.121.48.zip

Dstn:
http://supportal.couchbase.com/snapshot/e36f78aae37740197f0fd990ee484227::0
s3://cb-customers-secure/dstn_cluster/2021-10-19/collectinfo-2021-10-19t180106-ns_1@172.23.121.131.zip
s3://cb-customers-secure/dstn_cluster/2021-10-19/collectinfo-2021-10-19t180106-ns_1@172.23.121.132.zip
s3://cb-customers-secure/dstn_cluster/2021-10-19/collectinfo-2021-10-19t180106-ns_1@172.23.121.133.zip
s3://cb-customers-secure/dstn_cluster/2021-10-19/collectinfo-2021-10-19t180106-ns_1@172.23.121.134.zip

Story Points:
1
Is this a Regression?:
Unknown

Description

Steps:

Create a 9 node cluster
Create a 5 node XDCR remote cluster
Create required buckets and 10 collections.
Create required buckets and 10 collections on XDCR remote.
Create 1000000 items/collection with durability majority
Update 1000000 keys to create 50 percent fragmentation
Create new 1000000 items/collection with durability majority
Update new 1000000 keys to create 50 percent fragmentation
Start a CRUD data load asynchronously.
Rebalance in while loading docs. Crash and resume the rebalance at 20%, 40%, 60%, 80%.
Crash Magma/memcached on the source cluster 20 times while loading docs.
Rebalance out while loading docs. Crash and resume the rebalance at 20%, 40%, 60%, 80%.
XDCR replication seems to be stuck: 64.005M mutations have remained outstanding for the last 30 minutes.
Also, because CRUD operations were running continuously on the source cluster, the destination cluster has more items than the source. Not sure if this is expected?
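
For reference, the "stuck" observation above (the mutations-remaining count not moving for 30 minutes) can be expressed as a simple check over periodic samples of the backlog. This is only a sketch: the function name, the sample format, and the 1800-second window are assumptions made for illustration, and it does not call any Couchbase or TAF API. How the samples are collected is left to the caller.

```python
from typing import List, Tuple


def is_replication_stuck(samples: List[Tuple[float, int]],
                         window_secs: float = 1800.0) -> bool:
    """Return True if the backlog is non-zero and unchanged for the window.

    samples: (time_in_seconds, mutations_remaining) pairs, oldest first.
    Both the helper and its input format are hypothetical.
    """
    if not samples:
        return False
    latest_time, latest_remaining = samples[-1]
    # An empty backlog means replication has caught up, not stalled.
    if latest_remaining == 0:
        return False
    # Require history that spans the whole window before deciding.
    if latest_time - samples[0][0] < window_secs:
        return False
    recent = [remaining for t, remaining in samples
              if latest_time - t <= window_secs]
    # Stuck here means the count never moved inside the window.
    return min(recent) == max(recent)


# Four samples taken 10 minutes apart, all showing 64.005M remaining.
flat = [(0, 64_005_000), (600, 64_005_000),
        (1200, 64_005_000), (1800, 64_005_000)]
print(is_replication_stuck(flat))
```

Note that this treats only a flat backlog as stuck. With a continuous CRUD load on the source, a growing backlog could also indicate a stall, so a stricter check might compare against the oldest sample in the window instead.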

QE Test

git fetch "http://review.couchbase.org/TAF" refs/changes/06/163706/1 && git checkout FETCH_HEAD

guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/magma_temp_job4.ini -p bucket_storage=magma,bucket_eviction_policy=fullEviction,rerun=False,iterations=2,sdk_timeout=60,log_level=debug,infra_log_level=debug,skip_cleanup=True -t aGoodDoctor.Hospital.Murphy.SystemTestMagma,nodes_init=9,graceful=True,skip_cleanup=True,num_items=1000000,num_buckets=1,bucket_names=GleamBook,doc_size=1024,key_size=18,assert_crashes_on_load=True,num_collections=10,xdcr_collections=10,maxttl=10,num_indexes=20,pc=10,xdcr_remote_nodes=5,index_nodes=0,query_nodes=0,cbas_nodes=0,fts_nodes=0,ops_rate=50000,doc_ops=create:update:delete:read,durability=Majority,crashes=10 -m rest'

I have also observed that, with XDCR, RAM usage on the source/destination clusters is always very high even when there are no mutations on the source cluster and only XDCR is running.
Source: see attachment SrcClusterNodesRAM.png

Destination: see attachment DstnClusterNodesRAM.png

Attachments


- DstnClusterNodesRAM.png (166 kB), 19/Oct/21 11:11 AM, Ritesh Agarwal
- SrcClusterNodesRAM.png (268 kB), 19/Oct/21 11:03 AM, Ritesh Agarwal

Issue Links

is caused by
- MB-48674 XDCR - P2P - bg process to pull ckpts under error conditions (Closed)
- MB-49024 XDCR - P2P - backfill pipeline not honoring dynamic wait time (Closed)

is cloned by
- MB-49030 primary/secondary memory domain stats incorrect (Closed)


[Magma, KV+XDCR, SmallScale]: XDCR replication seems to be stuck.
