I was trying to set up XDCR replication from an in-house cluster to EC2. The following happened:
1. Set up XDCR from a 4-node in-house cluster (2 data nodes) to a 16-node EC2 cluster (8 data nodes) for 1 bucket (msm).
2. I didn't provide hostnames for the EC2 nodes initially. That caused the UI to take a long time to respond. After it came back, I tried to delete the replication. At first there was no response, then the UI reported an error, and multiple attempts to delete the replication were unsuccessful.
3. I restarted the goxdcr process on one data node in the source cluster (172.23.97.37). This cleaned up the XDCR replication from the UI.
firstname.lastname@example.org 6:43:52 PM Tue Sep 18, 2018
4. I fixed the hostnames on the EC2 cluster and set up the XDCR replication on the source cluster again. This time it started replicating.
5. After replicating 50% of the data, progress stopped. I then killed the goxdcr process on the 2nd data node in the source cluster, and that kicked off replication of the remaining 50% of the data.
Message in logs before restart:
email@example.com 12:19:26 PM Wed Sep 19, 2018
Source Cluster Logs:
Let me know if you need destination cluster logs as well.
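For anyone trying to reproduce this, the stall in step 5 (progress stopping at 50%) could be flagged automatically instead of being spotted by eye in the UI. Below is a minimal sketch, assuming you already collect periodic readings of some "items replicated" counter. The function name `is_stalled`, the `window` parameter, and the sample values are all hypothetical and are not taken from goxdcr or its stats.

```python
from typing import Sequence


def is_stalled(samples: Sequence[int], total: int, window: int = 3) -> bool:
    """Return True when the last `window` samples show no progress before completion.

    `samples` are successive readings of a hypothetical "items replicated"
    counter; `total` is the expected item count. Both are illustrative inputs,
    not values read from goxdcr.
    """
    if window < 2 or len(samples) < window:
        return False  # not enough data to decide
    recent = samples[-window:]
    no_progress = max(recent) == min(recent)  # counter did not move
    incomplete = recent[-1] < total           # and we are not done yet
    return no_progress and incomplete


if __name__ == "__main__":
    # Replication reaches 50% and then stops, as in step 5.
    print(is_stalled([100, 300, 500, 500, 500], total=1000))  # True
    # Replication finished, so a flat counter is not a stall.
    print(is_stalled([800, 1000, 1000, 1000], total=1000))    # False
```

A flat counter alone is not treated as a stall, since a finished replication also stops moving; that is why the check compares against `total`.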
|Field||Original Value||New Value|
|Fix Version/s|| ||5.5.3 [ 15520 ]|
|Fix Version/s|| ||Mad-Hatter [ 15037 ]|
|Assignee||Ritam Sharma [ ritam.sharma ]||Yu Sui [ yu ]|
|Link|| ||This issue blocks MB-31456 [ MB-31456 ]|
|Resolution|| ||Fixed [ 1 ]|
|Status||Open [ 1 ]||Resolved [ 5 ]|
|Assignee||Yu Sui [ yu ]||Pavithra Mahamani [ pavithra.mahamani ]|
|VERIFICATION STEPS|| ||Verified that when the external IP is provided correctly, there is no replication hang.|
|Status||Resolved [ 5 ]||Closed [ 6 ]|
|Actual End|| ||2018-11-09 17:30 (issue has been closed)|