Details
Bug
Resolution: Fixed
Critical
3.0
Security Level: Public
Build 3.0.0-0132-rel
Untriaged
Centos 64-bit
Yes
Description
Copying comments from MB-11440.
[Points to Highlight]
The problem appears only with SSL XDCR. The same test always passes with non-SSL XDCR.
[Test Conditions]
The test fails reproducibly when the failover side has 3 nodes and the other side has 4 nodes, i.e. when 2 nodes remain after failover+rebalance. For example, I tried this test with a 4-node cluster and it passed.
Analysis of the test showed that updates are replicated to the other side very slowly, which causes this issue.
[Test Steps]
1. Set up a 3-node Source cluster (S) and a 4-node Destination cluster (D).
2. Create two buckets, sasl_bucket_1 and sasl_bucket_2.
3. Set up SSL bi-directional XDCR (CAPI) for both buckets.
4. Load 10000 items into each bucket on Source, with key prefix "loadOne".
5. Load 10000 items into each bucket on Source, with key prefix "loadTwo".
6. Wait 3 minutes to ensure the items are replicated to both ends.
7. Failover+Rebalance one node at the Source cluster.
8. Perform updates (3000 items) and deletes (3000 items) on Source, on keys with prefix "loadOne".
9. Perform updates (3000 items) on Destination, on keys with prefix "loadTwo".
10. The test fails with a data mismatch error between Source (S) and Destination (D). The keys updated on Destination (D, the non-failover side), i.e. those with prefix "loadTwo", had not been replicated when validation took place.
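The data mismatch in step 10 can be illustrated with a minimal sketch. It is not the actual test code: the function name `diff_snapshots`, the dict-based snapshot format, and the sample keys are assumptions made for illustration only. It compares the key/value state of one bucket on both clusters and reports which keys differ.

```python
from typing import Dict, List

# Hypothetical snapshot format: key -> value, deleted keys are simply absent.
Snapshot = Dict[str, str]


def diff_snapshots(source: Snapshot, destination: Snapshot) -> Dict[str, List[str]]:
    """Return the keys on which two bucket snapshots disagree."""
    # Keys present on one side only (e.g. a delete not yet replicated).
    missing_on_destination = sorted(k for k in source if k not in destination)
    missing_on_source = sorted(k for k in destination if k not in source)
    # Keys present on both sides with different values
    # (e.g. an update not yet replicated).
    value_mismatch = sorted(
        k for k in source if k in destination and source[k] != destination[k]
    )
    return {
        "missing_on_destination": missing_on_destination,
        "missing_on_source": missing_on_source,
        "value_mismatch": value_mismatch,
    }


def stale_keys_with_prefix(diff: Dict[str, List[str]], prefix: str) -> List[str]:
    """Collect every mismatched key that starts with the given prefix."""
    keys = set()
    for group in diff.values():
        keys.update(k for k in group if k.startswith(prefix))
    return sorted(keys)
```

In the failing run described above, a check of this kind would report "loadTwo" keys under `value_mismatch`, because the updates made on Destination had not yet reached Source at validation time.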
[Additional information]
1. Tests with fewer items/updates pass.
2. The test with a single bucket passes with the item/mutation counts mentioned above.
[Workaround]
Increase the timeout from 3 minutes to 5 minutes when waiting for outbound mutations to reach zero, which ensures that all data is replicated from both sides in bi-directional replication. However, XDCR with UPR should be even faster than the previous XDCR.
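The workaround amounts to polling until the outbound mutation count reaches zero or a timeout expires. A minimal sketch follows, assuming a caller-supplied function `get_outbound_mutations` that returns the current count for one replication. That function, the helper name, and the poll interval are hypothetical and not part of the test framework.

```python
import time
from typing import Callable


def wait_for_zero_outbound_mutations(
    get_outbound_mutations: Callable[[], int],  # hypothetical stat reader
    timeout_s: float = 300.0,                   # 5 minutes, per the workaround
    poll_interval_s: float = 5.0,               # assumed polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until outbound mutations reach zero; return False on timeout."""
    deadline = clock() + timeout_s
    while True:
        if get_outbound_mutations() == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
```

In a bi-directional setup this wait would be run once per direction (S to D and D to S) before validation. The `clock` and `sleep` parameters exist only so the loop can be exercised without real waiting.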