Description
Build
--------
3.0.0-957 (XDCR on UPR, internal replication on UPR)
Clusters
-----------
C1 : http://172.23.105.44:8091/
C2 : http://172.23.105.54:8091/
Both clusters are free for investigation. Data files are not attached: I ran out of space trying to fetch them.
Steps
--------
1a. Load on both clusters till vb_active_resident_items_ratio < 50.
1b. Set up bi-xdcr on "standardbucket", uni-xdcr on "standardbucket1"
2. Access phase with 50% gets, 50% deletes for 3 hrs
3. Rebalance-out 1 node at cluster1
4. Rebalance-in 1 node at cluster1
5. Failover and remove node at cluster1
6. Failover and add-back node at cluster1
7. Rebalance-out 1 node at cluster2
8. Rebalance-in 1 node at cluster2
9. Failover and remove node at cluster2
10. Failover and add-back node at cluster2
11. Soft restart all nodes in cluster1 one by one
Problem
-------------
See screenshot.
standardbucket(C1) <---> standardbucket(C2)
On C1 - 60990237 items
On C2 - 61052795 items
standardbucket1(C1) ----> standardbucket1(C2)
On C1 - 14064164 items
On C2 - 14064146 items
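To make the size of each mismatch explicit, here is a small sketch that only does the arithmetic on the item counts reported above. The dictionary layout and the `item_gap` helper are illustrative names, not part of any Couchbase tool.

```python
# Item counts copied from the report above.
counts = {
    "standardbucket": {"C1": 60990237, "C2": 61052795},   # bidirectional
    "standardbucket1": {"C1": 14064164, "C2": 14064146},  # C1 -> C2 only
}


def item_gap(bucket_counts):
    """Return C1 minus C2; a positive value means C2 holds fewer items."""
    return bucket_counts["C1"] - bucket_counts["C2"]


gaps = {bucket: item_gap(c) for bucket, c in counts.items()}
for bucket, gap in gaps.items():
    print(f"{bucket}: C1 - C2 = {gap}")
```

With these numbers, C1 is 62558 items behind C2 on standardbucket, and C2 is 18 items behind C1 on standardbucket1.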
Bucket priority
-----------------------
Both standardbucket and standardbucket1 have high priority.
I am attaching cbcollect output with XDCR trace logging enabled. Views are no longer reliable for comparing revID information, so I am writing a script to detect losses at the vBucket level. In the meantime I did not want to delay your investigation, so feel free to use the clusters.
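This is not the script mentioned above, only a rough sketch of what a vBucket-level check might look like. It assumes per-vBucket item counts have already been collected from each cluster into plain dictionaries keyed by vBucket id, and it compares counts rather than revIDs, so it can narrow a loss down to specific vBuckets but cannot name the missing keys. The function name `mismatched_vbuckets` and the sample numbers are hypothetical.

```python
def mismatched_vbuckets(src_counts, dst_counts):
    """Return {vb_id: (src, dst)} for every vBucket whose counts differ.

    A vBucket missing from one side is treated as holding 0 items there.
    """
    mismatches = {}
    for vb_id in sorted(set(src_counts) | set(dst_counts)):
        src = src_counts.get(vb_id, 0)
        dst = dst_counts.get(vb_id, 0)
        if src != dst:
            mismatches[vb_id] = (src, dst)
    return mismatches


# Hypothetical counts for three vBuckets, for illustration only.
example = mismatched_vbuckets(
    {0: 100, 1: 250, 2: 75},
    {0: 100, 1: 248, 2: 75},
)
print(example)
```

Only the vBuckets flagged here would then need a key-by-key comparison, which keeps the expensive step small.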
Issue Links
- is duplicated by: MB-11725 XDCR data loss after rebalance out (Closed)