Couchbase Server: MB-49463

[Magma, 10TB, KV+XDCR, 1% DGM]: Out of 4 billion items only 3.8 billion items were replicated during XDCR replication


Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 7.1.0
    • 7.1.0
    • XDCR
    • 7.1.0-1595

    Description

      Steps:
      1. Create a 3-node cluster (172.23.110.64, 172.23.110.68, 172.23.110.69).
      2. Create buckets and 50 collections.
      3. Create 40000000 items in each collection:

      Read Start: 0
      Read End: 0
      Update Start: 0
      Update End: 0
      Expiry Start: 0
      Expiry End: 0
      Delete Start: 0
      Delete End: 0
      Create Start: 0
      Create End: 40000000
      Final Start: 0
      Final End: 40000000
      

      4. Update 40000000 keys to create 50 percent fragmentation:

      Read Start: 0
      Read End: 0
      Update Start: 0
      Update End: 40000000
      Expiry Start: 0
      Expiry End: 0
      Delete Start: 0
      Delete End: 0
      Create Start: 0
      Create End: 0
      Final Start: 0
      Final End: 40000000
      

      5. Create another 40000000 items:

      Read Start: 0
      Read End: 0
      Update Start: 0
      Update End: 0
      Expiry Start: 0
      Expiry End: 0
      Delete Start: 0
      Delete End: 0
      Create Start: 40000000
      Create End: 80000000
      Final Start: 40000000
      Final End: 80000000
      

      6. Update 40000000 keys (created in step 5) to maintain 50 percent fragmentation:

      Read Start: 0
      Read End: 0
      Update Start: 40000000
      Update End: 80000000
      Expiry Start: 0
      Expiry End: 0
      Delete Start: 0
      Delete End: 0
      Create Start: 0
      Create End: 0
      Final Start: 40000000
      Final End: 80000000
      

      7. Start ASYNC load:

      Read Start: 0
      Read End: 40000000
      Update Start: 0
      Update End: 40000000
      Expiry Start: 0
      Expiry End: 0
      Delete Start: 40000000
      Delete End: 80000000
      Create Start: 80000000
      Create End: 120000000
      Final Start: 80000000
      Final End: 120000000
      

      8. Rebalance IN with Loading of docs in step 7
      9. Rebalance OUT with Loading of docs in step 7
      10. Rebalance SWAP with Loading of docs in step 7
      11. Rebalance IN/OUT with Loading of docs in step 7
      12. Rebalance OUT/IN with Loading of docs in step 7
      13. Validate all docs mutated in step 7.
      14. Create an XDCR destination cluster at 172.23.110.70.
      15. Create a replication on 6 collections, each having 80M items (309 GB of active data).
      16. After the above replication completes, rebalance a new node (172.23.110.67) into the destination cluster at 172.23.110.70.
      17. Create a replication for the remaining 44 collections on the source cluster.
      18. Replication was proceeding normally; 3.8 billion items (3,806,471,446) had already been replicated. Everything was fine up to this point.
      19. Changed the replica setting on the XDCR destination (enabled replicas and set the count to 1).
      20. Triggered a rebalance for the replica setting to take effect.
      21. Observed that incoming docs stopped (the UI was not showing any doc ops).
      22. The rebalance was successful, but XDCR replication hung at 3.8 billion items (for more than 7 hours).
      23. The source cluster is not showing any pending mutations.
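
      The expected item count and the replication shortfall can be cross-checked from the key ranges listed in steps 3, 5, and 7. The sketch below is illustrative only: it assumes the Start/End values are half-open key ranges, that updates and reads do not change the set of live keys, and that all 50 collections belong to a single replicated bucket. The helper names are hypothetical and are not part of the test framework.

```python
# Rough accounting of live keys per collection, based on steps 3, 5 and 7.
# Assumption: each Start/End pair is a half-open key range [start, end).

COLLECTIONS = 50            # from step 2
REPLICATED = 3_806_471_446  # count reported in step 18


def add_range(ranges, start, end):
    """Add [start, end) to a list of disjoint ranges, merging neighbours."""
    merged = []
    for s, e in sorted(ranges + [(start, end)]):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def remove_range(ranges, start, end):
    """Remove [start, end) from a list of disjoint ranges."""
    out = []
    for s, e in ranges:
        if s < start:
            out.append((s, min(e, start)))
        if e > end:
            out.append((max(s, end), e))
    return out


# Only creates and deletes change which keys exist.
operations = [
    ("create", 0, 40_000_000),            # step 3
    ("create", 40_000_000, 80_000_000),   # step 5
    ("delete", 40_000_000, 80_000_000),   # step 7
    ("create", 80_000_000, 120_000_000),  # step 7
]

live = []
for op, start, end in operations:
    if op == "create":
        live = add_range(live, start, end)
    else:
        live = remove_range(live, start, end)

per_collection = sum(end - start for start, end in live)
expected_total = per_collection * COLLECTIONS
shortfall = expected_total - REPLICATED

print(f"live ranges per collection: {live}")
print(f"items per collection:       {per_collection:,}")
print(f"expected items on target:   {expected_total:,}")
print(f"items not yet replicated:   {shortfall:,}")
```

      Under these assumptions each collection ends with 80M live keys, which matches the figure in step 15, and 50 collections give the 4 billion items named in the title. The gap to the count in step 18 is 193,528,554 items.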

      XDCR Destination Cluster LOGS:

      https://cb-engineering.s3.amazonaws.com/xdcr_Dest/collectinfo-2021-11-10T050429-ns_1%40172.23.110.67.zip
      https://cb-engineering.s3.amazonaws.com/xdcr_Dest/collectinfo-2021-11-10T050429-ns_1%40172.23.110.70.zip

      XDCR Source Cluster LOGS:

      https://cb-engineering.s3.amazonaws.com/xdcr_source/collectinfo-2021-11-10T050623-ns_1%40172.23.110.64.zip
      https://cb-engineering.s3.amazonaws.com/xdcr_source/collectinfo-2021-11-10T050623-ns_1%40172.23.110.68.zip
      https://cb-engineering.s3.amazonaws.com/xdcr_source/collectinfo-2021-11-10T050623-ns_1%40172.23.110.69.zip

          People

            ritesh.agarwal Ritesh Agarwal
            ankush.sharma Ankush Sharma