  1. Couchbase Server
  2. MB-49019

[Magma, KV+XDCR, SmallScale]: XDCR replication seems to be stuck.


Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 7.1.0
    • 7.1.0
    • XDCR
    • 7.1.0-1521

    Description

      Steps:

      1. Create a 9 node cluster
      2. Create a 5 node XDCR remote cluster
      3. Create required buckets and 10 collections.
      4. Create required buckets and 10 collections on XDCR remote.
      5. Create 1000000 items/collection with durability majority
      6. Update 1000000 keys to create 50 percent fragmentation
      7. Create new 1000000 items/collection with durability majority
      8. Update new 1000000 keys to create 50 percent fragmentation
      9. Start a CRUD data load asynchronously.
      10. Rebalance in with Loading of docs. Crash and resume rebalance at 20%, 40%, 60%, 80%.
      11. Crash Magma/memc on source cluster 20 times with Loading of docs
      12. Rebalance Out with Loading of docs. Crash and resume rebalance at 20%, 40%, 60%, 80%.
      13. XDCR replication appears to be stuck: 64.005M mutations have remained outstanding for the past 30 minutes.
      14. Also, because CRUD operations were running continuously on the source cluster, the destination cluster now holds more items than the source. Is this expected?
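
      The "stuck" condition in step 13 can be expressed as a simple check over periodic samples of the mutations-remaining count. The following is a minimal sketch, not part of the QE test: the function name `is_replication_stuck`, the `(seconds, remaining)` sample format, and the 30-minute default window are illustrative assumptions, and how the samples are collected from the cluster is left out.

      ```python
      from typing import Optional, Sequence, Tuple

      # Hypothetical sample format: (seconds since monitoring began, mutations remaining).
      Sample = Tuple[float, int]


      def is_replication_stuck(samples: Sequence[Sample], window_s: float = 1800.0) -> bool:
          """Return True if the backlog is non-zero and shows no net decrease
          over the last `window_s` seconds. Samples must be sorted by time."""
          if not samples:
              return False
          latest_t, latest_remaining = samples[-1]
          if latest_remaining == 0:
              return False  # nothing left to replicate, so not stuck

          # Baseline: the most recent sample taken at or before the window start.
          cutoff = latest_t - window_s
          baseline: Optional[int] = None
          for t, remaining in samples:
              if t <= cutoff:
                  baseline = remaining
              else:
                  break
          if baseline is None:
              return False  # not enough history to judge

          return latest_remaining >= baseline


      if __name__ == "__main__":
          # Backlog flat at 64.005M for 30 minutes, sampled every 5 minutes.
          flat = [(t, 64_005_000) for t in range(0, 1801, 300)]
          print(is_replication_stuck(flat))
      ```

      A check based on net decrease also flags a backlog that grows during the window, which matters here because the CRUD load keeps running on the source.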

      QE Test

      git fetch "http://review.couchbase.org/TAF" refs/changes/06/163706/1 && git checkout FETCH_HEAD
      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/magma_temp_job4.ini -p bucket_storage=magma,bucket_eviction_policy=fullEviction,rerun=False,iterations=2,sdk_timeout=60,log_level=debug,infra_log_level=debug,skip_cleanup=True -t aGoodDoctor.Hospital.Murphy.SystemTestMagma,nodes_init=9,graceful=True,skip_cleanup=True,num_items=1000000,num_buckets=1,bucket_names=GleamBook,doc_size=1024,key_size=18,assert_crashes_on_load=True,num_collections=10,xdcr_collections=10,maxttl=10,num_indexes=20,pc=10,xdcr_remote_nodes=5,index_nodes=0,query_nodes=0,cbas_nodes=0,fts_nodes=0,ops_rate=50000,doc_ops=create:update:delete:read,durability=Majority,crashes=10 -m rest'
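
      The `-t` argument above packs the test name and its parameters into one comma-separated string, which is hard to read at a glance. As a reading aid only, the sketch below splits such a string into a name and a parameter dictionary. The helper `parse_test_spec` is hypothetical and is not how TAF itself parses its arguments; the example string is a shortened excerpt of the command above.

      ```python
      from typing import Dict, Tuple


      def parse_test_spec(spec: str) -> Tuple[str, Dict[str, str]]:
          """Split 'test.name,key=value,...' into (test name, parameters).
          Values are kept as strings; a repeated key keeps its last value."""
          name, _, rest = spec.partition(",")
          params: Dict[str, str] = {}
          for item in rest.split(","):
              if not item:
                  continue  # tolerate empty segments such as a trailing comma
              key, sep, value = item.partition("=")
              if not sep:
                  raise ValueError(f"expected key=value, got {item!r}")
              params[key] = value
          return name, params


      if __name__ == "__main__":
          # Shortened excerpt of the -t argument from the command above.
          spec = (
              "aGoodDoctor.Hospital.Murphy.SystemTestMagma,"
              "nodes_init=9,xdcr_remote_nodes=5,num_collections=10,"
              "doc_ops=create:update:delete:read,durability=Majority,crashes=10"
          )
          name, params = parse_test_spec(spec)
          print(name)
          for key, value in params.items():
              print(f"  {key} = {value}")
      ```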
      

      I have also observed that RAM usage on the source and destination clusters stays very high while only XDCR is running and there are no mutations on the source cluster.
      Source: see attachment SrcClusterNodesRAM.png

      Destination: see attachment DstnClusterNodesRAM.png

      Attachments

        1. DstnClusterNodesRAM.png
          166 kB
          Ritesh Agarwal
        2. SrcClusterNodesRAM.png
          268 kB
          Ritesh Agarwal


            People

              ritesh.agarwal Ritesh Agarwal