  1. Couchbase Server
  2. MB-51224

On flushing the bucket in XDCR destination cluster XDCR replication doesn't restart.

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Won't Fix
    • 7.1.0
    • Neo.next
    • XDCR
    • 7.1.0-2363

    Description

      1. Load items in source cluster
      2. Let the replication complete
      3. Flush bucket on destination cluster
      4. Replication does not restart. After the flush, the vBucket seq-no on the destination cluster is reset to 0 and the vBucket UUID has also changed.

      QE Test

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/magma_temp_job1.ini -p bucket_storage=magma,bucket_eviction_policy=fullEviction,rerun=False -t aGoodDoctor.Hospital.Murphy.ClusterOpsVolume,nodes_init=4,graceful=True,skip_cleanup=True,num_items=20000000,num_buckets=1,bucket_names=GleamBook,doc_size=1024,bucket_type=membase,eviction_policy=fullEviction,iterations=1,batch_size=1000,sdk_timeout=60,log_level=debug,infra_log_level=debug,rerun=False,skip_cleanup=True,key_size=18,randomize_doc_size=False,randomize_value=True,assert_crashes_on_load=True,num_collections=50,maxttl=10,num_indexes=5,pc=25,index_nodes=0,xdcr_collections=50,xdcr_remote_nodes=4,cbas_nodes=0,fts_nodes=0,ops_rate=50000,ramQuota=10240,doc_ops=create:update:delete:read,rebl_ops_rate=20000,key_type=RandomKey,vbuckets=1024,mutation_perc=30 -m rest'
      

        Activity

          neil.huang Neil Huang added a comment -

          Ritesh Agarwal - is it possible to try something else in addition?
          Can you try writing another document or item after step 3? And wait another 10 min to see if the replication restarts?

          It looks like there's no more mutation done after step 1.


          ritesh.agarwal Ritesh Agarwal added a comment -

          Yes, Neil Huang, you are right. There were no mutations on the source cluster after step 3. Unfortunately, the cluster is gone, as I started another test on it. I will try this once the run on this cluster is done and post the findings here.

          jliang John Liang added a comment -

          Moving out to Neo since customer is not going to flush bucket in production.


          pavithra.mahamani Pavithra Mahamani (Inactive) added a comment -

          Just tried this out. All docs are replicated 10 mins after the bucket flush.

          ritesh.agarwal Ritesh Agarwal added a comment -

          Neil Huang/John Liang: So is it until we have at least 1 mutation for all 1024 vBuckets the replication for those vBuckets will not happen?

          ritesh.agarwal Ritesh Agarwal added a comment -

          Pavithra Mahamani, can you explain the steps you tried?

          neil.huang Neil Huang added a comment - - edited

          "So is it until we have at least 1 mutation for all 1024 vBuckets the replication for those vBuckets will not happen?"

          I think a single mutation on a single vBucket should be enough to trigger a restart of the whole pipeline/replication. The key is that a checkpoint operation needs to take place. When there is no mutation, no checkpoint takes place because nothing has changed.

          "Just tried this out. All docs are replicated 10 mins after the bucket flush."

          I suspect there's a timing issue involved.

          It could work if:

          1. <Checkpoint takes place>
          2. Last mutation is written
          3. Flush
          4. <Checkpoint takes place> - detects the VBUUID change and restarts replication

          It may not work if:

          1. Last mutation is written
          2. <Checkpoint takes place>
          3. Flush
          4. (No further checkpoint takes place, because there have been no new mutations since)
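
The two orderings can be compared with a toy model. This is only a sketch of the hypothesis in this comment, not actual XDCR code: the class, its fields, and the rule that a checkpoint is skipped when no mutation has arrived since the previous one are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToyReplication:
    """Hypothetical model of one vBucket's replication state (not XDCR code)."""
    target_vbuuid: int = 1        # VBUUID currently reported by the destination
    checkpointed_vbuuid: int = 1  # VBUUID recorded at the last checkpoint
    dirty: bool = False           # a mutation arrived since the last checkpoint
    restarted: bool = False       # replication restart was triggered

    def mutate(self) -> None:
        self.dirty = True

    def flush(self) -> None:
        # Assumption from the report: a flush changes the destination VBUUID.
        self.target_vbuuid += 1

    def checkpoint(self) -> None:
        # Assumption: with nothing changed, the checkpoint does not take place.
        if not self.dirty:
            return
        if self.checkpointed_vbuuid != self.target_vbuuid:
            self.restarted = True
            self.checkpointed_vbuuid = self.target_vbuuid
        self.dirty = False


def run(events):
    """Apply a sequence of event names and report whether a restart happened."""
    repl = ToyReplication()
    for name in events:
        getattr(repl, name)()
    return repl.restarted


works = run(["checkpoint", "mutate", "flush", "checkpoint"])
stuck = run(["mutate", "checkpoint", "flush", "checkpoint"])
unstuck = run(["mutate", "checkpoint", "flush", "mutate", "checkpoint"])
print(works, stuck, unstuck)
```

In this model the second ordering never notices the VBUUID change, and one further mutation after the flush is enough for the next checkpoint to detect it.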

          To confirm, you can pause and resume the pipeline to force a checkpoint operation.
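
One way to pause and resume a replication is the XDCR REST endpoint `/settings/replications/<replication id>` with the `pauseRequested` setting. The sketch below only builds the two requests and does not send them; the host, the replication ID, and the helper name are hypothetical, and authentication is left out.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def pause_request(base_url: str, replication_id: str, pause: bool) -> Request:
    """Build (but do not send) a request that pauses or resumes a replication."""
    # The replication ID contains slashes, so encode it as one path segment.
    encoded_id = quote(replication_id, safe="")
    url = f"{base_url}/settings/replications/{encoded_id}"
    body = urlencode({"pauseRequested": "true" if pause else "false"}).encode()
    return Request(url, data=body, method="POST")


# Hypothetical values: replace with the real node address and replication ID.
BASE_URL = "http://localhost:8091"
REPLICATION_ID = "remote-cluster-uuid/GleamBook/GleamBook"

pause = pause_request(BASE_URL, REPLICATION_ID, pause=True)
resume = pause_request(BASE_URL, REPLICATION_ID, pause=False)
print(pause.full_url)
```

Sending `pause` and then `resume` (for example with `urllib.request.urlopen` plus credentials) should correspond to the pause and resume suggested above; whether that forces the checkpoint is exactly what the experiment would confirm.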


          ritesh.agarwal Ritesh Agarwal added a comment -

          This makes sense, Neil Huang, thanks!

          People

            neil.huang Neil Huang
            ritesh.agarwal Ritesh Agarwal