Couchbase Server / MB-14574

XDCR: Pausing and resuming caused entire data set to be re-replicated

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Blocker
    • 4.1.0
    • 4.0.0
    • XDCR
    • Security Level: Public
    • None
    • Untriaged
    • Unknown

    Description

      Build 1891. A 3-node cluster with uni-directional XDCR to a 1-node cluster.

      Here are the steps:

      • After the initial replication (where some mutations were lost) I let the cluster sit for 30 minutes (zero operations).
      • I then paused XDCR via the UI, then pressed play to resume it.
      • I found that the entire data set (some 550k docs) was queued to be replicated.
      • Replication got to 99.9% complete, then the number of outstanding mutations shot up again, this time to 400k.
      • It took four attempts before replication finally succeeded (see attached graph of hourly outbound XDCR mutations).
      • NB - other than XDCR, the cluster was entirely idle throughout this whole time.
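
      To repeat the pause/resume step without the UI, the sketch below goes through the REST API instead. It assumes the documented `pauseRequested` setting on `/settings/replications/<replication id>` is available on this build, and that the UI button has the same effect; neither is confirmed in this report. The host, credentials and replication id are placeholders, not values taken from the logs.

```python
import base64
import urllib.parse
import urllib.request


def build_pause_request(host, replication_id, paused, port=8091):
    """Return (url, body) for setting pauseRequested on one XDCR replication.

    replication_id is a placeholder of the form
    "<remote cluster uuid>/<source bucket>/<target bucket>"; it is
    URL-encoded so the slashes survive as a single path segment.
    """
    quoted_id = urllib.parse.quote(replication_id, safe="")
    url = f"http://{host}:{port}/settings/replications/{quoted_id}"
    body = urllib.parse.urlencode(
        {"pauseRequested": "true" if paused else "false"}
    ).encode()
    return url, body


def set_paused(host, replication_id, paused, user, password):
    """POST the pause/resume setting and return the HTTP status code."""
    url, body = build_pause_request(host, replication_id, paused)
    request = urllib.request.Request(url, data=body, method="POST")
    # Send basic auth up front rather than waiting for a 401 challenge.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status
```

      Calling `set_paused(host, replication_id, True, user, password)` and then the same call with `False` would correspond to the pause and play actions in the second step.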

      Log files available at
      (Source)
      https://s3.amazonaws.com/customers.couchbase.com/davidH/collectinfo-2015-04-17T121115-ns_1%40192.168.78.101.zip
      https://s3.amazonaws.com/customers.couchbase.com/davidH/collectinfo-2015-04-17T121115-ns_1%40192.168.78.102.zip
      https://s3.amazonaws.com/customers.couchbase.com/davidH/collectinfo-2015-04-17T121115-ns_1%40192.168.78.103.zip

      (Destination)
      https://s3.amazonaws.com/cb-customers/davidH/collectinfo-2015-04-17T121146-ns_1%40192.168.78.104.zip

            People

              arunkumar Arunkumar Senthilnathan (Inactive)
              dhaikney David Haikney (Inactive)
              Votes: 0
              Watchers: 7
