  1. Couchbase Server
  2. MB-11076

XDCR checkpointing: would more _pre_replicate calls during replication help detect data loss at the destination sooner?

Details

    • Bug
    • Resolution: Done
    • Major
    • None
    • 3.0
    • XDCR
    • Security Level: Public
    • None
    • Untriaged
    • No

    Description

      [This is more of a design question than a bug]

      The dev spec (https://docs.google.com/document/d/17leftKE01b2EKt6AoO-YeNrK3LT7dSwl38F9Glt-Pbs) on checkpointing states:
      "xdcr replicator may from _time to time_ perform POST to /_pre_replicate with vb, bucket and vbopaque. To verify that xdcr replicator is still talking to the same vbucket, xdcr replicator started replicating into. 200 response indicates success as usual. And 4xx indicates that remote vbucket might have lost some previously replicated mutations. And thus xdcr replication needs to be restarted from past checkpoint or from the beginning."
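
The contract quoted above can be sketched as follows. This is only an illustration of the request fields and the 200 / 4xx interpretation described in the spec. The JSON key names, the helper names, and the handling of any status other than 200 or 4xx are assumptions, not taken from the spec or from ns_server.

```python
from enum import Enum


class PreReplicateOutcome(Enum):
    SAME_VBUCKET = "same_vbucket"              # 200: still the vbucket we started replicating into
    POSSIBLE_DATA_LOSS = "possible_data_loss"  # 4xx: remote vbucket may have lost mutations
    UNKNOWN = "unknown"                        # anything else (assumption: the spec does not say)


def build_pre_replicate_body(bucket, vb, vbopaque):
    """Build the body POSTed to /_pre_replicate (key names are assumed)."""
    return {"bucket": bucket, "vb": vb, "vbopaque": vbopaque}


def classify_pre_replicate_status(status):
    """Map an HTTP status from /_pre_replicate to what the replicator should conclude."""
    if status == 200:
        return PreReplicateOutcome.SAME_VBUCKET
    if 400 <= status < 500:
        # Per the spec: restart from a past checkpoint or from the beginning.
        return PreReplicateOutcome.POSSIBLE_DATA_LOSS
    return PreReplicateOutcome.UNKNOWN
```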

      So my understanding was that there would be more _pre_replicate calls than commit_for_checkpoint calls, and that _pre_replicate calls would happen between commit_for_checkpoint calls, so that data loss at the destination is detected without having to wait until it is time to checkpoint.

      However, the checkpoint code (https://github.com/membase/ns_server/blob/12aa7bdf45434e334e2eade6e9e0c84228f0adeb/src/xdc_vbucket_rep_ckpt.erl) and tcpdumps from the destination reveal that we perform _pre_replicate only in two cases:

      1. No checkpoint is found. This is when the replicator starts as a result of a vbucket receiving its first mutation.
      2. An existing checkpoint file is being parsed. This happens only on source node restart, vbucket moves, or a failure in commit_for_checkpoint.

      If we pre_replicated more often, would we not detect destination data losses sooner?
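
To make the question concrete, here is a toy model of detection latency. It assumes replication proceeds in numbered batches, that a commit_for_checkpoint call also reveals the loss, and that an extra _pre_replicate check can be issued every few batches. The function name, the batch abstraction, and all parameters are hypothetical and are not part of ns_server.

```python
def detection_batch(loss_at, checkpoint_every, check_every=None, total_batches=100):
    """Return the first batch after `loss_at` at which destination data loss is noticed.

    loss_at          -- batch after which the destination loses data
    checkpoint_every -- commit_for_checkpoint runs every N batches
    check_every      -- optional: extra _pre_replicate check every M batches
    Returns None if nothing notices the loss within `total_batches`.
    """
    for batch in range(loss_at + 1, total_batches + 1):
        at_checkpoint = batch % checkpoint_every == 0
        at_extra_check = check_every is not None and batch % check_every == 0
        if at_checkpoint or at_extra_check:
            return batch
    return None


# Loss after batch 3, checkpoint every 10 batches.
without_checks = detection_batch(loss_at=3, checkpoint_every=10)
with_checks = detection_batch(loss_at=3, checkpoint_every=10, check_every=2)
```

In this model the loss is noticed at batch 10 without extra checks and at batch 4 with a check every 2 batches. The trade-off is the additional request traffic to the destination.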

      Please feel free to correct me and close this issue if my understanding or reading of the Erlang code is incorrect.

            People

              alkondratenko Aleksey Kondratenko (Inactive)
              apiravi Aruna Piravi (Inactive)
