  1. Couchbase Server
  2. MB-5244

Improve polling behavior during XDCR checkpointing inside CAPI

Details

    • Story
    • Resolution: Duplicate
    • Major
    • 2.0-beta
    • 2.0-beta
    • XDCR
    • Security Level: Public
    • None

    Description

      In XDCR, checkpointing queries ep-engine stats at the destination as follows. Immediately after receiving a checkpoint request from the source, we read open_checkpoint_id and last_persisted_checkpoint_id from ep-engine, then wait until either last_persisted_checkpoint_id equals that open_checkpoint_id or a 10-second timeout expires (in which case we log a warning and do not checkpoint). The reasoning is that checkpointing, unlike regular document updates, bypasses ep-engine and updates Couch directly, so we need to make sure it is "safe" to checkpoint; it is safe only once the open checkpoint id seen when the checkpoint request arrived has been persisted.
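
      The wait described above can be sketched roughly as follows. This is only an illustration in Python, not the actual CAPI code: get_ids is a hypothetical callable standing in for the ep-engine stats query, the poll interval is an assumed value, and the comparison uses >= rather than strict equality on the assumption that checkpoint ids only increase.

```python
import time
from typing import Callable, Tuple


def wait_until_persisted(
    get_ids: Callable[[], Tuple[int, int]],  # hypothetical: returns (open_id, persisted_id)
    timeout_s: float = 10.0,                 # the 10 second timeout from the description
    poll_interval_s: float = 0.1,            # assumed value, not taken from the ticket
) -> bool:
    """Return True if it is safe to checkpoint, False if the wait timed out."""
    # Remember the open checkpoint id seen when the request arrived.
    target_open_id, persisted_id = get_ids()
    deadline = time.monotonic() + timeout_s
    while persisted_id < target_open_id:
        if time.monotonic() >= deadline:
            return False  # caller logs a warning and skips the checkpoint
        time.sleep(poll_interval_s)
        # Only the persisted id matters on later polls; the target stays fixed.
        _, persisted_id = get_ids()
    return True
```

      Note that every waiting request issues its own stats queries here, which is the behavior this ticket is concerned with.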

      Recent runs of XDCR with 1024 vbuckets have revealed that we hit this timeout very frequently. Possible causes:
      1. We issue far too many polling requests. It should suffice to query the stats once for all replication streams.
      2. It is likely that ep-engine is actually taking far too long to serve the stats requests. This needs to be investigated and fixed if it turns out to be true.
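
      One way to address cause 1 is to let all replication streams share a single stats snapshot instead of polling individually. A minimal sketch, assuming a hypothetical fetch_all_stats callable that returns the checkpoint ids for every vbucket in one request, and an assumed max_age_s freshness window:

```python
import threading
import time
from typing import Callable, Dict, Optional, Tuple

Ids = Tuple[int, int]  # (open_checkpoint_id, last_persisted_checkpoint_id)


class SharedStatsPoller:
    """Serve checkpoint ids for all vbuckets from one cached stats snapshot."""

    def __init__(
        self,
        fetch_all_stats: Callable[[], Dict[int, Ids]],  # hypothetical bulk stats query
        max_age_s: float = 0.5,                         # assumed freshness window
    ) -> None:
        self._fetch_all_stats = fetch_all_stats
        self._max_age_s = max_age_s
        self._lock = threading.Lock()
        self._snapshot: Optional[Dict[int, Ids]] = None
        self._fetched_at = 0.0

    def get(self, vbucket: int) -> Ids:
        with self._lock:
            now = time.monotonic()
            stale = self._snapshot is None or now - self._fetched_at > self._max_age_s
            if stale:
                # One request refreshes the ids for every replication stream.
                self._snapshot = self._fetch_all_stats()
                self._fetched_at = now
            return self._snapshot[vbucket]
```

      With this shape, 1024 streams asking within the same window cost one stats request rather than 1024. It does not help with cause 2; if a single stats request is itself slow, that still needs separate investigation.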

      Another approach to checkpointing that could improve performance: in 2.0, ep-engine supports a command to explicitly close the current open checkpoint and open a new one. Issuing it once for all pending replication streams should improve checkpointing performance, since we would only have to wait until the last closed checkpoint is persisted.
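
      A rough sketch of this alternative follows. The names close_open_checkpoint and get_persisted_id are hypothetical placeholders, not real ep-engine commands or stats keys, and the sketch assumes checkpoints are tracked per vbucket and that the close command returns the id of the checkpoint it closed.

```python
import time
from typing import Callable, Dict, Iterable, Set, Tuple


def checkpoint_pending_streams(
    pending_vbuckets: Iterable[int],
    close_open_checkpoint: Callable[[int], int],  # hypothetical: returns the closed checkpoint id
    get_persisted_id: Callable[[int], int],       # hypothetical: last persisted checkpoint id
    timeout_s: float = 10.0,
    poll_interval_s: float = 0.1,                 # assumed value
) -> Tuple[Set[int], Set[int]]:
    """Return (vbuckets safe to checkpoint, vbuckets that timed out)."""
    # Close the open checkpoint once per vbucket, however many streams are pending on it.
    closed: Dict[int, int] = {vb: close_open_checkpoint(vb) for vb in set(pending_vbuckets)}
    waiting = dict(closed)
    deadline = time.monotonic() + timeout_s
    while waiting:
        # Keep only the vbuckets whose closed checkpoint is not yet persisted.
        waiting = {vb: cid for vb, cid in waiting.items() if get_persisted_id(vb) < cid}
        if not waiting or time.monotonic() >= deadline:
            break
        time.sleep(poll_interval_s)
    return set(closed) - set(waiting), set(waiting)
```

      The intended benefit is that the target is a closed checkpoint, so new incoming mutations no longer extend the amount of work that must be persisted before the wait can finish.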

      Attachments


        Activity

          People

            steve Steve Yen
            srinivas Srinivas Vadlamani (Inactive)

