Details

Type: Story
Resolution: Duplicate
Priority: Major
Fix Version/s: 2.0-beta
Affects Version/s: 2.0-beta
Component/s: XDCR
Security Level: Public
Labels:
None

Description

In XDCR, during checkpointing, we query ep-engine stats at the destination as follows. Immediately after receiving a checkpointing request from the source, we query open_checkpoint_id and last_persisted_checkpoint_id from ep-engine. We then wait until either last_persisted_checkpoint_id becomes equal to that open_checkpoint_id, or a 10-second timeout occurs (in which case we log a warning and do not checkpoint). The reasoning is that checkpointing, unlike regular document updates, bypasses ep-engine and updates Couch directly, so we need to make sure it is "safe" to checkpoint. It is safe only after the open checkpoint id seen when the checkpoint request arrived has been persisted.
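
A minimal Python sketch of the wait described above, for illustration only. `get_stats` is a hypothetical callable returning the two ep-engine checkpoint stats for one vbucket; the real CAPI code and ep-engine stats interface are not shown here. The check uses `>=` instead of strict equality so the wait also ends if persistence has already moved past the snapshotted id.

```python
import logging
import time
from typing import Callable, Mapping

log = logging.getLogger("xdcr.capi")

# Hypothetical stats fetcher for one vbucket, returning e.g.
# {"open_checkpoint_id": 5, "last_persisted_checkpoint_id": 4}.
StatsFn = Callable[[int], Mapping[str, int]]


def wait_until_safe_to_checkpoint(get_stats: StatsFn,
                                  vbucket: int,
                                  timeout: float = 10.0,
                                  poll_interval: float = 0.1,
                                  clock: Callable[[], float] = time.monotonic,
                                  sleep: Callable[[float], None] = time.sleep
                                  ) -> bool:
    """Return True once the open checkpoint seen at request time is persisted.

    Returns False and logs a warning if `timeout` seconds pass first; the
    caller should then skip this checkpoint.
    """
    # Snapshot the open checkpoint id at the moment the request arrives.
    stats = get_stats(vbucket)
    target = stats["open_checkpoint_id"]
    deadline = clock() + timeout
    while stats["last_persisted_checkpoint_id"] < target:
        if clock() >= deadline:
            log.warning("vbucket %d: checkpoint %d not persisted within %.1fs; "
                        "not checkpointing", vbucket, target, timeout)
            return False
        sleep(poll_interval)
        stats = get_stats(vbucket)  # one polling request per iteration
    return True
```

Injecting `clock` and `sleep` is a design choice for the sketch: it lets the loop be exercised without real waiting.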

Recent runs of XDCR with 1024 vbuckets have revealed that we are hitting this timeout very frequently. This could be due to the following causes:
1. We issue far too many polling requests. It should suffice to query the stats only once for all replication streams, rather than once per stream.
2. It is likely that ep-engine is actually taking far too long to serve the stats requests. This needs to be investigated and fixed if it turns out to be true.
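
For cause 1, one possible way to cut the number of polling requests is to resolve all pending checkpoint requests from a single shared stats query per poll cycle. This Python sketch assumes a hypothetical `get_all_stats` that fetches checkpoint stats for every vbucket in one request, and assumes the pending requests have been collected into one batch before waiting starts.

```python
import time
from typing import Callable, Dict, Iterable, Mapping

# Hypothetical bulk fetcher: ONE stats request returning checkpoint stats for
# every vbucket, e.g. {vb: {"open_checkpoint_id": 7,
#                           "last_persisted_checkpoint_id": 6}, ...}.
AllStatsFn = Callable[[], Mapping[int, Mapping[str, int]]]


def wait_for_pending_checkpoints(get_all_stats: AllStatsFn,
                                 vbuckets: Iterable[int],
                                 timeout: float = 10.0,
                                 poll_interval: float = 0.1,
                                 clock: Callable[[], float] = time.monotonic,
                                 sleep: Callable[[float], None] = time.sleep
                                 ) -> Dict[int, bool]:
    """Resolve every pending checkpoint request from a shared poll.

    Each poll cycle issues a single stats request, however many replication
    streams are waiting. Returns vbucket -> True (safe to checkpoint) or
    False (timed out).
    """
    snapshot = get_all_stats()
    # Per-vbucket target: the open checkpoint id seen when waiting starts.
    targets = {vb: snapshot[vb]["open_checkpoint_id"] for vb in vbuckets}
    results: Dict[int, bool] = {}
    deadline = clock() + timeout
    while True:
        for vb, target in targets.items():
            if (vb not in results
                    and snapshot[vb]["last_persisted_checkpoint_id"] >= target):
                results[vb] = True
        pending = [vb for vb in targets if vb not in results]
        if not pending:
            return results
        if clock() >= deadline:
            results.update({vb: False for vb in pending})
            return results
        sleep(poll_interval)
        snapshot = get_all_stats()  # still one request per cycle
```

With 1024 vbuckets, the number of stats requests then depends on how long persistence takes, not on how many streams are waiting.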

Another approach to checkpointing that could improve performance: in 2.0, ep-engine supports a command to explicitly close the current open checkpoint and open a new one. Issuing it once for all pending replication streams would improve checkpointing performance, since we would then only have to wait until the last closed checkpoint is persisted.
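
The close-checkpoint approach might look roughly like the Python sketch below. `close_checkpoint` and `get_persisted_ids` are hypothetical wrappers (the actual ep-engine command name and response format are not given in this issue), and the sketch assumes the close command reports the id of the checkpoint it just closed.

```python
import time
from typing import Callable, Dict, Iterable, Mapping


def checkpoint_after_explicit_close(
        close_checkpoint: Callable[[int], int],
        get_persisted_ids: Callable[[], Mapping[int, int]],
        vbuckets: Iterable[int],
        timeout: float = 10.0,
        poll_interval: float = 0.1,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep) -> Dict[int, bool]:
    """Close the open checkpoint once per vbucket, then wait for persistence.

    close_checkpoint(vb): hypothetical; assumed to close the open checkpoint,
    open a new one, and return the id of the checkpoint it closed.
    get_persisted_ids(): hypothetical single stats request returning
    {vb: last_persisted_checkpoint_id} for all vbuckets.
    """
    # One close per vbucket, even if several streams wait on the same one.
    targets = {vb: close_checkpoint(vb) for vb in set(vbuckets)}
    results: Dict[int, bool] = {}
    deadline = clock() + timeout
    while True:
        persisted = get_persisted_ids()  # one request covers all vbuckets
        for vb, closed_id in targets.items():
            if vb not in results and persisted[vb] >= closed_id:
                results[vb] = True
        pending = [vb for vb in targets if vb not in results]
        if not pending:
            return results
        if clock() >= deadline:
            results.update({vb: False for vb in pending})
            return results
        sleep(poll_interval)
```

Because the close is issued once per vbucket rather than once per request, every waiting stream shares a fixed target: the checkpoint that was just closed.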

People

Assignee: Steve Yen

Reporter: Srinivas Vadlamani (Inactive)

Votes: 0

Watchers: 3

Dates

Created: 04/May/12 5:28 PM

Updated: 03/Jul/12 4:47 PM

Resolved: 03/Jul/12 4:47 PM

Gerrit Reviews

There are no open Gerrit changes

Improve polling behavior during XDCR checkpointing inside CAPI
