XDCR: Provide Observe with XDCR

Description

Add functionality on the Couchbase Server side so that a client-level observe request can block until the mutation has been persisted to disk, replicated to another node within the cluster, or replicated to a remote cluster.

Fix versions

None

Environment

Couchbase Server 2.0 GA, with XDCR running either across datacenters or across AWS regions

Release Notes Description

None

Activity

Hyun-Ju Vega August 17, 2024 at 1:15 AM

Since XDCR is an eventually consistent architecture, synchronous checks of documents are not possible.
Currently, you can use the following methods to monitor XDCR replication and check documents in the replicated buckets.

  • Prometheus metrics – there are over 60 XDCR and related metrics for monitoring XDCR replications, including xdcr_docs_written_total, xdcr_pipeline_errors, xdcr_pipeline_status, xdcr_wtavg_docs_latency_seconds, xdcr_docs_failed_cr_target_total, xdcr_docs_failed_cr_source_total, and more.

  • You can use the xdcrDiffer utility (https://github.com/couchbase/xdcrDiffer) to generate a report of differences between the source and target buckets. (xdcrDiffer is planned to be included in the Morpheus Server package as a diagnostic utility for Support.)

  • You can programmatically compare the CAS values of documents on the source and target clusters.

  • You can use the Kafka connector to retrieve changes from a source bucket and confirm the changes on the target.
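A minimal Python sketch of the CAS comparison from the third bullet. It assumes, as that bullet implies, that a document replicated by XDCR carries the same CAS on the target as on the source, so a matching CAS is read as "this version has arrived". The getter callables are hypothetical stand-ins; with the Couchbase Python SDK they might wrap `collection.get(key).cas` and return None on `DocumentNotFoundException`, but that wiring is an assumption, not something this ticket specifies. Because XDCR is eventually consistent, a mismatch only means the target had not caught up at the moment of the check (or the document changed between the two reads), not that replication failed.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical getter: returns a document's CAS on one cluster, or None if
# the document does not exist there.
CasGetter = Callable[[str], Optional[int]]


def find_unreplicated(
    keys: Iterable[str],
    get_source_cas: CasGetter,
    get_target_cas: CasGetter,
) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Map each key whose source and target CAS differ to (source, target).

    Assumes a replicated document keeps the source CAS on the target.
    A key missing on both sides (None == None) counts as in sync.
    """
    mismatches = {}
    for key in keys:
        src = get_source_cas(key)
        tgt = get_target_cas(key)
        if src != tgt:
            mismatches[key] = (src, tgt)
    return mismatches


if __name__ == "__main__":
    # Dicts stand in for the two clusters; dict.get returns None if absent.
    source = {"user::1": 1700000000000000000, "user::2": 1700000000000000500}
    target = {"user::1": 1700000000000000000}
    print(find_unreplicated(source.keys(), source.get, target.get))
    # {'user::2': (1700000000000000500, None)}
```

Taking the getters as parameters keeps the comparison independent of any particular SDK version and makes it easy to re-run on the mismatched keys after a delay.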

Matt Ingenthron August 12, 2022 at 8:43 PM

Just commenting here since this one comes up frequently…

At a high level, the user may think what they're asking for is fairly simple, but the semantics of replication are very different between in-cluster and cross-cluster replication, by design. Also, the failure modes of a WAN are rather different.

If a user asks for this feature, it may be a good idea to approach this with an answer, e.g. "we don't have that today", and then try to dive into what they're trying to achieve. Chances are they don't want writes to stop when the remote cluster is no longer available. And there would be no compensating logic anyway for this 'observe'. It might be that they want some way to have confidence around a recovery point objective (a.k.a. RPO), which may be satisfiable with stats. Or it might be that they want to consider how to be cache coherent across clusters… which has other techniques.
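If the underlying goal is RPO confidence, one hedged way to lean on stats is to flag replications whose weighted-average document latency exceeds a latency budget. The sketch below assumes the XDCR metrics have already been scraped in Prometheus text exposition format; the `replication` label, the sample values, and the 5-second budget are hypothetical, and `xdcr_wtavg_docs_latency_seconds` is only a rough proxy for RPO, not a guarantee. The hand-rolled parser is deliberately minimal; the official `prometheus_client` package ships a full one (`prometheus_client.parser.text_string_to_metric_families`) that would be preferable in practice.

```python
import re
from typing import Dict, List, Tuple

# One sample line of the Prometheus text exposition format, e.g.
#   xdcr_wtavg_docs_latency_seconds{replication="r1"} 0.8
# Group 1: metric name, group 2: optional {labels}, group 3: value.
# An optional trailing timestamp is ignored. Simplification: assumes
# label values never contain '}'.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)')
# name="value" pairs inside the braces (escape sequences left as-is).
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text: str, name: str) -> List[Tuple[Dict[str, str], float]]:
    """Return (labels, value) for every sample of metric `name` in `text`."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE lines
            continue
        m = _SAMPLE.match(line)
        if m is None or m.group(1) != name:
            continue
        labels = dict(_LABEL.findall(m.group(2) or ""))
        samples.append((labels, float(m.group(3))))
    return samples


def over_latency_budget(text: str, budget_seconds: float) -> List[Dict[str, str]]:
    """Label sets whose weighted-average XDCR doc latency exceeds the budget."""
    return [
        labels
        for labels, value in parse_samples(text, "xdcr_wtavg_docs_latency_seconds")
        if value > budget_seconds
    ]


if __name__ == "__main__":
    # Hypothetical scrape output for two replications.
    scraped = (
        'xdcr_wtavg_docs_latency_seconds{replication="r1"} 0.8\n'
        'xdcr_wtavg_docs_latency_seconds{replication="r2"} 12.0\n'
    )
    print(over_latency_budget(scraped, 5.0))  # [{'replication': 'r2'}]
```

Unlike a blocking observe, a check like this never stalls writes when the remote cluster is unreachable; it only reports that the budget is being missed.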

If you arrive here asking "when will this be available", please consider trying to gather more info about what the user is trying to do with this feature and open a CBSE.

Hyun-Ju Vega August 30, 2021 at 9:22 PM

Currently in feature-backlog – requirements are not clear.

Chaitra Ramarao January 10, 2020 at 5:15 AM

Based on the discussions and reviews with field and customers, this is not a top priority anymore. Will share the details of what we learned in an email. Hence, moving this to feature backlog.

Chin Hong April 19, 2019 at 8:59 PM

Can you provide a summary of this requirement from your discussions with customers?

Won't Do

Details

Created January 28, 2013 at 6:25 PM
Updated August 17, 2024 at 1:15 AM
Resolved August 17, 2024 at 1:15 AM