Couchbase Documentation / DOC-11487

XDCR and Transactions documentation needs to be expanded

Details

    • DOC-2023-S19
    • 0

    Description

      In a recent customer escalation, we saw that the customer was not fully following the recommended guidelines for XDCR and transactions.

      The original documentation is here: https://docs.couchbase.com/server/current/learn/data/transactions.html#transactions-and-replication-xdcr

      The internal design documentation is here:
      https://docs.google.com/document/d/1aHHYoVjPIJcpRZ5_kQcgyr3RQ_3YgRlXml0sbMjuh-A/edit#heading=h.jcrjondf8y8z

      Both the design doc and the official Couchbase documentation were created years ago as part of release 6.5. I was not part of the development or documentation discussions back then, so I am working only from what is written in the internal design doc.

      There are a few recommendations that we'd like to make:
      Active-Active
      The design doc advises against active-active, but the documentation page allows it. I'm unclear as to why that is; what the official doc describes is theoretically possible but risky. If we want to keep the active-active deployment language, the following recommendations should be made:

      1. The wording should advise against using transactions in an active-active replication setup. Customers who choose to do so should not execute transactions on the same set of documents from more than one site, regardless of whether or not the transactions are simultaneous. In other words, customer applications need to be sharded by key such that a specific active site is responsible for transactions on a specific set of keys.
        Side note: This is because we have seen situations where a failed transaction can leave a document permanently tagged with the transactional extended attribute (xattr), which only MB-47813 can resolve.
      2. Similarly, we may want to add an example that illustrates the sharding point above.
      3. We want to reiterate that if a transaction fails on one cluster, it should be retried on the same cluster instead of on another cluster. This avoids the stuck-document situation that MB-47813 addresses and keeps the sharding concept intact.
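As a starting point for the example requested in item 2, here is a minimal sketch of key-based sharding combined with same-site retry. It deliberately avoids any Couchbase SDK calls: the site names, `owning_site`, `run_on_owner`, and the use of `RuntimeError` as a stand-in for a transaction failure are all hypothetical, and `txn_fn` is a placeholder for whatever the application uses to run a transaction against its local cluster.

```python
import zlib

# Hypothetical site names; a real deployment would use its own cluster identifiers.
SITES = ["cluster-east", "cluster-west"]


def owning_site(doc_key, sites=SITES):
    """Map a document key to the single site allowed to run transactions on it."""
    # crc32 is stable across processes and machines, unlike Python's built-in hash().
    return sites[zlib.crc32(doc_key.encode("utf-8")) % len(sites)]


def run_on_owner(doc_keys, local_site, txn_fn, max_attempts=3):
    """Run txn_fn only if local_site owns every key, retrying on the same site."""
    owners = {owning_site(key) for key in doc_keys}
    if owners != {local_site}:
        # Refuse rather than run a transaction on keys another site is responsible for.
        raise ValueError(f"keys are owned by {sorted(owners)}, not {local_site}")
    last_error = None
    for _ in range(max_attempts):
        try:
            return txn_fn()
        except RuntimeError as err:  # stand-in for a transaction failure
            last_error = err  # retry here, never hand off to another cluster
    raise last_error
```

An application would call `owning_site(key)` to route a request to the correct site before starting the transaction. A hash is only one possible mapping; a key prefix or tenant identifier would serve equally well, as long as every site agrees on it.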

      Time-Based CR

      1. The design doc recommends that customers use Time-Based conflict resolution. This point is missing from the current documentation.

      Doc count

      1. If possible, we would like to mention that running transactions in an XDCR setup can lead to different document counts between the source and target buckets, because transactions inevitably introduce metadata documents that throw off the count.
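To illustrate the count discrepancy, the sketch below compares raw key counts with counts that exclude transaction metadata documents. It assumes that metadata documents can be recognized by a `_txn:` key prefix; that prefix, the helper names, and the sample keys are assumptions made for illustration and should be checked against the actual deployment.

```python
# Assumption: transaction metadata documents use keys beginning with "_txn:".
TXN_METADATA_PREFIX = "_txn:"


def count_application_docs(keys, metadata_prefix=TXN_METADATA_PREFIX):
    """Count keys that are not transaction metadata documents."""
    return sum(1 for key in keys if not key.startswith(metadata_prefix))


def compare_counts(source_keys, target_keys):
    """Return raw and metadata-excluded document counts for both buckets."""
    return {
        "source_raw": len(source_keys),
        "target_raw": len(target_keys),
        "source_app": count_application_docs(source_keys),
        "target_app": count_application_docs(target_keys),
    }


# Hypothetical key listings: the source holds one metadata document the target lacks.
source = ["order::1", "order::2", "_txn:example-metadata"]
target = ["order::1", "order::2"]
result = compare_counts(source, target)
```

Here the raw counts differ (3 versus 2) while the application document counts match (2 and 2), which is the distinction the documentation would need to call out.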

          People

            tony.hillman Tony Hillman (Inactive)
            neil.huang Neil Huang