Details
Bug | Resolution: Unresolved | Major | Morpheus | DOC-2023-S19 | 0
Description
In a recent customer escalation, we saw that the customer was not fully following the recommended guidelines for XDCR and transactions.
The original documentation is here: https://docs.couchbase.com/server/current/learn/data/transactions.html#transactions-and-replication-xdcr
The internal design documentation is here:
https://docs.google.com/document/d/1aHHYoVjPIJcpRZ5_kQcgyr3RQ_3YgRlXml0sbMjuh-A/edit#heading=h.jcrjondf8y8z
Both the design doc and the official Couchbase documentation were created years ago as part of release 6.5. I was not part of the development or documentation discussions back then, so I am working only from what is recorded in the internal design doc.
There are a few recommendations that we'd like to make:
Active-Active
The design doc advises against active-active, but the documentation page allows it. I'm unclear as to why that is; what the official doc describes is theoretically possible but risky. If we want to keep the active-active deployment language, the following recommendations should be made:
- The wording should advise against using transactions in an active-active replication setup. Customers who choose to do so should not execute transactions on the same set of documents from more than one site, regardless of whether or not the transactions are simultaneous. In other words, customer applications need to be sharded by key so that a specific active site is responsible for transactions on a specific set of keys.
Side note: This is because we have seen situations where a failed transaction leaves a document permanently tagged with the transactional extended attribute (xattr), which only MB-47813 can resolve.
- Similarly, we may want to create an example to reiterate the sharding point above.
- We want to reiterate that if a transaction fails on one cluster, it should be retried on the same cluster rather than on another cluster. This is to avoid the situation described in MB-47813 and to keep the sharding concept intact.
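As a starting point for the sharding example mentioned above, here is a minimal sketch in Python. It is not Couchbase SDK code: the site names, `owning_site`, and `run_transaction` are hypothetical, and `RuntimeError` merely stands in for a transaction failure. It shows one way to map each key to exactly one site and to keep retries on that same site.

```python
import zlib

# Hypothetical site names for illustration; not taken from any real deployment.
SITES = ["cluster-east", "cluster-west"]


def owning_site(doc_key, sites=SITES):
    """Return the single site allowed to run transactions on doc_key."""
    # crc32 is stable across processes, unlike Python's built-in hash().
    return sites[zlib.crc32(doc_key.encode("utf-8")) % len(sites)]


def run_transaction(local_site, doc_keys, txn_body, max_attempts=3):
    """Run txn_body only if local_site owns every key; retry on the same site.

    txn_body is a caller-supplied callable (assumption: it wraps the real
    transaction logic). RuntimeError is a stand-in for a transaction failure.
    """
    foreign = [k for k in doc_keys if owning_site(k) != local_site]
    if foreign:
        # Refuse rather than touch documents owned by another site.
        raise PermissionError(f"{local_site} does not own keys: {foreign}")

    last_error = None
    for _ in range(max_attempts):
        try:
            return txn_body()
        except RuntimeError as exc:
            # Retry here, on the same site; never hand off to another cluster.
            last_error = exc
    raise last_error
```

An application at each site would call `run_transaction` with its own site name, so a key that hashes to the other site is rejected up front instead of being modified transactionally from two places.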
Time-Based CR
- The design doc recommends that customers use time-based conflict resolution. This point is missing from the official documentation.
Doc count
- If possible, we would like to mention that running transactions in an XDCR setup can lead to different document counts between the source and target buckets, because transactions inevitably introduce metadata documents, and those throw off the count comparison.
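To illustrate the count mismatch, here is a small Python sketch. It rests on two assumptions that should be verified before anything like this goes into the docs: that transaction metadata documents can be recognized by a `_txn:` key prefix, and that they are present on the source but not on the target. The key lists are made up.

```python
# Assumption: transaction metadata documents use keys starting with "_txn:".
TXN_METADATA_PREFIX = "_txn:"


def count_user_docs(keys, metadata_prefix=TXN_METADATA_PREFIX):
    """Count documents whose keys do not look like transaction metadata."""
    return sum(1 for k in keys if not k.startswith(metadata_prefix))


# Made-up key listings for a source and a target bucket.
source_keys = ["order::1", "order::2", "_txn:atr-0-#a", "_txn:client-record"]
target_keys = ["order::1", "order::2"]

raw_difference = len(source_keys) - len(target_keys)
user_difference = count_user_docs(source_keys) - count_user_docs(target_keys)
```

With these sample keys the raw counts differ by two while the user-document counts match, which is the kind of discrepancy customers should expect when comparing bucket item counts.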