  1. Couchbase Server
  2. MB-58989

XDCR: Detect conflicts to be able to log when a true conflict is resolved

Details

    • Epic
    • Resolution: Unresolved
    • Major
    • Morpheus
    • None
    • XDCR
    • None
    • XDCR Conflict Logging
    • To Do
    • 0

    Description

      Multiple customers have asked that XDCR log when XDCR resolves a conflict (MB-15561). 

      By a conflict, customers mean that the two documents being compared were updated on different clusters.  For example, the same doc id was updated by an application connected to cluster1 and, at the same time (within the conflict window), by an application connected to cluster2.

      This is problematic because XDCR does not detect conflicts: every replication is a conflict resolution (one document winning over another) without any understanding of causality.  In the case of MB-15561, there is a PRD for logging whenever the target cluster wins a conflict.  But even in the simple case of a one-way replication in an active-passive scenario, this may produce (potentially many) false positives whenever an old mutation is sent from the source more than once because of a backfill, network instability/latency, a resend after errors, or a complex topology.  If a customer then has questions, someone would need to investigate what was happening in the environment and make a best guess as to why the target document won.

      Keeping document history information (modification time and originating cluster) in version vectors was a way of detecting conflicts for custom conflict resolution.  However, even without the custom conflict resolution functionality, we should be able to use version vectors (already developed in beta) simply to detect true conflicts so that we can log them.

      This would mean enabling the HLV (hybrid logical vector) feature in XDCR.  When a conflict is detected, we could log the true conflict but still use the configured conflict resolution mode – Sequence or Timestamp – to resolve it.  Customers would not have to provide a custom merge function to resolve the conflict.  The ability to detect true conflicts would allow us to log that event, including the two documents that were in conflict.
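
      A minimal sketch of how a version-vector comparison separates a true conflict from a stale resend. It assumes each document carries a map of cluster ID to the latest version seen from that cluster. The map layout and the names `Relation` and `compare` are illustrative only, not the actual HLV format or XDCR code.

```python
from enum import Enum


class Relation(Enum):
    EQUAL = "equal"
    SOURCE_NEWER = "source_newer"
    TARGET_NEWER = "target_newer"
    CONFLICT = "conflict"


def compare(source: dict, target: dict) -> Relation:
    """Compare two version vectors (hypothetical layout: cluster ID -> version).

    A true conflict exists only when each side has seen an update that
    the other side has not, i.e. neither history contains the other.
    """
    clusters = source.keys() | target.keys()
    source_ahead = any(source.get(c, 0) > target.get(c, 0) for c in clusters)
    target_ahead = any(target.get(c, 0) > source.get(c, 0) for c in clusters)
    if source_ahead and target_ahead:
        return Relation.CONFLICT
    if source_ahead:
        return Relation.SOURCE_NEWER
    if target_ahead:
        return Relation.TARGET_NEWER
    return Relation.EQUAL


# An old mutation resent by the source: the target's history already covers it.
resend = compare({"cluster1": 5}, {"cluster1": 7})

# Concurrent updates: each cluster has a change the other has not seen.
concurrent = compare({"cluster1": 6, "cluster2": 2}, {"cluster1": 5, "cluster2": 3})
```

      Under these assumptions the resend case is classified as `TARGET_NEWER` rather than as a conflict, which is the false positive described earlier that target-wins logging alone could not rule out.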

      Basic requirements would be:

      • Detect true conflicts (HLV feature allows you to do this)
      • Log or store the document IDs, document bodies, and the conflicting history information in a system collection/conflict bucket, in a way that makes the information easy to access programmatically (for example via APIs) so that it can be acted on quickly, if needed
      • Resolve the conflict using the currently existing/configured conflict resolution mode – Sequence or Timestamp –  so that there is no need for a custom way to resolve the documents that are in conflict
      • Allow online (no downtime) enablement of this feature (enabling HLV, logging true conflicts, etc.)
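
      The first three requirements can be sketched together as detect, log, then resolve with the existing mode. Everything here is an assumption made for illustration: the `Doc` fields, the record layout, and the Python list standing in for the system collection/conflict bucket. The `resolve` function is a simplified stand-in for the Sequence and Timestamp modes (the real rules have further tie-breakers), and the target wins ties.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    body: dict
    cas: int        # timestamp-like value, compared first in "timestamp" mode
    rev_seqno: int  # revision counter, compared first in "sequence" mode
    history: dict = field(default_factory=dict)  # hypothetical: cluster ID -> version


def is_true_conflict(source: Doc, target: Doc) -> bool:
    """True only when each document has history the other lacks."""
    clusters = source.history.keys() | target.history.keys()
    source_ahead = any(source.history.get(c, 0) > target.history.get(c, 0) for c in clusters)
    target_ahead = any(target.history.get(c, 0) > source.history.get(c, 0) for c in clusters)
    return source_ahead and target_ahead


def resolve(source: Doc, target: Doc, mode: str) -> Doc:
    """Simplified stand-in for the configured mode; no custom merge function."""
    if mode == "sequence":
        source_key = (source.rev_seqno, source.cas)
        target_key = (target.rev_seqno, target.cas)
    elif mode == "timestamp":
        source_key = (source.cas, source.rev_seqno)
        target_key = (target.cas, target.rev_seqno)
    else:
        raise ValueError(f"unknown conflict resolution mode: {mode}")
    return source if source_key > target_key else target


def replicate(source: Doc, target: Doc, mode: str, conflict_log: list) -> Doc:
    """Resolve as usual, and append a record only for true conflicts."""
    winner = resolve(source, target, mode)
    if is_true_conflict(source, target):
        conflict_log.append({
            "doc_id": source.doc_id,
            "mode": mode,
            "winner": "source" if winner is source else "target",
            "source": {"body": source.body, "history": source.history},
            "target": {"body": target.body, "history": target.history},
        })
    return winner
```

      The point of this shape is that detection only decides whether a record is written. The winner is still chosen by the configured mode, so enabling logging does not change which document survives.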

      The drawback of enabling HLV and the ability to detect true conflicts would be a performance penalty (for detecting conflicts) and the increase in document metadata size (to keep version history for each document).

      Note 1:  Some performance tests were done with version vectors (aka hybrid logical vector – HLV) a long time ago.  I don't recall exactly, but I believe the performance penalty was under roughly 5% for replication without any conflicts.  If there are conflicts, each conflict (or the rate of conflicts) will incur an additional performance penalty.

      Note 2: If a document can be modified by 3 different clusters within an hour, the size overhead in a 2KB document would be around 5.5% (from a previous estimate).  So, the storage size increase (due to document size increase from the metadata increase) would be dynamic as well, dependent on the number/rate of modifications to the same document in different clusters.
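
      As a rough worked example of the Note 2 figure: 5.5% of a 2KB document is about 113 bytes of history metadata. The sketch below assumes 2KB means 2048 bytes, that the overhead divides evenly across the 3 clusters, and that it scales linearly with the number of clusters. None of these assumptions comes from the original estimate.

```python
DOC_SIZE_BYTES = 2 * 1024   # assumption: "2KB" taken as 2048 bytes
OVERHEAD_FRACTION = 0.055   # from the estimate quoted in Note 2
CLUSTERS = 3                # clusters modifying the document within the window

overhead_bytes = DOC_SIZE_BYTES * OVERHEAD_FRACTION  # about 112.6 bytes
per_cluster_bytes = overhead_bytes / CLUSTERS        # about 37.5 bytes (assumed even split)


def overhead_fraction(doc_size_bytes: int, clusters: int,
                      bytes_per_entry: float = per_cluster_bytes) -> float:
    """Estimated metadata overhead as a fraction of document size,
    assuming one fixed-size history entry per modifying cluster."""
    return clusters * bytes_per_entry / doc_size_bytes


# Same history on a smaller 1KB document: the relative overhead doubles.
small_doc = overhead_fraction(1024, 3)

# Same 2KB document modified by 5 clusters instead of 3.
more_clusters = overhead_fraction(2048, 5)
```

      Under these assumptions the overhead grows with the number of clusters touching a document and weighs more heavily on small documents.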

      There would also be additional overhead because the document history info would need to be pruned regularly; otherwise, each document could become quite large.
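
      One way such pruning could work is a time-window rule, sketched below. It assumes the history is a map of cluster ID to the timestamp of that cluster's last modification, and that the most recent entry is always kept. The function name `prune_history`, the map layout, and the one-hour window are hypothetical and do not describe the actual HLV pruning design.

```python
def prune_history(history: dict, now: int, window: int) -> dict:
    """Drop history entries older than `window`, always keeping the newest.

    history: hypothetical layout, cluster ID -> last modification time (seconds)
    now:     current time in the same units
    window:  maximum age of an entry before it is pruned
    """
    if not history:
        return {}
    latest = max(history, key=history.get)
    return {
        cluster: ts
        for cluster, ts in history.items()
        if cluster == latest or now - ts <= window
    }


ONE_HOUR = 3600
pruned = prune_history(
    {"cluster1": 1000, "cluster2": 4000, "cluster3": 4500},
    now=5000,
    window=ONE_HOUR,
)
```

      Here the `cluster1` entry is 4000 seconds old and falls outside the window, so only `cluster2` and `cluster3` remain. A shorter window keeps documents smaller but shortens the period over which concurrent updates can still be recognized.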

            People

              neil.huang Neil Huang
              hyun-ju.vega Hyun-Ju Vega