  1. Couchbase Server
  2. MB-8915

Tombstone purger needs to find a better home for lifetime of deletion

Details

    • Bug
    • Resolution: Unresolved
    • Critical
    • techdebt-backlog
    • 2.2.0
    • XDCR
    • Security Level: Public
    • None
    • Untriaged

    Description

      === Copied from my email to a group of people; it should explain clearly why we need this ticket ===

      Thanks for your comments. This is probably easier to read in email than in code review.

      Let me explain a bit to see if we can get on the same page. First of all, the current resolution algorithm (comparing all fields) is still right. Yes, there is only a small chance we would ever reach the fields after CAS, but for correctness we should keep them there.

      The cause of MB-8825 is that the tombstone purger uses the expiration time field to store the purger-specific "lifetime of deletion". This is just a "temporary solution": IMHO the expiration time of a key is not the right place for "lifetime of deletion" (it is purely storage-specific metadata and IMHO should not be in ep_engine). Unfortunately, today we cannot find a better place to put this info unless we change the storage format, which has too much overhead at this time. In the future, I think we need to figure out the best place for "lifetime of deletion" and move it out of the key expiration time field.

      In practice, this temporary solution in the tombstone purger is OK in most cases today, because you rarely have a CAS collision between two deletions on the same key. But MB-8825 hit exactly that small dark area: when the destination tries to replicate a deletion from the source back to the source in bi-dir XDCR, both copies share the same (SeqNo, CAS) but have different expiration time fields (which hold not the exp time of the key, but the lifetime of deletion created by the tombstone purger). The exp time at the destination is sometimes bigger than that at the source, causing incorrect resolution results at the source. The problem exists for both CAPI and XMEM.
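
      To make the collision concrete, here is a minimal sketch of the scenario. It is not the actual XDCR code: the `Meta` type, the field order (seqno, cas, exp, flags), the numeric values, and the `incoming_wins_deletion` rule (ignore everything after CAS for tombstones) are all assumptions made for illustration.

```python
from typing import NamedTuple


class Meta(NamedTuple):
    """Per-document metadata consulted during conflict resolution (illustrative)."""
    seqno: int
    cas: int
    exp: int        # on a tombstone, 2.2 stores the purger's "lifetime of deletion" here
    flags: int = 0


def incoming_wins_all_fields(incoming: Meta, local: Meta) -> bool:
    # Tuples compare field by field, so exp and flags only matter
    # when (seqno, cas) are identical on both sides.
    return tuple(incoming) > tuple(local)


def incoming_wins_deletion(incoming: Meta, local: Meta) -> bool:
    # Hypothetical deletion-specific rule: ignore the fields after CAS,
    # because exp no longer carries the key's real expiration time.
    return (incoming.seqno, incoming.cas) > (local.seqno, local.cas)


# The same deletion seen on both clusters. Only the purger stamp differs
# (the values are made up).
source_tombstone = Meta(seqno=5, cas=100, exp=1_000)
bounced_tombstone = Meta(seqno=5, cas=100, exp=1_003)  # stamped later at the destination

all_fields_result = incoming_wins_all_fields(bounced_tombstone, source_tombstone)
deletion_rule_result = incoming_wins_deletion(bounced_tombstone, source_tombstone)
```

      Under these assumptions, the all-fields comparison lets the bounced-back tombstone win at the source even though it is the same deletion, while a rule that stops at (SeqNo, CAS) treats the two copies as equal and leaves the source's copy alone.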

      For backward compatibility:
      1) If both sides are 2.2, we use the new resolution algorithm for deletions and we are safe.
      2) If both sides are pre-2.2, they do not have the tombstone purger, so the current algorithm (comparing all fields) should be safe.
      3) If there is a bi-dir XDCR between a pre-2.2 and a 2.2 cluster on CAPI, a deletion born at 2.2 and replicated to pre-2.2 should be safe because there is no tombstone purger at pre-2.2. Deletions born at pre-2.2 may be bounced back from 2.2, but there should be no data loss since you just re-delete something already deleted.

      This fix may not be perfect, but it is still much better than the issues in MB-8825. I hope that in the near future we can find the right place for "lifetime of deletion" in the tombstone purger.

      Thanks,

      Junyi

          People

            jliang John Liang
            junyi Junyi Xie (Inactive)
            Votes: 0
            Watchers: 10
