Couchbase Server / MB-55798

CDC: The history time "now" callback doesn't know the vbucket.

    Description

      This MB tracks an improvement so that CDC is better behaved when XDCR is enabled. The following example is possibly quite extreme, in that nodes should normally be better synchronized in time.

      Magma has a callback API where KV-engine provides a function that returns a time in seconds, which is used to decide which keys in history can be discarded.

      static std::chrono::seconds getHistoryTimeNow() {
          // @todo: require interface changes so that we can locate the vbucket and
          // then peek at the correct HLC
          using namespace std::chrono;
          return duration_cast<seconds>(nanoseconds(HLC::getMaskedTime()));
      }
      

      The problem is that this API takes no vbucket as input and has no other "bucket" context, so there are cases where we could certainly discard keys too soon because we are unable to read the vbucket's view of time.

      Each key/value has a timestamp, which is actually the CAS, which is a Hybrid Logical Clock (HLC) timestamp. When we chose the CAS as the timestamp for CDC we accepted that there would be some corner cases where CDC operates "unusually", namely when the Hybrid Logical Clock is behaving as a Logical Clock (LC).

      If a node is receiving data from XDCR (set-with-meta) then each vbucket's HLC can be pushed forward (and into LC mode) iff a set-with-meta writes a CAS that is ahead of the node's local clock.

      The vbucket HLC then ticks monotonically by 1 increment for every mutation until the system clock catches up. Thus if the HLC is pushed forwards to 16:05 when the system clock reads 16:00, there is a 5 minute window in which all new mutations receive a CAS that is 16:05++ (+1 for each mutation). This ensures mutations can be ordered by CAS, but nothing can be said about the age of the document...
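
      The push-forward and +1 ticking described above can be sketched as follows. This is a minimal illustration, not the KV-engine HLC: the class name SketchHlc and its methods are hypothetical, time is passed in as plain nanosecond counts so the behaviour is deterministic, and the CAS masking done by the real clock is not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified per-vbucket hybrid logical clock.
// The caller supplies the system clock reading (nanoseconds) on each call.
class SketchHlc {
public:
    // CAS for a local mutation: the system clock if it is ahead, otherwise
    // the previous maximum plus one (the clock is then a logical clock).
    uint64_t nextCas(uint64_t systemClockNs) {
        maxCas = std::max(systemClockNs, maxCas + 1);
        return maxCas;
    }

    // A set-with-meta carrying a CAS ahead of this clock pushes it forward.
    void observeRemoteCas(uint64_t remoteCas) {
        maxCas = std::max(maxCas, remoteCas);
    }

    // Read this clock's view of "now" without generating a new CAS.
    uint64_t peek(uint64_t systemClockNs) const {
        return std::max(systemClockNs, maxCas);
    }

private:
    uint64_t maxCas = 0;
};
```

      With the system clock held at 16:00 and a remote CAS of 16:05 observed, successive nextCas calls in this sketch return 16:05+1, 16:05+2 and so on until the supplied system clock value passes the stored maximum.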

      And that's where this MB comes in.

      In the example, the system clock reads 16:00 yet all mutations are getting logical timestamps that use 16:05 as the base. If CDC is configured with time-based retention, we would be keeping items around for less time than is configured.

      For example, suppose the retention time is shorter than the gap between the real clock and the LC, e.g. 3 minutes against the 5 minute gap. The mutations will effectively be discarded instantly because we are always comparing the system clock against the CAS (16:00 vs 16:05).

      If we update the API so that the vbucket object can be located, the "time now" function can return a "peek" of that vbucket's HLC.
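
      One possible shape for such a change is sketched below. Everything here is an assumption made for illustration: the Vbid alias, the SketchBucket class, pushHlc and the injected system clock are hypothetical names, not the Magma or KV-engine interface, and masking of the CAS is not modelled. The only point is that "now" becomes a per-vbucket value.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

using Vbid = uint16_t; // hypothetical stand-in for the real vbucket id type

// Hypothetical bucket that tracks the highest CAS seen per vbucket.
class SketchBucket {
public:
    using Clock = std::function<std::chrono::nanoseconds()>;

    explicit SketchBucket(Clock systemClock)
        : systemClock(std::move(systemClock)) {}

    // Record a CAS written to the vbucket, e.g. by a set-with-meta.
    void pushHlc(Vbid vb, std::chrono::nanoseconds cas) {
        auto& current = maxCas[vb];
        current = std::max(current, cas);
    }

    // "Time now" for one vbucket: a peek at its HLC, i.e. the larger of
    // the system clock and the highest CAS recorded for that vbucket.
    std::chrono::seconds getHistoryTimeNow(Vbid vb) const {
        std::chrono::nanoseconds now = systemClock();
        auto it = maxCas.find(vb);
        if (it != maxCas.end()) {
            now = std::max(now, it->second);
        }
        return std::chrono::duration_cast<std::chrono::seconds>(now);
    }

private:
    Clock systemClock;
    std::unordered_map<Vbid, std::chrono::nanoseconds> maxCas;
};
```

      In this sketch a vbucket whose HLC was pushed to 16:05 reports 16:05 as "now", while an untouched vbucket on the same node still reports the system clock.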

      In our example this would mean that we are comparing 16:05 vs 16:05 (for as long as it takes for the system clock to catch up, which is 5 minutes). With this new comparison we keep the mutation in the history window. Note that KV would keep the mutation for longer than the configured 3 minutes (approx 8 minutes in this example), but that is better than keeping it for less.
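
      The "approx 8 minutes" figure can be checked with a small worked calculation. The sketch assumes a discard rule of the form now - timestamp >= retention, which is stated here only to make the arithmetic concrete and is not taken from Magma. The function names are hypothetical, times are seconds since midnight, and the +1 logical ticks are ignored because they are far below one second.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

using std::chrono::seconds;

// Hypothetical "time now" that peeks at the vbucket's HLC: the larger of
// the system clock and the highest CAS time held by the vbucket.
inline seconds peekedNow(seconds systemClock, seconds vbucketMaxCas) {
    return std::max(systemClock, vbucketMaxCas);
}

// Assumed discard rule (an assumption, not Magma's documented behaviour).
inline bool canDiscard(seconds now, seconds timestamp, seconds retention) {
    return now - timestamp >= retention;
}

// Wall-clock time a mutation survives when "now" is the peeked HLC.
// The system clock is stepped one second at a time for clarity.
inline seconds realTimeRetained(seconds writtenAt,
                                seconds cas,
                                seconds retention) {
    seconds clock = writtenAt;
    while (!canDiscard(peekedNow(clock, cas), cas, retention)) {
        clock += seconds(1);
    }
    return clock - writtenAt;
}
```

      For a mutation written at 16:00 (57600 s) with a CAS of 16:05 (57900 s) and a 3 minute retention, the peeked "now" stays at 16:05 until the system clock catches up, and the entry only becomes discardable at 16:08, i.e. after 480 seconds of real time.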

            People

              jwalker Jim Walker