  1. Couchbase Server
  2. MB-54999

CDC: Handle when a replica loses history

Details

    Description

      The design identified a case where a break in the replicated history can occur. For example, a replica could receive the following snapshots (perhaps with a disconnect occurring somewhere in between):

      • snapshot {1, 10, history}
      • snapshot {10, 15, history}
      • snapshot {15, 20, ...}
      • snapshot {20, 30, ...}
      • snapshot {30, 40, history}

      I.e. the replica should have had all history for 1 to 15, but history was lost for 15 to 30 (the two snapshots without the history flag). History later recovers at 30 (maybe the producer has now crossed back into the retention window).

      In the design we've stated that this will be handled by having the historyStartSeqno track the high-seqno whilst we have no history. So while we are processing the snapshots with no history, the historyStartSeqno follows the high-seqno; only once we get to seqno 30 can history claim to begin, at 30.
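To make the rule above concrete, here is a minimal sketch that replays the example snapshots and tracks historyStartSeqno at snapshot granularity. The `Snapshot`, `ReplicaState`, `applySnapshot` and `replay` names are hypothetical and not taken from kv_engine; the real code would track this per flushed item rather than per snapshot.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a replicated snapshot: [start, end] plus a flag
// saying whether the producer sent it with history.
struct Snapshot {
    uint64_t start;
    uint64_t end;
    bool history;
};

// Hypothetical replica-side state; names are illustrative only.
struct ReplicaState {
    uint64_t highSeqno = 0;
    uint64_t historyStartSeqno = 0;
    bool retaining = false; // true while the history is unbroken
};

// Apply one fully received snapshot to the replica state.
void applySnapshot(ReplicaState& state, const Snapshot& snap) {
    assert(snap.start <= snap.end);
    state.highSeqno = snap.end;
    if (snap.history) {
        if (!state.retaining) {
            // History (re)starts at the beginning of this snapshot.
            state.historyStartSeqno = snap.start;
            state.retaining = true;
        }
    } else {
        // No history: historyStartSeqno follows the high-seqno.
        state.historyStartSeqno = state.highSeqno;
        state.retaining = false;
    }
}

// Replay a sequence of snapshots from an empty replica.
ReplicaState replay(const std::vector<Snapshot>& snapshots) {
    ReplicaState state;
    for (const auto& snap : snapshots) {
        applySnapshot(state, snap);
    }
    return state;
}
```

Replaying the five snapshots from the description leaves historyStartSeqno at 30 and the high-seqno at 40. After only the first three, both sit at 20.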

      A "quick" way to achieve this is to tell magma to stop retaining history as soon as KV flushes a checkpoint without history, and to tell magma to resume retaining history once we flush the history snapshot(s). This will be done using magma's new SetHistoryRetentionBytes method, passing 0 for no history and then the configured size to resume.

      Overall this leads me to the conclusion that all setting of a vbucket's history retention size must come from the flusher. We cannot, for example, have other paths trying to set the history retention size (e.g. cbepctl), because such a path cannot know whether a replica is temporarily in the 0 size (no history) state.
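One way to picture this single-writer rule: a config path only records the requested size, and the flusher alone decides what is applied to the vbucket. This is a sketch under that assumption. `HistoryConfig`, `VBucketRetention` and `flusherApply` are hypothetical names, and only `getHistorySizeBytes` echoes a name used in this ticket.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the bucket-level configuration.
class HistoryConfig {
public:
    // Called by a config path (e.g. cbepctl): records the request only,
    // it never touches per-vbucket storage state.
    void setHistorySizeBytes(size_t bytes) {
        configured.store(bytes);
    }
    size_t getHistorySizeBytes() const {
        return configured.load();
    }

private:
    std::atomic<size_t> configured{0};
};

// Hypothetical per-vbucket record of what was actually given to storage.
struct VBucketRetention {
    size_t applied = 0;
};

// Assumption: only the flusher calls this. It knows whether the checkpoint
// it just flushed carried history, so it can pick 0 or the configured size.
void flusherApply(VBucketRetention& vb,
                  const HistoryConfig& cfg,
                  bool flushedHistoryCheckpoint) {
    vb.applied = flushedHistoryCheckpoint ? cfg.getHistorySizeBytes() : 0;
}
```

With this split, a config change that arrives while a replica is in the no-history state leaves the applied size at 0. The new size only takes effect when the flusher next flushes a history checkpoint.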

      We also likely want to avoid blindly calling SetRetentionBytes for every flush (it could be cheap to call, or it may trigger actions...).

      I propose that KVStore gets 2 new functions:

      • setRetentionSize(vbid, size_t)
        • set how much history can be stored
      • size_t getRetentionSize(vbid)
        • get the last value used in setRetentionSize. I propose the KVStore (MagmaKVStore) just caches this value, so this isn't a call into magma. This could also maybe be a vbucket member?
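The caching idea could look roughly like the sketch below. `FakeStorage` is a hypothetical stand-in for magma that only counts calls, and it does not reproduce magma's real API. `CachingKVStore`, the `Vbid` alias, and the choice to report 0 for a vbucket that was never set are likewise assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

using Vbid = uint16_t; // assumption: a plain integer id is enough here

// Hypothetical stand-in for the storage engine; it only counts calls.
struct FakeStorage {
    int calls = 0;
    size_t lastSize = 0;
    void setHistoryRetention(Vbid, size_t size) {
        ++calls;
        lastSize = size;
    }
};

class CachingKVStore {
public:
    explicit CachingKVStore(FakeStorage& s) : storage(s) {
    }

    // Forward the new size to storage and remember it.
    void setRetentionSize(Vbid vb, size_t size) {
        storage.setHistoryRetention(vb, size);
        cache[vb] = size;
    }

    // Served from the cache only; never calls into storage.
    // Assumption: a vbucket that was never set reports 0.
    size_t getRetentionSize(Vbid vb) const {
        auto it = cache.find(vb);
        return it == cache.end() ? 0 : it->second;
    }

private:
    FakeStorage& storage;
    std::unordered_map<Vbid, size_t> cache;
};
```

Because the getter never reaches storage, the flusher can compare against it on every flush without touching the storage engine.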

      Now, when we flush, the following checks can be made:

      // stop tracking once we flush a non-history checkpoint
      if (checkpoint.type != history && kvs.getRetentionSize(vb) > 0) {
        kvs.setRetentionSize(vb, 0);
      }
       
      // start tracking once we flush a history checkpoint or adjust to a config change
      if (checkpoint.type == history && kvs.getRetentionSize(vb) != kvbucket.getHistorySizeBytes()) {
        kvs.setRetentionSize(vb, kvbucket.getHistorySizeBytes());
      }
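
As a rough check on the two conditions above, the sketch below runs them against a single-vbucket stand-in and counts how often the store would be called. `RetentionStore`, `onFlush` and `countStorageCalls` are hypothetical names, and the initial cached size of 0 is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class CheckpointType { History, NoHistory };

// Hypothetical single-vbucket store: caches the last size and counts how
// many times a set (i.e. a call into storage) actually happened.
struct RetentionStore {
    size_t cached = 0;
    int storageCalls = 0;
    size_t getRetentionSize() const {
        return cached;
    }
    void setRetentionSize(size_t size) {
        cached = size;
        ++storageCalls;
    }
};

// The two flush-time checks from the description, for one vbucket.
void onFlush(RetentionStore& kvs, CheckpointType type, size_t configuredBytes) {
    assert(kvs.storageCalls >= 0);
    // Stop retaining once we flush a non-history checkpoint.
    if (type != CheckpointType::History && kvs.getRetentionSize() > 0) {
        kvs.setRetentionSize(0);
    }
    // Resume on a history checkpoint, or adjust to a config change.
    if (type == CheckpointType::History &&
        kvs.getRetentionSize() != configuredBytes) {
        kvs.setRetentionSize(configuredBytes);
    }
}

// Flush a sequence of checkpoints and report how many sets were issued.
int countStorageCalls(const std::vector<CheckpointType>& flushes,
                      size_t configuredBytes) {
    RetentionStore kvs;
    for (auto type : flushes) {
        onFlush(kvs, type, configuredBytes);
    }
    return kvs.storageCalls;
}
```

For the five snapshots in the description (history, history, no history, no history, history), with one checkpoint flushed per snapshot, this issues three sets rather than five: one to start retaining, one to stop, and one to resume.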
      

          People

            jwalker Jim Walker