  1. Couchbase Server
  2. MB-58030

[CBM] merges should deduplicate documents in the Rift format

Details

    • Improvement
    • Resolution: Unresolved
    • Major
    • Morpheus
    • 6.6.0, 7.0.0, 7.1.0, 7.2.0
    • tools
    • None
    • 0

    Description

      What is the problem?
      cbbackupmgr has a merge command that is intended to reduce disk usage by deduplicating the mutations for each key across a series of backups. Currently, however, merge does not significantly reduce disk usage because it does not perform this deduplication in the data file.

      In SQLite/ForestDB we got the deduplication for "free" because the document value was stored in the index, and each key had only one entry in the index. In Rift the index and the data are split, and the data file is append-only. This means the same document can be appended to the data file multiple times, even if it is only in the index once.
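To illustrate the split described above, here is a minimal toy model, not Rift's actual on-disk layout. The class `ToyStore` and its methods are hypothetical names invented for this sketch; it only shows how an append-only data file can hold several copies of a document while the index holds one entry per key.

```python
# Toy model only: names and layout are assumptions, not Rift's real format.
class ToyStore:
    def __init__(self):
        self.data = []   # append-only "data file": list of (key, value) records
        self.index = {}  # "index": key -> offset of the latest record in data

    def put(self, key, value):
        # The index entry is overwritten, but the old record stays in the data file.
        self.index[key] = len(self.data)
        self.data.append((key, value))

    def live_records(self):
        return len(self.index)

    def stored_records(self):
        return len(self.data)


store = ToyStore()
for value in ("v1", "v2", "v3"):
    store.put("doc-a", value)  # same key written three times
store.put("doc-b", "v1")
```

In this sketch the index ends up with two entries while the data list holds four records, which is the kind of growth the merge currently leaves in place.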

      What is the solution?
      A couple of ideas (both from James Lee):

      1. Merge backwards. If we do this, then whenever we see a key for a second time we can simply ignore it, because the most recent mutation has already been kept. This isn't true when merging forwards, because we always want to take the last mutation/deletion associated with a key.
      2. Do the merge as normal, but afterwards dedup the data file.
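The two ideas above can be sketched roughly as follows. This is an illustration under stated assumptions, not cbbackupmgr's implementation: records are modelled as hypothetical `(key, value)` tuples, a value of `None` stands in for a deletion, deletions are kept as records in the output, and the function names `merge_backwards` and `dedup_after_merge` are invented for this example.

```python
from typing import List, Optional, Tuple

# Hypothetical record shape: (key, value); value None represents a deletion.
Record = Tuple[str, Optional[bytes]]


def merge_backwards(backups: List[List[Record]]) -> List[Record]:
    """Idea 1: walk backups newest to oldest, keeping the first record seen per key.

    Assumes `backups` is ordered oldest to newest and that records inside
    each backup are in mutation order.
    """
    seen = set()
    merged = []
    for backup in reversed(backups):
        for key, value in reversed(backup):
            if key in seen:
                continue  # a newer mutation/deletion was already kept
            seen.add(key)
            merged.append((key, value))
    return merged


def dedup_after_merge(data: List[Record]) -> List[Record]:
    """Idea 2: given a forward-merged append-only log, keep only the last record per key."""
    last_offset = {}
    for offset, (key, _) in enumerate(data):
        last_offset[key] = offset  # later records overwrite earlier offsets
    return [rec for offset, rec in enumerate(data) if last_offset[rec[0]] == offset]


backups = [
    [("a", b"1"), ("b", b"1")],  # oldest
    [("a", b"2")],
    [("b", None), ("c", b"1")],  # newest; "b" was deleted
]
forward_log = [rec for backup in backups for rec in backup]
```

Both functions keep exactly one record per key. The backward merge never writes a duplicate in the first place but needs a set of seen keys in memory; the post-merge pass leaves the merge untouched at the cost of a second pass over the data.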

            People

              owend Daniel Owen
              Matt.Hall Matt Hall
