Details

Type: Improvement
Resolution: Unresolved
Priority: Major
Fix Version/s: Morpheus
Affects Version/s: 6.6.0, 7.0.0, 7.1.0, 7.2.0
Component/s: tools
Labels: None
Story Points: 0

Description

What is the problem?
cbbackupmgr has a merge command which is intended to reduce disk space by deduplicating mutations for every key across a series of backups. Unfortunately, the merge command currently does not significantly reduce the disk space used, because it does not perform this deduplication in the data file.

In SQLite/ForestDB we got the deduplication for "free" because the document value was stored in the index, and each key had only one entry in the index. In Rift the index and the data are split, and the data file is append-only. This means the same document can be appended to the data file multiple times, even if it is only in the index once.
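
To make the problem concrete, here is a minimal sketch of a split index and append-only data file. The `store` type and its `put` method are hypothetical and only model the behaviour described above; they are not Rift's actual on-disk format or API.

```go
package main

import "fmt"

// store is a hypothetical model of a split index/data layout.
// It is an illustration only, not Rift's real format.
type store struct {
	data  []string       // append-only data file: values are never overwritten
	index map[string]int // key -> offset of the latest value in data
}

func newStore() *store {
	return &store{index: make(map[string]int)}
}

// put appends the value to the data file and repoints the index at it.
// Any older value for the same key stays in the data file as dead space.
func (s *store) put(key, value string) {
	s.data = append(s.data, value)
	s.index[key] = len(s.data) - 1
}

func main() {
	s := newStore()
	s.put("doc1", "v1")
	s.put("doc1", "v2")
	s.put("doc1", "v3")

	// One index entry, but three values occupying space in the data file.
	fmt.Println("index entries:", len(s.index), "data entries:", len(s.data))
}
```

In this model the index stays at one entry per key while the data file grows with every write, which is why a merge that only deduplicates the index does not reclaim much space.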

What is the solution?
A couple of ideas (both from James Lee):

- Merge backwards. Once we have seen a key, any later occurrence of it is an older mutation and can simply be ignored. This isn't true when merging forwards, because we always want to keep the last mutation/deletion associated with a key.
- Do the merge as normal, but afterwards dedup the data file.
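
The first idea could be sketched roughly as follows. This is an illustration under assumptions, not cbbackupmgr code: the `Mutation` type and `mergeBackwards` function are hypothetical, each backup is modelled as an in-memory slice ordered oldest to newest, and deletions are assumed to be kept in the merged output.

```go
package main

import "fmt"

// Mutation is a hypothetical stand-in for one record in a backup.
// It is not a cbbackupmgr type.
type Mutation struct {
	Key     string
	Value   string
	Deleted bool
}

// mergeBackwards walks the backups from newest to oldest, and each backup
// from its last record to its first. The first record seen for a key is
// therefore its latest mutation/deletion, so every later sighting is skipped.
func mergeBackwards(backups [][]Mutation) []Mutation {
	seen := make(map[string]struct{})
	var merged []Mutation
	for b := len(backups) - 1; b >= 0; b-- {
		for i := len(backups[b]) - 1; i >= 0; i-- {
			m := backups[b][i]
			if _, ok := seen[m.Key]; ok {
				continue // a newer record for this key was already kept
			}
			seen[m.Key] = struct{}{}
			merged = append(merged, m)
		}
	}
	return merged
}

func main() {
	// Backups are listed oldest first.
	backups := [][]Mutation{
		{{Key: "a", Value: "1"}, {Key: "b", Value: "1"}},
		{{Key: "a", Value: "2"}},
		{{Key: "b", Deleted: true}, {Key: "c", Value: "1"}},
	}
	for _, m := range mergeBackwards(backups) {
		fmt.Printf("%s value=%q deleted=%t\n", m.Key, m.Value, m.Deleted)
	}
}
```

Each key is written to the output exactly once, so the merged data file never receives a superseded value. The cost is the in-memory `seen` set, which grows with the number of distinct keys.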

Issue Links

relates to

DOC-11361 [CBM] Clarify that merge on rift files will not deduplicate documents (Resolved)

MB-49528 [CBM] Iterate backwards when performing a merge/restore (Open)

People

Assignee: Daniel Owen

Reporter: Matt Hall

Votes: 0

Watchers: 3

Dates

Created: 27/Jul/23 4:54 AM

Updated: 14/Mar/24 3:22 AM

[CBM] merges should deduplicate documents in the Rift format
