The couchstore fragmentation calculation is not taking into account completed Prepares. As such, auto-compaction is not run when expected, and hence the completed (no longer needed) prepares are not purged.
Using pillowfight to load 4M documents into two buckets, one using level=none and one using level=majority (same eviction policy, same compaction threshold of 10%):
(Note: I also ran with a reduced vBucket count of 4, to make it easier to load a large number of documents (1M) per vBucket.)
This results in the level=majority load using 2x the disk space:
Interestingly, the fragmentation percentage (measured as (couch_docs_actual_data_size - couch_docs_data_size) / couch_docs_actual_data_size) is around 3%. However, if I manually run compaction on the "majority" bucket (via the UI), the disk space shrinks to almost half:
What appears to be happening here is that the fragmentation calculation is incorrect - the on-disk Prepares (which have all been committed) are not counted as "overhead", and are instead treated as "valid" documents. This means auto-compaction hasn't run when it would be expected to. When it does run, however, these prepares can all be discarded and hence the file size after compaction is similar to the level=none case.
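To make the arithmetic concrete, here is a minimal sketch of the fragmentation formula quoted above, first as currently computed and then with the data size reduced by an estimate of completed-prepare bytes. The byte counts are hypothetical values chosen to resemble the observations (about 3% reported, roughly half the file reclaimable), and the function names, the clamp at zero, and the exact form of the adjustment are assumptions for illustration, not the kv_engine implementation.

```python
def fragmentation_pct(file_size: int, data_size: int) -> float:
    """Fragmentation as (file_size - data_size) / file_size, in percent."""
    if file_size <= 0:
        return 0.0
    return 100.0 * (file_size - data_size) / file_size


def biased_data_size(data_size: int, completed_prepare_bytes: int) -> int:
    """Subtract the estimated completed-prepare bytes from the data size.

    Clamped at zero so an over-estimate cannot produce a negative size
    (an assumption made for this sketch).
    """
    return max(0, data_size - completed_prepare_bytes)


# Hypothetical sizes in bytes; not measured values from this issue.
file_size = 8_000_000_000      # stands in for couch_docs_actual_data_size
data_size = 7_760_000_000      # stands in for couch_docs_data_size (includes prepares)
prepare_bytes = 3_800_000_000  # assumed estimate of completed prepares on disk
threshold = 10.0               # auto-compaction threshold used in the test, in percent

# Completed prepares counted as valid data: fragmentation looks low.
naive = fragmentation_pct(file_size, data_size)

# Completed prepares counted as overhead: fragmentation crosses the threshold.
adjusted = fragmentation_pct(file_size, biased_data_size(data_size, prepare_bytes))

print(f"naive={naive:.1f}% adjusted={adjusted:.1f}% "
      f"compaction triggered: {naive >= threshold} -> {adjusted >= threshold}")
```

With these assumed numbers the unadjusted figure stays below the 10% threshold, so auto-compaction would not be triggered, while the adjusted figure is well above it.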
|For Gerrit Dashboard: MB-42306|
|139272,5||MB-42306 [1/2]: Add onDiskPrepareBytes to vbucket_state||mad-hatter||kv_engine||Status: MERGED||+2||+1|
|139312,7||MB-42306 [2/2]: Bias db_data_size by estimate of completed prepares||mad-hatter||kv_engine||Status: MERGED||+2||+1|
|139382,3||MB-42306: Correctly decode V3 CouchbaseRevMeta||mad-hatter||couchstore||Status: MERGED||+2||+1|
|140801,4||Merge branch mad-hatter into master||master||kv_engine||Status: MERGED||+2||+1|
|140896,5||Merge mad-hatter into master||master||kv_engine||Status: MERGED||+2||+1|
|140903,1||Merge mad-hatter into master||master||couchstore||Status: MERGED||+2||+1|
|163890,2||MB-48923: Avoid underflow of on_disk_prepare_bytes during compaction||mad-hatter||kv_engine||Status: MERGED||+2||+1|