(Assigned to Damien for initial prioritization)
I have a 120 MB file that takes about 15 seconds to compact with a cold page cache. With a warm page cache it takes less than 3 seconds.
When I modified dbdump to read all data (not just btree nodes, but also doc bodies), I found it needs about 3 seconds to process the entire file with a cold page cache. If I run the compactor afterwards, I get the warm page cache compaction timing, which confirms that the modified dbdump warms all pages. And it clearly does that about 4 times more efficiently than the compactor does.
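For anyone who wants to reproduce the cold vs. warm comparison without the patched dbdump, here is a rough sketch of the measurement. It is not the dbdump patch: it just reads the file front to back, which is only an approximation of a depth-first walk. The function names are made up for illustration, and it assumes Linux with Python 3, where os.posix_fadvise is available. The DONTNEED hint is advisory, so the kernel may keep some pages (dirty ones in particular).

```python
import os
import time


def evict_from_page_cache(path):
    """Hint the kernel to drop cached pages of `path` (advisory only)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0, len=0 means "the whole file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)


def timed_sequential_read(path, chunk_size=1 << 20):
    """Read the whole file in order; return (bytes_read, elapsed_seconds)."""
    total = 0
    start = time.monotonic()
    # buffering=0 avoids Python-level buffering; the kernel page cache
    # is still used, which is what we want to measure.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.monotonic() - start


def cold_and_warm_timings(path):
    """Return (cold_seconds, warm_seconds) for a full sequential read."""
    evict_from_page_cache(path)
    _, cold = timed_sequential_read(path)
    # The first pass has pulled the file into the page cache.
    _, warm = timed_sequential_read(path)
    return cold, warm
```

Calling cold_and_warm_timings() on the .couch file and then running the compactor should show whether a plain linear read is enough to get the warm-cache compaction time.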
So I conclude that even without advanced prefetch our data is naturally linear, and simply loading it depth first (which is what happens during compaction) is fast enough. But something bad is happening in the couchstore compactor: either it triggers wrong behavior in the kernel's prefetch logic, or it does some clearly excessive and non-linearly aligned work.
I'm afraid I have to move on to another topic and stop spending time on this, so I'm leaving this to you folks.
I started this investigation when I saw a 'mere' 90 gigs of data requiring 4 hours to compact in perf runs. That's a bit too slow IMHO.
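To put the numbers above side by side, here is the back-of-the-envelope throughput arithmetic. It assumes "megs" and "gigs" mean MiB and GiB, and it treats "less than 3 seconds" as 3 seconds, so the warm figure is a lower bound. The helper name is just for illustration.

```python
def throughput_mib_per_s(size_mib, seconds):
    """Average throughput in MiB/s for `size_mib` processed in `seconds`."""
    return size_mib / seconds


# 120 MiB file, cold page cache compaction: ~15 s
cold_compaction = throughput_mib_per_s(120, 15)           # 8.0 MiB/s

# 120 MiB file, warm compaction or modified dbdump on cold cache: ~3 s
warm_or_dbdump = throughput_mib_per_s(120, 3)             # 40.0 MiB/s

# Perf run: 90 GiB in 4 hours
perf_run = throughput_mib_per_s(90 * 1024, 4 * 60 * 60)   # about 6.4 MiB/s

print(cold_compaction, warm_or_dbdump, perf_run)
```

So the perf run rate is in the same range as the cold cache compaction of the small file, and well below what a plain read of the same data achieves.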
I'll attach my dbdump patches.