Details
Improvement
Resolution: Unresolved
Major
feature-backlog
None
Description
Couchstore's buffered IO handle currently maintains a write buffer so that consecutive data can be written in as large a chunk as possible (up to 128 KB by default).
However, every call to pread() flushes the existing write buffer, to avoid inconsistency between the write buffer (dirty data) and the actual file (clean data).
https://github.com/couchbase/couchstore/blob/c113cd24af1e2318b1a0f80f0e19e7aead646aa9/src/iobuffer.cc#L386
During save_doc(), reads and writes of B+tree nodes alternate in most cases. As a result, we cannot utilize the write buffer at all: pwrite() is invoked for every single B+tree node update, each of which is much smaller than the maximum write buffer size (500~1000 bytes vs. 128 KB).
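To make the effect concrete, here is a toy model of the pattern described above. It is only a sketch and not Couchstore's actual code: `ToyBuffer` and `simulate` are hypothetical names, and the model just counts how many simulated pwrite() calls (flushes) occur when every read forces a flush.

```cpp
#include <cstddef>

// Toy model of a buffered handle (hypothetical, not Couchstore's implementation).
// It only tracks how many bytes are buffered and how many flushes happened.
struct ToyBuffer {
    static constexpr std::size_t kCapacity = 128 * 1024;  // 128 KB default
    std::size_t buffered = 0;  // bytes currently held in the write buffer
    std::size_t flushes = 0;   // number of simulated pwrite() calls

    void flush() {
        if (buffered > 0) {
            ++flushes;
            buffered = 0;
        }
    }

    // Assumes len <= kCapacity, which holds for 500~1000 byte nodes.
    void write(std::size_t len) {
        if (buffered + len > kCapacity) {
            flush();
        }
        buffered += len;
    }

    // Models the current behaviour: any read flushes the write buffer.
    void read() { flush(); }
};

// Writes `nodes` B+tree nodes of `node_size` bytes each, optionally
// preceded by a read, and returns the number of simulated pwrite() calls.
std::size_t simulate(std::size_t nodes, std::size_t node_size, bool interleave_reads) {
    ToyBuffer buf;
    for (std::size_t i = 0; i < nodes; ++i) {
        if (interleave_reads) {
            buf.read();
        }
        buf.write(node_size);
    }
    buf.flush();  // final flush, e.g. at commit
    return buf.flushes;
}
```

In this model, 1000 nodes of 1000 bytes each cost 1000 flushes when reads are interleaved, but only 8 when the writes are allowed to accumulate in the 128 KB buffer.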
One observation here is that much of this flushing is unnecessary. Because Couchstore is based on an append-only data model, all data prior to the current write buffer's offset is immutable. This means that if we read data that does not overlap with the current write buffer, we do not need to flush the write buffer. We can eliminate a lot of pwrite() and memcpy() calls in this manner.
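The proposed check could look roughly like the sketch below. The names `PendingWrite` and `read_needs_flush` are assumptions made for illustration and do not exist in iobuffer.cc; the sketch also assumes the write buffer covers a single contiguous file range.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical description of the pending write buffer: it holds the
// bytes destined for file offsets [offset, offset + length).
struct PendingWrite {
    std::uint64_t offset;
    std::size_t length;
};

// Returns true only if the read range [read_off, read_off + read_len)
// overlaps the buffered (dirty) range. Reads that end at or before
// w.offset touch only immutable, already-written data and need no flush.
bool read_needs_flush(const PendingWrite& w, std::uint64_t read_off, std::size_t read_len) {
    if (w.length == 0 || read_len == 0) {
        return false;  // nothing buffered, or nothing to read
    }
    return read_off + read_len > w.offset && read_off < w.offset + w.length;
}
```

With such a predicate, pread() would flush only when it returns true. An alternative for the overlapping case would be to serve the overlapping bytes directly from the buffer, but that is beyond what this ticket proposes.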