Details
Task
Resolution: Duplicate
Major
7.1.4
Description
Issue
kv-engine pauses DCP backfills when mem_used is high. However, Magma's BySeqIterator continues to hold buffer memory even while paused. As a result, kv-engine may never come out of the pause, and backfills hang forever.
Here is a reproduction of the issue on cluster_run with a 1GB bucket quota, 1024 DCP backfills and 1MB items.
All backfills are paused due to "DCP backfilling task temporarily suspended because the current memory usage is too high".
secondary_mem_domain is 931MB. There's also 300MB of unaccounted memory.
We need to release the buffer memory in such situations so that kv-engine has enough free memory to resume at least some backfills.
Background
BySeqIterator consumes buffer memory because it holds index and data blocks for multiple SSTables across multiple LSMTree levels.
For item sizes <= data block size, the memory used is fairly predictable:
5 levels/sstables * 3 index depth * 4KB block size = 60KB.
However, if the item size is greater than the data block size, the memory usage is not predictable and is dominated by the item size, as shown in the reproduction above. For example, with a 1MB item size:
4 non-LSD levels * 3 index depth * 4KB block size = 48KB.
2 index blocks * 4KB + 1 data block * 1MB = 1.008MB.
Total = 1.056MB per iterator.
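The arithmetic above can be sketched as a small estimator. This is only an illustration: the function name and parameters are hypothetical and not part of Magma, and the large-item branch assumes, as the figures above suggest, that one level holds 2 index blocks plus one item-sized data block while the remaining levels hold index blocks only.

```python
KB = 1024
MB = 1024 * KB


def iterator_buffer_bytes(item_size, levels=5, index_depth=3, block_size=4 * KB):
    """Rough estimate of buffer memory held by one paused iterator.

    Hypothetical helper mirroring the ticket's arithmetic, not Magma code.
    """
    if item_size <= block_size:
        # Small items: each level holds index_depth blocks of block_size.
        return levels * index_depth * block_size
    # Large items (assumption): all levels but one hold index blocks only.
    other_levels = (levels - 1) * index_depth * block_size
    # The remaining level holds 2 index blocks and one item-sized data block.
    large_level = 2 * block_size + item_size
    return other_levels + large_level


if __name__ == "__main__":
    per_iterator = iterator_buffer_bytes(1 * MB)
    total = 1024 * per_iterator  # 1024 concurrent backfills, as in the reproduction
    print(per_iterator / KB, "KB per iterator;", total / MB, "MB in total")
```

Under these assumptions, 1024 paused iterators over 1MB items add up to roughly 1080MB, which is more than the 1GB bucket quota and in the same range as the secondary_mem_domain figure reported above.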
Possible solutions
1) Destroy/recreate the iterator after every pause/unpause.
Pros: Easy to implement. Optimises memory usage in all cases irrespective of item size.
Cons: Reseeking incurs read I/Os to reposition the iterator across all levels.
2) For large items, where a data block contains only one item, release the data block memory (BlockIterator in SSTableIterator) and give kv-engine a copy of the value.
Pros: No extra read I/Os incurred.
Cons: Requires more code changes. Only handles the case where item sizes are large. What if the buffer memory usage is unacceptable even when item sizes <= data block size?
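A minimal sketch of option 1, using a toy in-memory store rather than Magma: the scan remembers the next seqno to read, drops its iterator on every pause, and reseeks on resume. ToyStore, PausableBackfill and the seeks counter are hypothetical names introduced only for illustration; the counter stands in for the extra read I/Os listed as the con of this option.

```python
import bisect
import itertools


class ToyStore:
    """Stand-in for a by-seqno index: a sorted list of (seqno, value)."""

    def __init__(self, items):
        self._items = sorted(items)
        self._seqnos = [seqno for seqno, _ in self._items]
        self.seeks = 0  # proxy for the read I/O cost of repositioning

    def iterate_from(self, seqno):
        # Each new iterator has to be positioned, which costs a seek.
        self.seeks += 1
        start = bisect.bisect_left(self._seqnos, seqno)
        return iter(self._items[start:])


class PausableBackfill:
    """Scan that holds no iterator, and so no buffers, while paused."""

    def __init__(self, store, start_seqno=0):
        self._store = store
        self._next_seqno = start_seqno
        self._iterator = None

    def run(self, max_items):
        """Read up to max_items, then pause and drop the iterator."""
        if self._iterator is None:
            # Resume: recreate the iterator and reseek to where we stopped.
            self._iterator = self._store.iterate_from(self._next_seqno)
        batch = list(itertools.islice(self._iterator, max_items))
        if batch:
            self._next_seqno = batch[-1][0] + 1
        self.pause()
        return batch

    def pause(self):
        # Dropping the iterator releases whatever blocks it was holding.
        self._iterator = None


if __name__ == "__main__":
    store = ToyStore([(1, "a"), (2, "b"), (3, "c"), (5, "d")])
    backfill = PausableBackfill(store)
    print(backfill.run(2), backfill.run(2), store.seeks)
```

The trade-off is visible in the seeks counter: every resume pays one reseek, but nothing is held between runs regardless of item size.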
Issue | Resolution |
Disk backfills were hanging permanently due to high memory consumption when large documents were streamed over many DCP streams concurrently. | Memory for a document read by a DCP stream is now released before switching to another stream. |