Details
Type: Task
Resolution: Unresolved
Priority: Major
7.6.0
None
0
Description
As observed with MB-59287, the ItemPager task can be woken in one of two ways:
1. When a memory condition is detected (mem_used > high_watermark).
2. Periodically, by default every 5s.
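The two wake-up paths above can be modelled as a condition-variable wait with a timeout. This is a minimal sketch, not kv_engine code: `PagerWaker`, `notifyIfAboveHighWatermark` and `waitForWork` are hypothetical names, and the sketch abstracts away how the real ItemPager task is scheduled.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical model of the two ItemPager wake-up paths (not kv_engine APIs).
class PagerWaker {
public:
    explicit PagerWaker(std::chrono::milliseconds pagerSleepTime)
        : sleepTime(pagerSleepTime) {
        assert(pagerSleepTime.count() > 0);
    }

    // Wake-up (1): called from the paths that check memory usage.
    // Returns true if a wake-up was requested.
    bool notifyIfAboveHighWatermark(std::size_t memUsed, std::size_t highWatermark) {
        if (memUsed <= highWatermark) {
            return false;
        }
        {
            std::lock_guard<std::mutex> lock(mutex);
            notified = true;
        }
        cv.notify_one();
        return true;
    }

    // Called by the pager between runs. Returns true if woken by a
    // notification (1), false if the periodic timeout (2) expired first.
    bool waitForWork() {
        std::unique_lock<std::mutex> lock(mutex);
        const bool woken = cv.wait_for(lock, sleepTime, [this] { return notified; });
        notified = false;
        return woken;
    }

private:
    std::chrono::milliseconds sleepTime;
    std::mutex mutex;
    std::condition_variable cv;
    bool notified = false;
};
```

If no checked path ever calls `notifyIfAboveHighWatermark`, every wake-up comes from the timeout, which is exactly the situation the rest of this issue describes.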
Method (1) should ideally be sufficient to trigger the ItemPager, but we only check for it on specific paths (e.g. checkAndMaybeFreeMemory), and these currently do not include lookup-only methods. As such, for 100% read workloads which are less than 100% resident (and therefore need to BGFetch items into memory, increasing mem_used), we rely on method (2) to trigger memory recovery via the ItemPager.
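A toy model of the gap described above. Only `checkAndMaybeFreeMemory` is named in the issue; `ToyBucket`, its other methods, and the body given to `checkAndMaybeFreeMemory` here are hypothetical stand-ins, simplified to show that a read-only workload grows memory without ever requesting a pager wake-up.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, heavily simplified bucket (not kv_engine code).
struct ToyBucket {
    std::size_t memUsed = 0;
    std::size_t highWatermark;
    int pagerWakeupRequests = 0;

    explicit ToyBucket(std::size_t hwm) : highWatermark(hwm) {
        assert(hwm > 0);
    }

    // Stand-in for the real check: request a pager wake-up whenever
    // memory is above the high watermark.
    void checkAndMaybeFreeMemory() {
        if (memUsed > highWatermark) {
            ++pagerWakeupRequests;
        }
    }

    // Mutation path: memory grows and is checked.
    void set(std::size_t itemBytes) {
        memUsed += itemBytes;
        checkAndMaybeFreeMemory();
    }

    // Lookup on a non-resident item: the bgFetch brings the item into
    // memory, so memory grows, but nothing checks it.
    void getWithBgFetch(std::size_t itemBytes) {
        memUsed += itemBytes;
    }
};
```

Twenty 10-byte `set` calls against a watermark of 100 request 10 wake-ups (one per call once memory exceeds 100); twenty 10-byte `getWithBgFetch` calls reach the same memory level and request none.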
This isn't ideal: there could be a delay of up to pager_sleep_time_ms (default 5s) between memory actually exceeding the high_watermark and the ItemPager being run. With small bucket quotas or high bgFetch rates (or other non-mutation workloads which increase mem_used), memory could spike all the way to the tmpOOM threshold (93%), or even the hardOOM threshold (99%), within this 5s window, causing ENOMEM to be returned to client requests.
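A rough worked example of how short that window can be. The 93% tmpOOM threshold and the 5s default come from the issue; the 85% high watermark, the 100 MiB quota and the rate of 20,000 bgFetches/s of ~1 KiB each are illustrative assumptions, and the model ignores any memory freed by other means in the meantime.

```cpp
#include <cassert>
#include <cstdint>

// Seconds for mem_used to climb from fromPct to toPct of the quota when it
// grows by bytesPerSecond and nothing frees memory in the meantime.
double secondsToClimb(std::uint64_t quotaBytes, unsigned fromPct,
                      unsigned toPct, double bytesPerSecond) {
    assert(toPct > fromPct && bytesPerSecond > 0);
    const double headroomBytes =
        static_cast<double>(quotaBytes) * (toPct - fromPct) / 100.0;
    return headroomBytes / bytesPerSecond;
}

// True if memory can reach the threshold before a periodic-only wake-up
// is guaranteed to have happened.
bool outrunsPeriodicWakeup(double secondsToThreshold, double pagerSleepTimeMs) {
    return secondsToThreshold * 1000.0 < pagerSleepTimeMs;
}
```

Under these assumptions the 8 MiB of headroom between 85% and 93% is consumed in about 0.41s, well inside a single 5s periodic interval.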
We should investigate how this can be improved. Ideally we wouldn't have the periodic wakeup at all (see also MB-36380, where the FlusherTask was changed to remove the periodic wakeup and only wake when there was work to do).
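A minimal sketch of the event-driven direction suggested above, assuming every change to mem_used can be routed through one place so that bgFetches wake the pager just as mutations do. `MemoryTracker` and its methods are hypothetical, not kv_engine APIs. The pending flag is set under the mutex and the pager waits on a predicate, so a notification sent before the pager starts waiting is not lost.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical event-driven pager wake-up with no periodic timeout.
class MemoryTracker {
public:
    explicit MemoryTracker(std::size_t hwm) : highWatermark(hwm) {}

    // Called on every allocation, whatever the caller (set, bgFetch, ...).
    void add(std::size_t bytes) {
        const std::size_t before = memUsed.fetch_add(bytes);
        // Only the update that crosses the watermark notifies, so a burst
        // of allocations above it does not flood the pager.
        if (before <= highWatermark && before + bytes > highWatermark) {
            wakePager();
        }
    }

    void sub(std::size_t bytes) {
        const std::size_t before = memUsed.fetch_sub(bytes);
        assert(before >= bytes); // catch accounting underflow
        (void)before;
    }

    // Pager side: block until a wake-up is pending (no timeout).
    void waitForWork() {
        std::unique_lock<std::mutex> lock(mutex);
        cv.wait(lock, [this] { return wakeupPending; });
        wakeupPending = false;
    }

    bool hasPendingWakeup() {
        std::lock_guard<std::mutex> lock(mutex);
        return wakeupPending;
    }

private:
    void wakePager() {
        {
            std::lock_guard<std::mutex> lock(mutex);
            wakeupPending = true;
        }
        cv.notify_one();
    }

    const std::size_t highWatermark;
    std::atomic<std::size_t> memUsed{0};
    std::mutex mutex;
    std::condition_variable cv;
    bool wakeupPending = false;
};
```

Notifying only on the upward crossing keeps the hot allocation path cheap, but it means the pager must re-check mem_used after each run and keep going while it is still above the watermark, since no further notification arrives until memory drops below it and crosses again.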
Issue Links
relates to: MB-36380 Lost wakeup can delay Flusher up to 10s (Closed)