Fixed
Details
Assignee: Ashwin Govindarajulu
Reporter: Daniel Owen
Is this a Regression?: Yes
Triage: Triaged
Due date: Aug 11, 2023
Story Points: 1
Sprint: None
Priority: Critical
Created May 13, 2020 at 3:41 PM
Updated September 20, 2023 at 5:45 AM
Resolved August 11, 2023 at 1:29 PM
Problem
There are multiple scenarios where replica checkpoints might allocate most of the memory on a node in a state where that memory is not releasable. That can result in hard OOM and possible deadlock in scenarios like rebalance or bulk load.
is an example of livelock at rebalance. It shows that, even without ongoing mutations, a replica disk checkpoint can end up stuck in the open state, which means that we cannot recover the memory associated with it.
While those scenarios are uncommon in on-premises environments, the system breaks quite quickly in environments with many small buckets if someone attempts simple loads with, e.g., low memory quotas and larger-than-usual document sizes.
Original proposal
The (current) invariant / assumption is that there’s always one open checkpoint. Hence we cannot close the last one (even though we have the last marker), as we don’t know what the seqnos for the next checkpoint are going to be.
If we relaxed that for replicas (which I think makes sense, given they are essentially slaved to the active), then we could close the checkpoint as soon as the last mutation arrives, and hence remove that checkpoint once it’s unreferenced.
This only works for disk checkpoints, as we need to know where the checkpoint ends, not where the snapshot ends.
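A minimal sketch of the idea in this original proposal, written as a toy Python model rather than actual server code. The class `ToyReplicaCheckpoint` and its methods are hypothetical, and the sketch assumes that for a disk checkpoint the end seqno carried by the marker is also the end of the checkpoint:

```python
class ToyReplicaCheckpoint:
    """Toy model of a replica checkpoint (hypothetical, for illustration only)."""

    def __init__(self, start_seqno, end_seqno, is_disk):
        self.start_seqno = start_seqno
        self.end_seqno = end_seqno    # assumed to be the checkpoint end for disk checkpoints
        self.is_disk = is_disk
        self.state = "open"
        self.seqnos = []

    def receive(self, seqno):
        """Queue a mutation; close a disk checkpoint once its last mutation arrives."""
        if self.state != "open":
            raise RuntimeError("checkpoint is closed")
        if not self.start_seqno <= seqno <= self.end_seqno:
            raise ValueError("seqno outside the range given by the marker")
        self.seqnos.append(seqno)
        if self.is_disk and seqno == self.end_seqno:
            # Relaxed invariant: the replica may be left without an open checkpoint.
            self.state = "closed"

    def removable(self, num_cursors):
        """A checkpoint can be removed only when it is closed and unreferenced."""
        return self.state == "closed" and num_cursors == 0
```

In this model, a memory checkpoint never closes on its own, which mirrors the restriction to disk checkpoints stated above.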
Final proposal
Force-closing the open checkpoint at replica comes with its own issues; see the historical conversation in the comments for details.
In the end, we solve the problem by allowing ItemExpel to remove all the mutations in checkpoints.
Note that, unlike the original proposal, the ItemExpel fix is wider in scope and isn't restricted to Disk Checkpoints. It therefore also improves our ability to recover memory from Memory Checkpoints and addresses any similar issue caused by them.
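The difference between retaining the last item and expelling everything can be illustrated with a toy Python model. This is only a sketch under stated assumptions, not the ItemExpel implementation: `ToyCheckpoint`, its fields, and the `keep_last_item` flag are hypothetical names, and a cursor is modelled simply as the highest seqno it has already processed.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCheckpoint:
    """Toy model of an open checkpoint (hypothetical, for illustration only)."""
    items: list = field(default_factory=list)    # (seqno, size_in_bytes), ordered by seqno
    cursors: dict = field(default_factory=dict)  # cursor name -> highest seqno processed

    def queue(self, seqno, size):
        self.items.append((seqno, size))

    def mem_usage(self):
        return sum(size for _, size in self.items)

    def expel(self, keep_last_item):
        """Release items that every cursor has processed; return the bytes freed."""
        if not self.items:
            return 0
        # With no cursors registered, nothing still needs the items.
        low_mark = min(self.cursors.values(), default=self.items[-1][0])
        n = sum(1 for seqno, _ in self.items if seqno <= low_mark)
        if keep_last_item and n == len(self.items):
            n -= 1  # modelled old behaviour: the last item always stays in memory
        freed = sum(size for _, size in self.items[:n])
        del self.items[:n]
        return freed


if __name__ == "__main__":
    cp = ToyCheckpoint()
    for seqno in (1, 2, 3):
        cp.queue(seqno, 1000)
    cp.cursors["persistence"] = 3              # the cursor has processed everything
    print(cp.expel(keep_last_item=True))       # 2000
    print(cp.expel(keep_last_item=False))      # 1000
    print(cp.mem_usage())                      # 0
```

With large items, the single retained item per checkpoint is what adds up across many vBuckets and replicas in this model; allowing the expel to cover every processed item removes that floor.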
Issue
The last item in a replica checkpoint was not expelled. Scenarios such as large average item size, high numbers of replicas, or low Bucket quota could result in a data node entering an unrecoverable Out-of-Memory state.
Resolution
ItemExpel has been enhanced to release all the items in a checkpoint when memory conditions allow.