[checkpoint] Allocation in replica checkpoints can push the system to hard OOM

Description

Problem

There are multiple scenarios where replica checkpoints might allocate most of the memory on a node in a state where that memory is not releasable. That can result in hard OOM and possible deadlock in scenarios like rebalance or bulk load.

is an example of a livelock during rebalance. It shows that, without ongoing mutations, we can end up with a replica disk checkpoint stuck in the open state, which means that we cannot recover all the memory associated with it.

While those scenarios are uncommon in on-premise environments, the system breaks quite quickly in environments with many small buckets if someone attempts simple loads with, for example, low memory quotas and larger-than-usual document sizes.

Original proposal

Due to the (current) invariant / assumption that there is always one open checkpoint, we cannot close the last one (even though we have the last marker), because we don't know what the seqnos for the next checkpoint are going to be.

If we relaxed that for replicas (which I think makes sense given they are essentially slaved to the active), then we could close the checkpoint as soon as the last mutation arrives, and hence remove that checkpoint once it is unreferenced.

This only works for disk checkpoints, because we need to know where the checkpoint ends, not where the snapshot ends.
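The original proposal can be illustrated with a toy model. This is only a sketch under stated assumptions, not kv_engine code: the names ToyCheckpoint, ToyReplicaManager and allow_no_open_checkpoint are hypothetical, and the model assumes the checkpoint end seqno is known when the checkpoint is created (the disk checkpoint case described above).

```python
from dataclasses import dataclass, field


@dataclass
class ToyCheckpoint:
    snap_start: int
    snap_end: int  # assumption: the end seqno is known up front
    seqnos: list = field(default_factory=list)
    is_open: bool = True
    cursors: int = 0  # cursors still referencing this checkpoint


class ToyReplicaManager:
    """Hypothetical stand-in for a replica-side checkpoint list."""

    def __init__(self, allow_no_open_checkpoint: bool):
        self.allow_no_open_checkpoint = allow_no_open_checkpoint
        self.checkpoints = []

    def on_snapshot_marker(self, start: int, end: int) -> None:
        # Assumption of this model: a new marker closes whatever was open
        # and opens a fresh checkpoint.
        for ckpt in self.checkpoints:
            ckpt.is_open = False
        self.checkpoints.append(ToyCheckpoint(start, end))

    def on_mutation(self, seqno: int) -> None:
        ckpt = self.checkpoints[-1]
        ckpt.seqnos.append(seqno)
        # Relaxed invariant: close as soon as the last mutation arrives.
        if self.allow_no_open_checkpoint and seqno == ckpt.snap_end:
            ckpt.is_open = False

    def remove_unreferenced(self) -> int:
        """Remove closed checkpoints with no cursors; return how many went."""
        before = len(self.checkpoints)
        self.checkpoints = [
            c for c in self.checkpoints if c.is_open or c.cursors > 0
        ]
        return before - len(self.checkpoints)


def run(relaxed: bool) -> int:
    mgr = ToyReplicaManager(allow_no_open_checkpoint=relaxed)
    mgr.on_snapshot_marker(1, 3)
    for seqno in (1, 2, 3):
        mgr.on_mutation(seqno)
    return mgr.remove_unreferenced()


strict_removed = run(relaxed=False)
relaxed_removed = run(relaxed=True)
print(strict_removed, relaxed_removed)
```

With the strict invariant the only checkpoint stays open and nothing can be removed. With the relaxed one the checkpoint closes on its last mutation and becomes removable once unreferenced.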

Final proposal

Force-closing the open checkpoint on the replica comes with its own issues; see the historical conversation in the comments for details.

In the end we solve the problem by allowing ItemExpel to remove all the mutations in checkpoints.
Note that, unlike the original proposal, the ItemExpel fix is wider in scope and is not restricted to Disk Checkpoints. It therefore also improves our ability to recover memory from Memory Checkpoints and addresses any similar issue caused by those.

Issue

The last item in a replica checkpoint was not expelled. With a large average item size, a high number of replicas, or a low Bucket quota, this could result in a data node entering an unrecoverable Out-of-Memory state.

Resolution

ItemExpel has been enhanced to release all the items in a checkpoint when memory conditions allow.
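The difference between the two behaviours can be sketched with a toy model. This is an illustration under assumptions, not the kv_engine implementation: Item, ToyCheckpoint, the processed counter and the keep_last flag are all hypothetical names, and cursors are reduced to a single count of items that every cursor has already processed.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    seqno: int
    size: int  # bytes held in memory by this item


@dataclass
class ToyCheckpoint:
    items: list = field(default_factory=list)
    # Number of items (from the front) already processed by every cursor.
    processed: int = 0

    def mem_usage(self) -> int:
        return sum(item.size for item in self.items)

    def expel(self, keep_last: bool) -> int:
        """Drop processed items and return the number of bytes released."""
        limit = self.processed
        if keep_last:
            # Old behaviour in this toy model: the final item always stays.
            limit = min(limit, max(len(self.items) - 1, 0))
        released = sum(item.size for item in self.items[:limit])
        self.items = self.items[limit:]
        self.processed -= limit
        return released


def build() -> ToyCheckpoint:
    # Two small items followed by one large one, all already processed.
    ckpt = ToyCheckpoint()
    ckpt.items = [Item(1, 1_000), Item(2, 1_000), Item(3, 20_000_000)]
    ckpt.processed = len(ckpt.items)
    return ckpt


old = build()
old_released = old.expel(keep_last=True)

new = build()
new_released = new.expel(keep_last=False)

print(old_released, old.mem_usage())
print(new_released, new.mem_usage())
```

In this model the old behaviour releases only the two small items and leaves the large last item resident, while expelling everything brings the checkpoint's memory usage to zero. This is why a large average item size makes the retained last item so costly.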

Activity

CB robot September 20, 2023 at 5:45 AM

Build capella-analytics-1.0.0-1025 contains kv_engine commit 632b63d with commit message:
: Don't reuse touched-by-expel checkpoint in CM::createSnapshot

CB robot September 20, 2023 at 5:45 AM

Build capella-analytics-1.0.0-1025 contains kv_engine commit 39abd19 with commit message:
: Ensure no logic change in CM::getSnapshotInfo()

CB robot September 20, 2023 at 5:45 AM

Build capella-analytics-1.0.0-1025 contains kv_engine commit f4d1bab with commit message:
: Ensure no logic change in CM::getVisibleSnapshotEndSeqno()

CB robot September 19, 2023 at 10:43 AM

Build couchbase-server-8.0.0-1410 contains kv_engine commit 632b63d with commit message:
: Don't reuse touched-by-expel checkpoint in CM::createSnapshot

CB robot September 19, 2023 at 10:43 AM

Build couchbase-server-8.0.0-1410 contains kv_engine commit 39abd19 with commit message:
: Ensure no logic change in CM::getSnapshotInfo()

Fixed

Details

Is this a Regression?

Yes

Triage

Triaged

Created May 13, 2020 at 3:41 PM
Updated September 20, 2023 at 5:45 AM
Resolved August 11, 2023 at 1:29 PM