Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- 7.1.0
- Triaged
- 1
- Yes
- KV 2021-Nov
Description
Introduced in http://review.couchbase.org/c/kv_engine/+/163330
Expelling checks after every vbucket whether further reduction in memory usage is required:
const auto vbuckets = bucket.getVBuckets().getVBucketsSortedByChkMgrMem();
for (const auto& it : vbuckets) {
    const auto vbid = it.first;
    VBucketPtr vb = bucket.getVBucket(vbid);
    if (!vb) {
        continue;
    }

    const auto expelResult =
            vb->checkpointManager->expelUnreferencedCheckpointItems();
    EP_LOG_DEBUG(
            "Expelled {} unreferenced checkpoint items "
            "from {} "
            "and estimated to have recovered {} bytes.",
            expelResult.count,
            vbid,
            expelResult.memory);

    if (bucket.getRequiredCheckpointMemoryReduction() == 0) {
        // All done
        return ReductionRequired::No;
    }
}
size_t KVBucket::getRequiredCheckpointMemoryReduction() const {
    const auto checkpointMemoryRatio = getCheckpointMemoryRatio();
    const auto checkpointQuota = stats.getMaxDataSize() * checkpointMemoryRatio;
    const auto recoveryThreshold =
            checkpointQuota * getCheckpointMemoryRecoveryUpperMark();
    const auto usage = stats.getCheckpointManagerEstimatedMemUsage();

    if (usage < recoveryThreshold) {
        return 0;
    }

    const auto lowerRatio = getCheckpointMemoryRecoveryLowerMark();
    const auto lowerMark = checkpointQuota * lowerRatio;
    Expects(usage > lowerMark);
    const size_t amountOfMemoryToClear = usage - lowerMark;

    const auto toMB = [](size_t bytes) { return bytes / (1024 * 1024); };
    const auto upperRatio = getCheckpointMemoryRecoveryUpperMark();
    EP_LOG_DEBUG(
            "Triggering memory recovery as checkpoint memory usage ({} MB) "
            "exceeds the upper_mark ({}, "
            "{} MB) - total checkpoint quota {}, {} MB . Attempting to free {} "
            "MB of memory.",
            toMB(usage),
            upperRatio,
            toMB(checkpointQuota * upperRatio),
            checkpointMemoryRatio,
            toMB(checkpointQuota),
            toMB(amountOfMemoryToClear));

    return amountOfMemoryToClear;
}
getRequiredCheckpointMemoryReduction boils down to:
If checkpoint memory usage exceeds high mark:
    -> amount of memory to recover to reach the low mark
else:
    -> 0
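For illustration, a minimal standalone sketch of that arithmetic. The quota, usage and mark values below are hypothetical, chosen only to show that the trigger is the high mark but the recovery target is the low mark; the real values come from the bucket configuration, not from this sketch.

// Hypothetical sketch of the high-mark / low-mark arithmetic above.
#include <cstddef>
#include <cstdio>

size_t requiredReduction(size_t checkpointQuota,
                         double upperMark,
                         double lowerMark,
                         size_t usage) {
    const auto recoveryThreshold =
            static_cast<size_t>(checkpointQuota * upperMark);
    if (usage < recoveryThreshold) {
        return 0; // below the high mark, nothing to recover
    }
    // Recover down to the low mark, not just below the high mark.
    const auto lowerBound = static_cast<size_t>(checkpointQuota * lowerMark);
    return usage - lowerBound;
}

int main() {
    const size_t quota = 100 * 1024 * 1024; // 100 MB checkpoint quota (example)
    const size_t usage = 95 * 1024 * 1024;  // 95 MB current CM usage (example)
    // Example marks: recovery triggers above 90% and aims for 60%.
    std::printf("reduction = %zu MB\n",
                requiredReduction(quota, 0.9, 0.6, usage) / (1024 * 1024));
    // Prints "reduction = 35 MB": 95 MB usage minus the 60 MB low mark.
    return 0;
}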
Checking after every vbucket means expelling will often stop as soon as usage drops slightly below the high mark, rather than continuing down to the low mark.
Anecdotally, this has been seen in cluster run to lead to each run of the ClosedUnrefCheckpointRemoverTask expelling from a single vbucket and then ending. This leads to a lot of logging of:
ClosedUnrefCheckpointRemoverTask:0 Triggering checkpoint memory recovery - attempting to free X MB
and a reduced rate of expelling (as the task needs to be re-triggered/scheduled between each vbucket).
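To make that early exit concrete, here is a small simulation of the control flow only. The vbucket count, per-vbucket checkpoint memory, quota and marks are made up for illustration; this is not kv_engine code.

// Hypothetical simulation of the per-vbucket early exit described above.
#include <cstddef>
#include <cstdio>
#include <vector>

int main() {
    const size_t quotaMB = 100;
    const double upperMark = 0.9; // example: recovery triggers above 90 MB
    const double lowerMark = 0.6; // example: recovery targets 60 MB

    // 32 vbuckets each holding 3 MB of checkpoint memory = 96 MB total.
    std::vector<size_t> vbCheckpointMB(32, 3);
    size_t usageMB = 96;

    size_t expelledFrom = 0;
    for (auto& vbMem : vbCheckpointMB) {
        // Expel everything unreferenced from this vbucket (simplified).
        usageMB -= vbMem;
        vbMem = 0;
        ++expelledFrom;

        // The check after *every* vbucket: stop as soon as usage is back
        // under the upper mark, even though the target is the lower mark.
        if (usageMB < quotaMB * upperMark) {
            break;
        }
    }

    std::printf("expelled from %zu of %zu vbuckets, usage now %zu MB "
                "(upper mark %.0f MB, lower mark %.0f MB)\n",
                expelledFrom, vbCheckpointMB.size(), usageMB,
                quotaMB * upperMark, quotaMB * lowerMark);
    // Prints: expelled from 3 of 32 vbuckets, usage now 87 MB
    // (upper mark 90 MB, lower mark 60 MB) - well above the 60 MB target.
    return 0;
}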
Attachments
Issue Links
- relates to MB-49170 Replica item count lagging active in Magma insert test (Closed)
Under heavy load, checkpoint memory usage may stay above the high mark during expelling; in that case this issue would have little impact.
Under low load, CM mem usage would hover near the high mark. In that case, the load may be low enough that expelling only to the high mark is still sufficient to avoid backpressure.
This may be more problematic in buckets with very small quotas - there may be relatively little headroom beyond the high mark before the full CM quota is hit, leading to more frequent tmp fails than otherwise expected.
Secondarily, this issue will likely lead to faster wrapping of memcached.log due to CM mem usage more frequently reaching the high mark and logging "Triggering checkpoint memory recovery" each time; this has been noted in local repros.
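As a rough illustration of the small-quota headroom point above, a hypothetical comparison (the upper mark and quota values are made up, not bucket defaults):

// Hypothetical headroom comparison for the small-quota concern above.
#include <cstdio>
#include <initializer_list>

int main() {
    const double upperMark = 0.9; // example recovery upper mark
    // Headroom left when expelling stops just under the high mark, i.e. the
    // gap between the high mark and the full checkpoint quota.
    for (double quotaMB : {2048.0, 64.0}) {
        const double headroomMB = quotaMB * (1.0 - upperMark);
        std::printf("CM quota %6.0f MB -> ~%5.1f MB headroom above the high mark\n",
                    quotaMB, headroomMB);
    }
    // A large CM quota keeps ~205 MB of slack, a small one only ~6 MB, so the
    // small bucket reaches the full checkpoint quota (and tmp fails) much sooner.
    return 0;
}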