Couchbase Server / MB-49262

Checkpoint expel stops before low mark

Details

    Description

      Introduced in http://review.couchbase.org/c/kv_engine/+/163330

      After expelling from each vbucket, the expel loop checks whether any further reduction in checkpoint memory usage is required:

          const auto vbuckets = bucket.getVBuckets().getVBucketsSortedByChkMgrMem();
          for (const auto& it : vbuckets) {
              const auto vbid = it.first;
              VBucketPtr vb = bucket.getVBucket(vbid);
              if (!vb) {
                  continue;
              }
       
              const auto expelResult =
                      vb->checkpointManager->expelUnreferencedCheckpointItems();
              EP_LOG_DEBUG(
                      "Expelled {} unreferenced checkpoint items "
                      "from {} "
                      "and estimated to have recovered {} bytes.",
                      expelResult.count,
                      vbid,
                      expelResult.memory);
       
              if (bucket.getRequiredCheckpointMemoryReduction() == 0) {
                  // All done
                  return ReductionRequired::No;
              }
          }
      

      size_t KVBucket::getRequiredCheckpointMemoryReduction() const {
          const auto checkpointMemoryRatio = getCheckpointMemoryRatio();
          const auto checkpointQuota = stats.getMaxDataSize() * checkpointMemoryRatio;
          const auto recoveryThreshold =
                  checkpointQuota * getCheckpointMemoryRecoveryUpperMark();
          const auto usage = stats.getCheckpointManagerEstimatedMemUsage();
       
          if (usage < recoveryThreshold) {
              return 0;
          }
       
          const auto lowerRatio = getCheckpointMemoryRecoveryLowerMark();
          const auto lowerMark = checkpointQuota * lowerRatio;
          Expects(usage > lowerMark);
          const size_t amountOfMemoryToClear = usage - lowerMark;
       
          const auto toMB = [](size_t bytes) { return bytes / (1024 * 1024); };
          const auto upperRatio = getCheckpointMemoryRecoveryUpperMark();
          EP_LOG_DEBUG(
                  "Triggering memory recovery as checkpoint memory usage ({} MB) "
                  "exceeds the upper_mark ({}, "
                  "{} MB) - total checkpoint quota {}, {} MB . Attempting to free {} "
                  "MB of memory.",
                  toMB(usage),
                  upperRatio,
                  toMB(checkpointQuota * upperRatio),
                  checkpointMemoryRatio,
                  toMB(checkpointQuota),
                  toMB(amountOfMemoryToClear));
       
          return amountOfMemoryToClear;
      }
      

      getRequiredCheckpointMemoryReduction boils down to:

      If checkpoint memory usage is at or above the high mark:
       -> amount of memory to recover to reach the low mark
      else:
       -> 0
      

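      Because this check runs after every vbucket, and the function returns 0 as soon as usage is below the high mark, expelling will often stop slightly below the high mark instead of continuing down to the low mark.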
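
      The effect can be reproduced with a small standalone model. This is only a sketch under stated assumptions: `requiredReduction` and `expelWithPerVBucketCheck` are hypothetical stand-ins, the marks are passed as plain byte counts rather than derived from the quota and ratios, and each vbucket is reduced to the number of bytes an expel would recover from it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for KVBucket::getRequiredCheckpointMemoryReduction().
// The marks are plain byte counts here (an assumption made for illustration).
std::size_t requiredReduction(std::size_t usage,
                              std::size_t highMark,
                              std::size_t lowMark) {
    if (usage < highMark) {
        // Below the high mark: no recovery is requested at all.
        return 0;
    }
    // Mirrors Expects(usage > lowerMark); holds whenever highMark > lowMark.
    assert(usage > lowMark);
    return usage - lowMark;
}

// Models the expel loop quoted above: expel from one vbucket, then ask
// whether any reduction is still required. recoverable[i] is the number of
// bytes an expel frees from the i-th vbucket (hypothetical input).
// Returns the checkpoint memory usage at the point the loop stops.
std::size_t expelWithPerVBucketCheck(std::size_t usage,
                                     const std::vector<std::size_t>& recoverable,
                                     std::size_t highMark,
                                     std::size_t lowMark) {
    for (const auto bytes : recoverable) {
        usage -= std::min(bytes, usage); // never underflow
        if (requiredReduction(usage, highMark, lowMark) == 0) {
            break; // "All done", even though usage may be far above lowMark
        }
    }
    return usage;
}
```

      With a high mark of 900, a low mark of 600, usage of 950 and vbuckets that each yield 100, the loop stops after the first vbucket at 850: below the high mark, but still 250 above the low mark.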

      Anecdotally, in a cluster run this has been seen to cause each run of the ClosedUnrefCheckpointRemoverTask to expel from a single vbucket and then end. This leads to a lot of logging of:

      ClosedUnrefCheckpointRemoverTask:0 Triggering checkpoint memory recovery - attempting to free X MB
      

      and a reduced rate of expelling (as the task needs to be retriggered/scheduled between each vbucket).
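
      One way to avoid the early exit, sketched here for illustration and not necessarily how this issue was actually resolved, is to decide once whether recovery is needed and then keep expelling until usage reaches the low mark. The function name and the byte-count parameters are hypothetical, as in any standalone model of the real loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of an expel loop that continues down to the low mark.
// recoverable[i] is the number of bytes an expel frees from the i-th vbucket.
// Returns the checkpoint memory usage once the loop stops.
std::size_t expelToLowMark(std::size_t usage,
                           const std::vector<std::size_t>& recoverable,
                           std::size_t highMark,
                           std::size_t lowMark) {
    assert(highMark > lowMark);
    if (usage < highMark) {
        return usage; // recovery is only triggered at or above the high mark
    }
    for (const auto bytes : recoverable) {
        if (usage <= lowMark) {
            break; // target reached; compare against the low mark, not the high mark
        }
        usage -= std::min(bytes, usage); // never underflow
    }
    return usage;
}
```

      With a high mark of 900, a low mark of 600 and usage of 950, a single run now visits four vbuckets of 100 bytes each and ends at 550, so the task does not need to be rescheduled between vbuckets.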

          Activity

            james.harrison James Harrison created issue -
            drigby Dave Rigby made changes -
            Field Original Value New Value
            Affects Version/s Neo [ 17615 ]
            owend Daniel Owen made changes -
            Rank Ranked higher
            owend Daniel Owen made changes -
            Rank Ranked higher
            owend Daniel Owen made changes -
            Rank Ranked lower
            owend Daniel Owen made changes -
            Rank Ranked lower
            owend Daniel Owen made changes -
            Triage Untriaged [ 10351 ] Triaged [ 10350 ]
            drigby Dave Rigby made changes -
            Link This issue relates to MB-49170 [ MB-49170 ]
            drigby Dave Rigby made changes -
            Rank Ranked higher
            drigby Dave Rigby made changes -
            Epic Link MB-38441 [ 123649 ]
            james.harrison James Harrison made changes -
            Assignee Daniel Owen [ owend ] James Harrison [ james.harrison ]
            james.harrison James Harrison made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            drigby Dave Rigby made changes -
            Is this a Regression? Unknown [ 10452 ] Yes [ 10450 ]
            james.harrison James Harrison made changes -
            Is this a Regression? Yes [ 10450 ] Unknown [ 10452 ]
            Sprint KV 2021-Nov [ 1866 ]
            drigby Dave Rigby made changes -
            Is this a Regression? Unknown [ 10452 ] Yes [ 10450 ]
            drigby Dave Rigby made changes -
            Description Code blocks changed from {noformat} to {code:c++} markup; the text is otherwise identical to the Description above.
            drigby Dave Rigby made changes -
            Rank Ranked lower
            owend Daniel Owen made changes -
            Rank Ranked lower
            james.harrison James Harrison made changes -
            VERIFICATION STEPS Verified by unit test.
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Resolved [ 5 ]
            ritam.sharma Ritam Sharma made changes -
            Labels request-dev-verify
            james.harrison James Harrison made changes -
            Status Resolved [ 5 ] Closed [ 6 ]

            People

              Assignee: james.harrison James Harrison
              Reporter: james.harrison James Harrison