Couchbase Server / MB-39300

Cap checkpoint memory usage to % of quota


Details

    • Bug
    • Resolution: Duplicate
    • Major
    • 7.0.0
    • None
    • couchbase-bucket
    • None
    • Triaged
    • 1
    • No

    Description

      Currently a bucket can receive mutations faster than it can flush them to disk and stream them over DCP. This becomes an issue as memory usage approaches the bucket quota or the memory limits of the machines we run on.

      In MB-38855 we have an example with very few vBuckets which, combined with probably under-provisioned VMs, causes us to get OOM-killed. At the time of the OOM kill a (proportionally) massive amount of memory is held in checkpoints. Flushing requires a further O(n) amount of memory on top of what the checkpoints already hold, so attempting to flush can very easily blow the memory quota in this case. Whilst this case is contrived, an intermittently slow disk could cause similar issues in a production use case.
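      To make the flushing concern concrete, here is a minimal sketch assuming a simple linear model (an assumption, not measured behaviour): flushing needs roughly as much memory again as the checkpoint data being flushed, so peak usage is about other memory + 2 x checkpoint memory. The function names are hypothetical and do not correspond to kv_engine code.

```python
def peak_memory_during_flush(other_mib: int, checkpoint_mib: int) -> int:
    """Hypothetical model: a flush temporarily needs ~checkpoint_mib extra memory."""
    return other_mib + 2 * checkpoint_mib


def flush_fits_in_quota(quota_mib: int, other_mib: int, checkpoint_mib: int) -> bool:
    """True if the modelled peak during a flush stays within the quota."""
    return peak_memory_during_flush(other_mib, checkpoint_mib) <= quota_mib


def max_safe_checkpoint_mib(quota_mib: int, other_mib: int) -> int:
    """Largest checkpoint size whose flush still fits, under the same model."""
    return max(0, (quota_mib - other_mib) // 2)


if __name__ == "__main__":
    # 1000 MiB quota, 200 MiB of other data, 500 MiB in checkpoints:
    # steady state (700 MiB) fits, but the modelled flush peak (1200 MiB) does not.
    print(peak_memory_during_flush(200, 500))
    print(flush_fits_in_quota(1000, 200, 500))
    print(max_safe_checkpoint_mib(1000, 200))
```

      Under this model, a bucket that looks comfortably inside its quota at steady state can still exceed it the moment it tries to flush, which is the argument for capping checkpoint memory well below the quota rather than near it.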

      Another issue with the current approach shows up in a swap rebalance. Checkpoint memory is transient, but it can be significant during replica building, especially in high-DGM cases. If we build a replica and let checkpoint memory grow effectively without bound (the only limit is replication_throttle_threshold, by default 99% memory used), we can end up with very poor working sets (few HashTable items and a low residency ratio).

      One tweak that would likely be required: run the checkpoint remover/expeller task as soon as we hit this cap, rather than relying only on the existing triggers, otherwise we might get "stuck". This probably needs linking to the replication throttle in some way too.
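      A minimal sketch of the proposed behaviour, assuming checkpoint memory is capped at a fixed fraction of the bucket quota. Everything here is hypothetical: CheckpointMemoryCap, cap_ratio (0.25 is an arbitrary example, not a real default), try_queue, release and wake_checkpoint_remover are illustrative names, not kv_engine APIs. It shows the two points above: queueing is refused once the cap would be exceeded, and hitting the cap immediately wakes the remover/expeller rather than waiting for an existing memory threshold.

```python
from dataclasses import dataclass


@dataclass
class CheckpointMemoryCap:
    """Hypothetical model of a checkpoint memory cap expressed as a % of bucket quota."""
    bucket_quota: int
    cap_ratio: float = 0.25   # assumed example value, not a real default
    checkpoint_bytes: int = 0
    remover_wakeups: int = 0

    @property
    def cap_bytes(self) -> int:
        return int(self.bucket_quota * self.cap_ratio)

    def try_queue(self, item_bytes: int) -> bool:
        """Queue an item into checkpoints if it fits under the cap.

        On hitting the cap, wake the remover/expeller straight away so memory
        is freed even if no other threshold has been reached, and tell the
        caller to back off (e.g. tmpfail a frontend write, or throttle a
        replica stream; how this links to the replication throttle is an
        open question in this issue).
        """
        if self.checkpoint_bytes + item_bytes > self.cap_bytes:
            self.wake_checkpoint_remover()
            return False
        self.checkpoint_bytes += item_bytes
        return True

    def wake_checkpoint_remover(self) -> None:
        # Placeholder: a real implementation would schedule the remover task.
        self.remover_wakeups += 1

    def release(self, freed_bytes: int) -> None:
        """Account for memory freed once items are flushed and streamed."""
        self.checkpoint_bytes = max(0, self.checkpoint_bytes - freed_bytes)


if __name__ == "__main__":
    cap = CheckpointMemoryCap(bucket_quota=1000)
    print(cap.cap_bytes, cap.try_queue(200), cap.try_queue(100), cap.remover_wakeups)
```

      Waking the remover from the same code path that rejects the item is what avoids the "stuck" state: otherwise the cap could be hit while overall memory usage is still below every existing trigger, and nothing would ever free checkpoint memory.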


            People

              Assignee: Ben Huddleston (ben.huddleston)
              Reporter: Ben Huddleston (ben.huddleston)