  1. Couchbase Server
  2. MB-60818

Misleading "Hard OOM" alerts on WebUI

Details

    Description

      ns_server uses the ep_temp_oom/ep_oom counters to differentiate between recoverable (aka Temporary) and unrecoverable (aka Hard) OOM states.
      The "Hard" part of that reading is a misinterpretation.
      In memcached we have:

      • ep_temp_oom: Memory pressure (defined by some internal thresholds), but memory still below the Bucket Quota
      • ep_oom: Memory has reached the Bucket Quota

      Neither counter says anything about whether the OOM state is recoverable or unrecoverable. In fact, memcached always tries to recover from an OOM state, so there is no real concept of "unrecoverable OOM" in memcached, except in specific scenarios such as value eviction where document metadata consumes the entire quota, or ephemeral buckets with fail_new_data.
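
      For illustration only, here is a minimal Python sketch of alert wording that follows the memcached semantics described above. The function name, the message strings, and the idea of passing per-interval counter increases are all assumptions made for this example; this is not ns_server code.

```python
from typing import Optional


def describe_oom(ep_temp_oom_delta: int, ep_oom_delta: int) -> Optional[str]:
    """Map per-interval increases of the two counters to an alert message.

    Hypothetical helper: the name, the inputs, and the wording are
    illustrative assumptions, not the ns_server implementation.
    """
    if ep_oom_delta > 0:
        # ep_oom: memory has reached the Bucket Quota. This says nothing
        # about whether the state is recoverable.
        return "Bucket memory usage has reached the bucket quota"
    if ep_temp_oom_delta > 0:
        # ep_temp_oom: memory pressure, but memory still below the Bucket Quota.
        return "Bucket is under memory pressure (usage still below the bucket quota)"
    # Neither counter moved in this interval: nothing to report.
    return None
```

      The point of the sketch is that both messages describe where memory usage stands relative to the quota, and neither claims the state is unrecoverable.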

      This misinterpretation generates misleading "unrecoverable OOM" alerts.

      Could we improve that?
      Thanks

            People

              ankush.sharma Ankush Sharma
              paolo.cocchi Paolo Cocchi
              Votes: 0
              Watchers: 6
