Couchbase Monitoring and Observability Stack / CMOS-254

Alert on CB processes getting OOM killed

Details

    • Type: Task
    • Status: Done
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: 1.0
    • Component/s: cluster-monitor, cmos
    • Labels: None

    Description

      We should flag up that a process has been OOM killed, as this is a common enough problem.

      Possible approaches:

      1. node_exporter has node_vmstat_oom_kill
      2. agent that scans dmesg
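As a rough illustration of approach 1, the sketch below pulls the `node_vmstat_oom_kill` counter out of node_exporter's text exposition output and compares two scrapes. The function names and the sample scrape text are hypothetical and only for illustration; in a real deployment this would more likely be an alerting rule on the counter's increase. The counter is node-wide, so on its own it says that something was OOM killed, not which process.

```python
from typing import Optional

METRIC = "node_vmstat_oom_kill"


def read_oom_kill_counter(exposition_text: str) -> Optional[float]:
    """Return the node_vmstat_oom_kill value from a metrics scrape, or None if absent."""
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        parts = line.split()
        if len(parts) < 2:
            continue
        # A sample line is "name value" or "name{labels} value"; drop any label set.
        name = parts[0].split("{", 1)[0]
        if name == METRIC:
            return float(parts[1])
    return None


def oom_kills_since(previous: Optional[float], current: Optional[float]) -> float:
    """Number of OOM kills between two scrapes of the counter."""
    if previous is None or current is None:
        return 0.0  # metric missing: nothing to compare
    if current < previous:
        # The counter went backwards (for example after a reboot),
        # so treat the current value as the count since the reset.
        return current
    return current - previous


if __name__ == "__main__":
    # Hypothetical scrape bodies, made up for this example.
    earlier = "node_vmstat_oom_kill 3\n"
    later = "node_vmstat_oom_kill 5\n"
    delta = oom_kills_since(read_oom_kill_counter(earlier), read_oom_kill_counter(later))
    if delta > 0:
        print(f"{delta:.0f} OOM kill(s) since last scrape")
```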

      In theory we could also scan babysitter.log, but that is prone to false positives: the OOM killer uses SIGKILL, which can also be sent for other reasons (still bad, but it shouldn't be labelled an "OOM kill" when it really isn't).
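Unlike a bare SIGKILL seen in babysitter.log, the kernel log names the OOM killer explicitly, which is what makes the dmesg approach attractive. Below is a minimal sketch of such a scan, assuming the kernel writes lines containing `Killed process <pid> (<name>)`; the exact wording varies between kernel versions. The set of Couchbase process names, the function names, and the sample log lines are assumptions made for illustration, not taken from cbmultimanager.

```python
import re
from typing import Iterable, List, NamedTuple

# Illustrative, incomplete set of process names to treat as Couchbase processes.
# The kernel truncates process names to 15 characters, so longer names appear cut off.
COUCHBASE_PROCESSES = frozenset(
    {"memcached", "beam.smp", "indexer", "projector", "cbq-engine", "cbft", "goxdcr"}
)

# Matches the "Killed process 1234 (name)" fragment of an OOM killer log line.
KILLED_RE = re.compile(r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)")


class OOMKill(NamedTuple):
    pid: int
    name: str
    line: str


def find_oom_kills(lines: Iterable[str]) -> List[OOMKill]:
    """Return one OOMKill per kernel log line that reports a killed process."""
    kills = []
    for line in lines:
        match = KILLED_RE.search(line)
        if match:
            kills.append(
                OOMKill(int(match.group("pid")), match.group("name"), line.rstrip("\n"))
            )
    return kills


def couchbase_oom_kills(lines: Iterable[str], names=COUCHBASE_PROCESSES) -> List[OOMKill]:
    """Keep only the OOM kills whose process name is in the given set."""
    return [kill for kill in find_oom_kills(lines) if kill.name in names]


if __name__ == "__main__":
    # Made-up log lines in the general shape of dmesg output.
    sample = [
        "[12345.678901] Out of memory: Killed process 4242 (memcached) total-vm:8388608kB, anon-rss:4194304kB",
        "[12346.000000] Memory cgroup out of memory: Killed process 5151 (nginx) total-vm:1024kB",
        "[12347.000000] usb 1-1: new high-speed USB device number 2 using xhci_hcd",
    ]
    for kill in couchbase_oom_kills(sample):
        print(f"OOM killed: {kill.name} (pid {kill.pid})")
```

Matching on the process name keeps unrelated OOM kills on the same host from being reported as Couchbase failures, at the cost of maintaining the name list.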

          Activity

            Couchbase Build Team (build-team) added a comment:
            Build couchbase-cluster-monitor-0.2.0-216 contains cbmultimanager commit 0bd645a with commit message:
            CMOS-210, CMOS-254 Add analyser for dmesg + OOM kill check

            People

              Assignee: Marks Polakovs (marks.polakovs, Inactive)
              Reporter: Marks Polakovs (marks.polakovs, Inactive)
              Votes: 0
              Watchers: 1
