Details
- Task
- Resolution: Done
- Major
- None
- None
Description
We should flag when a process has been killed by the OOM killer, as this is a common enough problem.
Possible approaches:
- node_exporter has node_vmstat_oom_kill
- agent that scans dmesg
In theory we could also scan babysitter.log, but that's prone to false positives: the OOM killer uses SIGKILL, which can also be sent for other reasons (still bad, but it shouldn't be labelled an "OOM kill" when it really isn't).
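A rough sketch of the first two approaches, for discussion only. It assumes the node_exporter metrics page is already available as text and that the kernel log uses the usual "Killed process <pid> (<name>)" wording, which can vary between kernel versions. All function names here are hypothetical and not part of any existing checker.

```python
import re
from typing import List, Optional, Tuple

# Prometheus text-format sample line, e.g. "node_vmstat_oom_kill 3".
# Comment lines ("# HELP", "# TYPE") start with "#" and are skipped by the "^" anchor.
_METRIC_RE = re.compile(
    r"^node_vmstat_oom_kill(?:\{[^}]*\})?[ \t]+(\S+)", re.MULTILINE
)

# Assumed kernel log wording; covers "Out of memory: Killed process ..." and
# "Memory cgroup out of memory: Killed process ...". May differ on other kernels.
_DMESG_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def read_oom_kill_counter(metrics_text: str) -> Optional[int]:
    """Return the node_vmstat_oom_kill value, or None if the metric is absent."""
    match = _METRIC_RE.search(metrics_text)
    if match is None:
        return None
    return int(float(match.group(1)))


def new_oom_kills(previous: int, current: int) -> int:
    """Number of OOM kills since the last sample.

    The counter restarts from zero on reboot, so a lower value is treated
    as a reset and the current value is reported as-is.
    """
    if current < previous:
        return current
    return current - previous


def find_oom_victims(dmesg_text: str) -> List[Tuple[int, str]]:
    """Return (pid, process name) pairs for OOM kills found in kernel log text."""
    return [(int(pid), name) for pid, name in _DMESG_RE.findall(dmesg_text)]
```

The counter only says that some process on the node was OOM killed, while the kernel log names the victim, so the two could be combined: alert on the counter increase, then use the log to report which process it was.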
Issue Links
- depends on CMOS-210 System-level health checks (In Progress)