Details

Type: Improvement
Resolution: Fixed
Priority: Major
Fix Version/s: 7.2.0
Affects Version/s: None
Component/s: sigar
Labels:
- approved-for-7.2.0

Story Points:
0

Description

Currently, sigar provides only two cgroup memory stats: "memory used" and "memory limit".

The problem is that we can't use "memory used" for alerting, because it includes cache: regardless of what processes actually consume, "memory used" is almost always close to "memory limit".

To get a more realistic usage value that can be used for alerting, we need to subtract cache and buffers memory from the total cgroup memory usage (this is what we already do for host memory). To do that, ns_server needs the cgroup cache and buffers memory stats from sigar.
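A minimal sketch of the arithmetic, assuming all values are byte counts read from the cgroup. The function names, the 90% threshold and the sample numbers are hypothetical; `buffers` defaults to 0 because it is not certain that cgroups expose a separate buffers counter. It only illustrates why the raw usage ratio fires an alert while the cache-adjusted one does not.

```python
GiB = 1024 ** 3


def actual_used(usage: int, cache: int, buffers: int = 0) -> int:
    """Usage minus reclaimable cache and buffers, clamped at zero."""
    return max(usage - cache - buffers, 0)


def should_alert(used: int, limit: int, threshold: float = 0.9) -> bool:
    """Hypothetical alert rule: fire once used/limit reaches the threshold."""
    return limit > 0 and used / limit >= threshold


# Illustrative numbers, not measurements: a 4 GiB cgroup whose usage is
# dominated by page cache.
limit = 4 * GiB
usage = 39 * GiB // 10   # ~3.9 GiB, close to the limit
cache = 3 * GiB

print(should_alert(usage, limit))                      # True  (raw usage)
print(should_alert(actual_used(usage, cache), limit))  # False (cache subtracted)
```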

At first glance, for cgroup v1 we need to read the "cache" metric from memory.stat.

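However, we need to choose between "cache" and "total_cache" carefully, because the value must be consistent with the collected "memory used" stat ("total_cache" is the hierarchical counter, which also includes the cache of all descendant cgroups).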
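A sketch of parsing cgroup v1 memory.stat and picking the matching counter. The `hierarchical` flag is a hypothetical parameter standing in for the open question above: whether the collected "memory used" value covers descendant cgroups. The sample text is made up; in practice it would come from the cgroup's memory.stat file.

```python
def parse_memory_stat(text: str) -> dict[str, int]:
    """Parse the "<key> <value>" lines of a cgroup memory.stat file."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def v1_cache_bytes(stats: dict[str, int], hierarchical: bool) -> int:
    """Return the cache counter matching how "memory used" was collected.

    hierarchical=True  -> usage covers descendant cgroups too: use total_cache.
    hierarchical=False -> usage covers this cgroup only: use cache.
    """
    return stats["total_cache" if hierarchical else "cache"]


# Illustrative memory.stat excerpt (made-up values).
SAMPLE = """\
cache 1048576
rss 2097152
total_cache 3145728
total_rss 4194304
"""

stats = parse_memory_stat(SAMPLE)
print(v1_cache_bytes(stats, hierarchical=False))  # 1048576
print(v1_cache_bytes(stats, hierarchical=True))   # 3145728
```

Whichever key is chosen, the same choice should apply to "rss"/"total_rss" if those are reported too.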

Since sigar provides the cgroup abstraction for ns_server, determining the correct cgroup v2 equivalents is also part of this task.
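One possible cgroup v2 mapping, sketched under an explicit assumption that still needs verifying as part of this task: memory.current for usage, memory.max for the limit, and the "file" and "anon" fields of memory.stat as the analogues of v1 "cache" and "rss". The function names and the default directory are hypothetical.

```python
from pathlib import Path
from typing import Optional


def parse_memory_stat(text: str) -> dict[str, int]:
    """Parse the "<key> <value>" lines of a cgroup v2 memory.stat file."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def parse_limit(text: str) -> Optional[int]:
    """memory.max holds a byte count, or the literal "max" when unlimited."""
    value = text.strip()
    return None if value == "max" else int(value)


def v2_memory(current: str, max_: str, stat: str) -> dict[str, Optional[int]]:
    """Map cgroup v2 file contents onto v1-style stats.

    ASSUMPTION: "file" stands in for v1 "cache" and "anon" for v1 "rss".
    """
    stats = parse_memory_stat(stat)
    return {
        "used": int(current.strip()),  # memory.current
        "limit": parse_limit(max_),    # memory.max
        "cache": stats["file"],        # page cache (assumed analogue)
        "rss": stats["anon"],          # anonymous memory (assumed analogue)
    }


def read_v2(cgroup_dir: str = "/sys/fs/cgroup") -> dict[str, Optional[int]]:
    """Read the three files from a cgroup v2 directory (default is an example)."""
    d = Path(cgroup_dir)
    return v2_memory(
        (d / "memory.current").read_text(),
        (d / "memory.max").read_text(),
        (d / "memory.stat").read_text(),
    )
```

Keeping the parsing separate from file access makes the mapping easy to check against captured file contents.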

It is fine if sigar returns an "actual" memory used value instead of a separate cache metric (as it does for host metrics, i.e. doing the subtraction on the sigar side). However, cache and rss seem to be interesting memory stats in their own right, so it is probably worth reporting them separately to have that information available for investigations. This part is optional.
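A sketch of a record that covers both options: cache and rss are reported as separate fields, and the "actual" value is derived from them. The type and field names are hypothetical, and the v2 source fields in the comments are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CgroupMemory:
    """Hypothetical record for the cgroup memory stats sigar could return."""
    used: int             # total usage reported by the cgroup (includes cache)
    limit: Optional[int]  # None when the cgroup has no limit
    cache: int            # v1 "cache"/"total_cache"; v2 "file" (assumed)
    rss: int              # v1 "rss"/"total_rss"; v2 "anon" (assumed)

    @property
    def actual_used(self) -> int:
        """Usage with cache subtracted, as is done for host memory."""
        return max(self.used - self.cache, 0)


m = CgroupMemory(used=5000, limit=8000, cache=3000, rss=1800)
print(m.actual_used)  # 2000
```

Deriving `actual_used` instead of storing it means it can never disagree with the raw fields kept for investigations.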