  1. Couchbase Server
  2. MB-56634

sysproc_cpu_utilization gauge stat appears incorrect (compared to sysproc_cpu_user+sys counters)


Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 7.6.0
    • 7.2.0
    • sigar
    • Linux, Capella

    Description

      As part of investigating elevated CPU on one of 3 KV nodes in a load test (MB-56085), it was observed that the sigar-generated sysproc_cpu_utilization gauge statistic was inconsistent with the sigar-generated sysproc_cpu_user / sysproc_cpu_sys counter statistics.

      Examining the time period 2023-04-24 19:19:00 to 2023-04-24 19:29:00 (when the cluster was idle) shows quite different graphs:

      (Ignore the absolute values on the y axis; the graphs are not correctly labelled.)

      • "sysproc_cpu_utilization - memcached" defined as:

        sysproc_cpu_utilization{proc="memcached"}
        

      • "sysproc_cpu_user+sys - memcached" defined as:

        sum without (name) (rate(sysproc_cpu_user{proc="memcached"}[1m])) + sum without (name) (rate(sysproc_cpu_sys{proc="memcached"}[1m]))
        

      Note how according to the first graph, node 2 has ~7x higher CPU utilisation than the other two nodes, but according to the second graph, they all have similar CPU.

      According to Trond Norbye, the difference is that for sysproc_cpu_utilization, sigar calculates the rate itself by comparing against the previous sample, whereas sysproc_cpu_sys / user are exposed as plain counters: sigar reports the value directly from /proc, leaving it to downstream consumers (ns_server / Prometheus) to perform rate calculations etc.
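      To illustrate the two approaches, here is a minimal Python sketch. It is not sigar's or Prometheus's actual code: the names counter_rate and GaugeFromPrevious are hypothetical, the counter is assumed to be in CPU-seconds, and the rate is a plain (last - first) / elapsed calculation that ignores counter resets and the extrapolation PromQL's rate() performs.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, cumulative CPU-seconds)


def counter_rate(samples: List[Sample]) -> float:
    """Consumer-side rate: average per-second increase across a window.

    Simplified stand-in for what a downstream consumer does with a counter;
    assumes the counter never resets within the window.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("window must span a positive time interval")
    return (v1 - v0) / (t1 - t0)


class GaugeFromPrevious:
    """Producer-side gauge: utilization computed against the previous sample.

    Hypothetical illustration only. The result depends on state held by the
    producer (the previous timestamp and value), so consumers cannot recompute it.
    """

    def __init__(self) -> None:
        self._prev: Optional[Sample] = None

    def sample(self, now_s: float, cpu_s: float) -> Optional[float]:
        prev, self._prev = self._prev, (now_s, cpu_s)
        if prev is None or now_s <= prev[0]:
            return None  # no usable previous sample yet
        return 100.0 * (cpu_s - prev[1]) / (now_s - prev[0])


if __name__ == "__main__":
    window = [(0.0, 100.0), (10.0, 101.0), (20.0, 102.0)]
    print("counter rate (CPU-seconds per second):", counter_rate(window))
    gauge = GaugeFromPrevious()
    for ts, value in window:
        print("gauge at", ts, "->", gauge.sample(ts, value))
```

      With well-behaved sampling both approaches describe the same load. The point of the sketch is only that the counter leaves the windowing decision to the consumer, while the gauge bakes in whatever interval and previous-sample state the producer happened to use.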

      One additional piece of corroborating evidence that sysproc_cpu_utilization is the incorrect stat (and not sysproc_cpu_user / sys) is that Capella also runs https://github.com/ncabatoff/process-exporter - a Prometheus plugin which also reads data from /proc to report CPU usage - and that reports similar metrics to sysproc_cpu_user / sys, i.e. all nodes are similar.
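      For reference, per-process user and system CPU time on Linux can be read from /proc/<pid>/stat (fields 14 and 15, utime and stime, in clock ticks). The Python sketch below shows the idea; it is not taken from sigar or process-exporter, and the function names parse_cpu_ticks and cpu_seconds are hypothetical.

```python
import os
from typing import Tuple


def parse_cpu_ticks(stat_line: str) -> Tuple[int, int]:
    """Return (utime, stime) in clock ticks from a /proc/<pid>/stat line."""
    # Field 2 (comm) is parenthesised and may itself contain spaces or ')',
    # so split on the last ')' before tokenising the remaining fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11] and rest[12].
    return int(rest[11]), int(rest[12])


def cpu_seconds(pid: int) -> Tuple[float, float]:
    """Return cumulative (user, system) CPU-seconds for a process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        utime, stime = parse_cpu_ticks(f.read())
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    return utime / ticks_per_second, stime / ticks_per_second
```

      Both values are cumulative counters, so a consumer would still need to take the difference between two readings over a known interval to obtain a utilization figure.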

            People

              owend Daniel Owen
              drigby Dave Rigby (Inactive)