Description
Overview
After 6.6.x, cbbackupmgr creates stats files in the archive directory which record useful debugging metrics:
- CPU utilization
- Memory
- Network
- Disk utilization
These are created in ${ARCHIVE_PATH}/logs/stats under logical directory names (e.g. net/cpu). Each time you run a
command that collects stats, cbbackupmgr generates a Unix timestamp and uses it in the filename of each stats file.
- That timestamp is logged ((Stats) Starting stat gathering - stat timestamp: %d)
- The same timestamp is used across the different stats files, so it can be used to group the stats for a given operation
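Grouping by the shared timestamp could look roughly like the sketch below. It is only an illustration: the file names are made up, and it assumes the timestamp appears in the file name as a run of ten or more digits, which may not match the real naming scheme. groupByTimestamp and timestampRe are hypothetical names.

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
	"sort"
	"strconv"
)

// timestampRe matches a run of ten or more digits.
// ASSUMPTION: the Unix timestamp appears in the file name as such a run;
// the real naming scheme used by cbbackupmgr may differ.
var timestampRe = regexp.MustCompile(`\d{10,}`)

// groupByTimestamp buckets stats file paths by the timestamp found in their
// base name. Paths without a recognisable timestamp are skipped.
func groupByTimestamp(paths []string) map[int64][]string {
	groups := make(map[int64][]string)
	for _, p := range paths {
		match := timestampRe.FindString(filepath.Base(p))
		if match == "" {
			continue
		}
		ts, err := strconv.ParseInt(match, 10, 64)
		if err != nil {
			continue // too many digits to be a timestamp we can represent
		}
		groups[ts] = append(groups[ts], p)
	}
	return groups
}

func main() {
	// Hypothetical file names, for illustration only.
	paths := []string{
		"logs/stats/cpu/cpu-1600000000.stats",
		"logs/stats/net/net-1600000000.stats",
		"logs/stats/cpu/cpu-1600000500.stats",
	}

	groups := groupByTimestamp(paths)

	// Sort the timestamps so operations are listed in chronological order.
	keys := make([]int64, 0, len(groups))
	for ts := range groups {
		keys = append(keys, ts)
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })

	for _, ts := range keys {
		fmt.Println(ts, groups[ts])
	}
}
```

The timestamp logged in the "(Stats) Starting stat gathering" line could then be used as the key to look up all the stats files for one operation.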
As it stands, we don't have anything official which parses these files and converts them into a format that we can use when debugging CBSEs.
Task
Find a way to make these stats files useful when debugging; useful is somewhat subjective, so there are a few things that could be done.
- Write a Go program which parses them and dumps averages, etc.
- Write a Go program which parses them, then converts them into a format which others can consume (e.g. grafana)
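For the first option, the "averages etc" part is straightforward once the values are in memory. A minimal sketch, assuming each metric has already been parsed into a slice of float64 (the parsing itself is not shown, and Summary and summarize are hypothetical names):

```go
package main

import (
	"fmt"
	"math"
)

// Summary holds simple aggregate figures for one metric.
type Summary struct {
	Count int
	Min   float64
	Max   float64
	Mean  float64
}

// summarize computes the count, minimum, maximum and arithmetic mean of
// values. An empty input returns a zero Summary.
func summarize(values []float64) Summary {
	if len(values) == 0 {
		return Summary{}
	}

	s := Summary{Count: len(values), Min: math.Inf(1), Max: math.Inf(-1)}

	var sum float64
	for _, v := range values {
		sum += v
		s.Min = math.Min(s.Min, v)
		s.Max = math.Max(s.Max, v)
	}

	s.Mean = sum / float64(len(values))

	return s
}

func main() {
	// Made-up CPU utilization percentages, for illustration only.
	cpu := []float64{12.5, 80.0, 47.5}
	fmt.Printf("%+v\n", summarize(cpu))
}
```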
Personally, I'd break down what's required in both cases:
- A package which can parse stats files (into in-memory structs in Go)
- A front end binary (in cmd/<name>/main.go) which uses the package then processes the information into something useful
- Another front end binary which converts the stats into something that can be consumed by grafana
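As a rough idea of what the conversion front end might do, the sketch below writes in-memory samples out as CSV. Everything here is an assumption: the Sample struct is a hypothetical stand-in for whatever the parsing package ends up producing, and CSV is used only because it is simple. The right output format depends on which Grafana data source we'd want to use.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"os"
	"strconv"
	"strings"
	"time"
)

// Sample is a HYPOTHETICAL in-memory representation of one stats reading;
// the real stats files may carry different or additional fields.
type Sample struct {
	UnixSeconds int64   // time the reading was taken
	Metric      string  // e.g. "cpu" or "net"
	Value       float64 // the reading itself
}

// writeCSV writes a header row followed by one row per sample.
func writeCSV(w io.Writer, samples []Sample) error {
	cw := csv.NewWriter(w)

	if err := cw.Write([]string{"time", "metric", "value"}); err != nil {
		return err
	}

	for _, s := range samples {
		record := []string{
			time.Unix(s.UnixSeconds, 0).UTC().Format(time.RFC3339),
			s.Metric,
			strconv.FormatFloat(s.Value, 'f', -1, 64),
		}
		if err := cw.Write(record); err != nil {
			return err
		}
	}

	// Flush buffered rows and surface any write error.
	cw.Flush()

	return cw.Error()
}

// toCSV is a convenience wrapper which returns the CSV as a string.
func toCSV(samples []Sample) (string, error) {
	var sb strings.Builder
	if err := writeCSV(&sb, samples); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Made-up readings, for illustration only.
	samples := []Sample{
		{UnixSeconds: 1600000000, Metric: "cpu", Value: 42.5},
		{UnixSeconds: 1600000001, Metric: "cpu", Value: 40},
	}

	if err := writeCSV(os.Stdout, samples); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Keeping the writer behind an io.Writer means the same function could back a cmd/<name>/main.go binary writing to a file or to stdout.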
We could then import the stats into grafana and create some pretty dashboards, for example:
- Resource usage over time
- The relationship between CPU and disk usage, e.g. low CPU might be due to high disk utilization (blocked on IO)
These are the types of questions we usually have to answer on CBSEs when we're searching for the root cause of issues.
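Outside of a dashboard, the CPU/disk relationship could also be checked numerically. The sketch below computes the Pearson correlation coefficient between two series; this is just one possible measure, not something the stats files provide. It assumes both series were sampled at the same points in time, and pearson is a hypothetical helper name.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// pearson returns the Pearson correlation coefficient of x and y, a value
// between -1 and 1. ASSUMPTION: x[i] and y[i] were sampled at the same time.
func pearson(x, y []float64) (float64, error) {
	if len(x) != len(y) || len(x) < 2 {
		return 0, errors.New("series must have equal length of at least two")
	}

	n := float64(len(x))

	var sumX, sumY float64
	for i := range x {
		sumX += x[i]
		sumY += y[i]
	}
	meanX, meanY := sumX/n, sumY/n

	// Accumulate the covariance and both variances (unnormalised, since the
	// normalisation factors cancel out in the final ratio).
	var cov, varX, varY float64
	for i := range x {
		dx, dy := x[i]-meanX, y[i]-meanY
		cov += dx * dy
		varX += dx * dx
		varY += dy * dy
	}

	if varX == 0 || varY == 0 {
		return 0, errors.New("correlation is undefined for a constant series")
	}

	return cov / math.Sqrt(varX*varY), nil
}

func main() {
	// Made-up utilization percentages, for illustration only.
	cpu := []float64{80, 60, 30, 10}
	disk := []float64{20, 45, 70, 95}

	r, err := pearson(cpu, disk)
	if err != nil {
		fmt.Println("error:", err)
		return
	}

	fmt.Printf("cpu/disk correlation: %.2f\n", r)
}
```

A strongly negative value would be consistent with the "low CPU because we're blocked on IO" theory, although it wouldn't prove it on its own.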