Description
What is the problem?
When cbbackupmgr collects its logs, we gather a reasonable set of system information (found in system_info.log). This is often enough to solve an issue or point us in the right direction, but occasionally a more difficult bug or support case appears.
For example, we have cases where we suspect the disks are not performing as they should, but we have no way to validate this without asking the customer to run some in-depth tests.
What is the solution?
We should add a way to run extra tests/benchmarks on a customer's machine. For example, collect-logs could take an --in-depth flag, or we could add a hidden problem-report subcommand that we instruct users to run.
What tests shall we run?
Ideas off the top of my head:
- Write a large random rift data file of known size and work out the write throughput
- Try to query the AWS, GCP & Azure instance metadata services (IMDS)
- Allow the user to pass cluster information, then try to ping/bootstrap/traceroute etc.
- Add NFS stats
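
As a rough illustration of the first idea, the sketch below times how long it takes to write a known amount of random data into a target directory and reports the throughput. It is only a sketch under stated assumptions: it writes plain random bytes rather than a real rift data file, the function name `measure_write_throughput` and its parameters are hypothetical, and it is written in Python for brevity rather than as a proposal for how cbbackupmgr itself would implement it.

```python
import os
import tempfile
import time


def measure_write_throughput(directory, total_bytes=64 * 1024 * 1024,
                             chunk_bytes=1024 * 1024):
    """Write total_bytes of random data into directory and return MiB/s.

    Hypothetical helper: the name, parameters and defaults are assumptions
    made for illustration only.
    """
    # Generate the random chunk before timing starts so that only the
    # write path is measured; the same chunk is reused for every write.
    chunk = os.urandom(chunk_bytes)
    fd, path = tempfile.mkstemp(dir=directory, prefix="throughput-")
    written = 0
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            while written < total_bytes:
                n = min(chunk_bytes, total_bytes - written)
                f.write(chunk[:n])
                written += n
            f.flush()
            # Include the time taken to push the data to the device,
            # otherwise we mostly measure the page cache.
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        # Always clean up the temporary file, even if a write fails.
        os.remove(path)
    return written / (1024 * 1024) / elapsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(f"{measure_write_throughput(d):.1f} MiB/s")
```

In practice the directory would be the archive location the customer backs up to, and the file size would need to be large enough to outgrow any caches. Because the same random chunk is repeated, a deduplicating filesystem could report an optimistic figure.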
Prior art