Details
New Feature
Resolution: Unresolved
Critical
None
Description
Currently, the only way to run the cbmultimanager checkers is to run the daemon and add a cluster to it. This requires setup and is slow, which is a problem when we want to quickly diagnose a live cluster that is experiencing an issue but does not have CMOS deployed yet. In that case, the customer has to collect and upload logs, which we then process with the existing log ingest pipeline, and that is also slow.
One possible approach is to run CMOS against a cbcollect (see CMOS-53 for some discussion). However, this would require substantially reworking the stack, and the operator would still have to collect logs, which can take a while.
This ticket proposes an alternative: a static binary that connects to a given cluster, runs the health checks, prints a report, and exits. Eventually, customers could download and run it during a call to speed up initial troubleshooting (in theory, a report against a live cluster can be generated in under a minute, whereas cbcollect_info alone may take much longer), while also showing them the benefits of CMOS.
A possible extension is to trigger a cbcollect, automatically retrieve it from the cluster, and analyze the logs with the agent's log analyzer (Hazelnut). Because this takes significantly longer, it would be an optional usage mode.
The simplest way to implement this would likely be a separate Go binary (or a separate entry point into the cluster monitor) that takes the cluster's connection details and credentials as command-line parameters and prints a report.