Details
Epic
Resolution: Unresolved
Major
Scaling / HA
Description
This is a general epic to track the work needed to go from a single-container deployment of CMOS to a distributed, highly available system.
Many customers will want their monitoring to be highly available, because if it goes down they are effectively blind to what is happening in their cluster. Similarly, larger customers may have tens, if not hundreds, of clusters across multiple data centres / tiers / applications, and they will want a single-pane-of-glass overview, which will exceed what a single Prometheus instance can handle.
IMO, these are two sides of the same coin: as we scale CMOS out we need to consider HA right from the get-go.