Details
Type: Task
Resolution: Unresolved
Priority: Major
Version: 7.1.0
Description
Durable ops are seeing increased adoption, and we need to review the telemetry and diagnostics available so that we can support them and debug issues.
This may involve creating new stats, or improving visibility into the metrics we already have.
We need to be able to diagnose the following quickly:
- Rate of durable vs non-durable ops.
- Number of failed durable ops (e.g. replica busy or unavailable).
- Performance of durable vs non-durable ops.
- Proportion of time persistence-based durable ops spend waiting for disk.
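As a rough illustration of the bookkeeping these four questions imply, here is a minimal sketch assuming a hypothetical `DurabilityStats` collector. This is not an existing KV-Engine API. The class name, the failure label `"replica_unavailable"`, and the assumption that each persistence-based durable op can report how much of its latency was spent waiting for disk are all illustrative. Counters are kept monotonic so that a scraper sampling them periodically can derive rates.

```python
import statistics
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DurabilityStats:
    """Hypothetical per-bucket stats for durable vs non-durable writes."""
    # Monotonic op counts keyed by "durable" / "non_durable"; rates come from sampling these.
    ops: Counter = field(default_factory=Counter)
    # Failure reason -> count, recorded for durable ops only.
    failures: Counter = field(default_factory=Counter)
    # Raw latencies per kind (a real implementation would use a histogram).
    latencies_us: Dict[str, List[int]] = field(
        default_factory=lambda: {"durable": [], "non_durable": []})
    # Total latency of persistence-based durable ops, and the part spent waiting for disk.
    persist_total_us: int = 0
    persist_disk_wait_us: int = 0

    def record(self, durable: bool, latency_us: int, failure: Optional[str] = None,
               requires_persistence: bool = False, disk_wait_us: int = 0) -> None:
        kind = "durable" if durable else "non_durable"
        self.ops[kind] += 1
        self.latencies_us[kind].append(latency_us)
        if durable and failure is not None:
            self.failures[failure] += 1
        if durable and requires_persistence:
            self.persist_total_us += latency_us
            self.persist_disk_wait_us += disk_wait_us

    def durable_ratio(self) -> float:
        """Fraction of all recorded ops that were durable."""
        total = self.ops["durable"] + self.ops["non_durable"]
        return self.ops["durable"] / total if total else 0.0

    def median_latency_us(self, kind: str) -> Optional[float]:
        xs = self.latencies_us[kind]
        return statistics.median(xs) if xs else None

    def disk_wait_fraction(self) -> float:
        """Share of persistence-based durable op time spent waiting for disk."""
        if not self.persist_total_us:
            return 0.0
        return self.persist_disk_wait_us / self.persist_total_us


# Usage: two plain writes, one persisted durable write, one failed durable write.
stats = DurabilityStats()
stats.record(durable=False, latency_us=100)
stats.record(durable=False, latency_us=120)
stats.record(durable=True, latency_us=2000, requires_persistence=True, disk_wait_us=1500)
stats.record(durable=True, latency_us=5000, failure="replica_unavailable")
print(stats.durable_ratio(), stats.failures["replica_unavailable"],
      stats.median_latency_us("durable"), stats.disk_wait_fraction())
```

Keeping the disk-wait split as two running totals, rather than a per-op ratio, means the reported proportion is weighted by time, which matches the question of how much of the total time goes to disk.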