Details
Bug
Resolution: Fixed
Major
7.6.0
Untriaged
0
Unknown
Description
Similar to existing Prometheus metrics, add:
1. n1ql_timeouts
This should be a counter of requests that have reported a timeout. There are a few timeout errors that should be included here (check codes.go; ignore SHELL errors)
2. n1ql_mem_quota_exceeded_errors
This should be a counter of requests that have reported a quota (request or node) exceeded error. (E_MEMORY_QUOTA_EXCEEDED, E_NODE_QUOTA_EXCEEDED, E_TENANT_QUOTA_EXCEEDED, E_TRANSACTION_MEMORY_QUOTA_EXCEEDED)
3. n1ql_unauthorized_users
This should be a counter of requests that fail authorisation. (E_SERVICE_TENANT_NOT_AUTHORIZED, E_DATASTORE_AUTHORIZATION)
4. n1ql_bulk_get_errors
This should be a counter of times E_CB_BULK_GET was encountered.
5. n1ql_cas_mismatch_errors
This should be a counter of times E_CAS_MISMATCH was encountered.
6. n1ql_temp_space_errors
This should be a counter of times E_TEMP_FILE_QUOTA or E_GSI_TEMP_FILE_SIZE was encountered.
In all cases, increase each counter only once per request as appropriate.
The metrics can probably all be recorded by processing the request's errors in HttpEndpoint.doStats(), and should be added in the same way as the other counter metrics (see accounting/accounting.go).
Processing of the request's errors list should make use of a map to simplify future extensions to this. (Suggest map keyed on ErrorCode with the metric identifier as the payload.)
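A minimal Go sketch of the suggested map-driven approach follows. It is illustrative only: the ErrorCode type, the placeholder code values, the MetricId type, and the counters type are all assumptions made for this example, not the real definitions in codes.go or accounting/accounting.go. It shows one way to make sure each counter goes up at most once per request even when several mapped errors are reported.

```go
package main

import "fmt"

// ErrorCode is a stand-in for the query engine's error code type (assumption).
type ErrorCode int32

// Placeholder values for illustration only; the real codes are in codes.go.
const (
	E_CB_BULK_GET ErrorCode = iota + 1
	E_CAS_MISMATCH
	E_TEMP_FILE_QUOTA
	E_GSI_TEMP_FILE_SIZE
)

// MetricId is a hypothetical identifier for a counter metric.
type MetricId string

const (
	BulkGetErrors     MetricId = "n1ql_bulk_get_errors"
	CasMismatchErrors MetricId = "n1ql_cas_mismatch_errors"
	TempSpaceErrors   MetricId = "n1ql_temp_space_errors"
)

// errorMetrics maps an error code to the metric it feeds. Supporting a new
// error later only needs a new entry here.
var errorMetrics = map[ErrorCode]MetricId{
	E_CB_BULK_GET:        BulkGetErrors,
	E_CAS_MISMATCH:       CasMismatchErrors,
	E_TEMP_FILE_QUOTA:    TempSpaceErrors,
	E_GSI_TEMP_FILE_SIZE: TempSpaceErrors,
}

// metricsForErrors returns the distinct metrics touched by one request's
// error codes. Codes without a mapping are ignored.
func metricsForErrors(codes []ErrorCode) map[MetricId]bool {
	seen := make(map[MetricId]bool)
	for _, code := range codes {
		if metric, ok := errorMetrics[code]; ok {
			seen[metric] = true
		}
	}
	return seen
}

// counters is a hypothetical stand-in for the accounting counter store.
type counters map[MetricId]int64

// recordRequest bumps each affected counter exactly once for the request.
func (c counters) recordRequest(codes []ErrorCode) {
	for metric := range metricsForErrors(codes) {
		c[metric]++
	}
}

func main() {
	c := counters{}
	// Two temp space errors in one request count once.
	c.recordRequest([]ErrorCode{E_TEMP_FILE_QUOTA, E_GSI_TEMP_FILE_SIZE, E_CAS_MISMATCH})
	c.recordRequest([]ErrorCode{E_CAS_MISMATCH})
	fmt.Println(c[TempSpaceErrors], c[CasMismatchErrors])
}
```

Collecting the affected metrics into a set before incrementing is what keeps two errors that feed the same metric (here both temp space codes) from being counted twice for a single request.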
The metrics should be noted in etc/metrics_metadata.json.
They should be visible in the output of the Prometheus endpoint (curl -su Administrator:password http://localhost:8093/_prometheusMetrics).