Details
- Improvement
- Resolution: Fixed
- Major
- 7.6.0
- 0
Description
The SRE team requires new functional metrics that track failures during the following operations:
- Create/Update Index
- Delete Index
- Create Alias
- Delete Alias
- Query Index
- Query Alias
Alert Review reference: https://docs.google.com/spreadsheets/d/1aFueZsvxCC2If11OLD0j-x8ieHsvtxuxuOTXrO0aKVI/edit#gid=0&range=97:102
Goal: SRE can alert on any "Non-200 HTTP error code" from these operations
These are the stats we've published:
- total_create_index_request
- total_create_index_bad_request_error
- total_create_index_internal_server_error
- total_create_index_request_ok
- total_delete_index_request
- total_delete_index_bad_request_error
- total_delete_index_internal_server_error
- total_delete_index_request_ok
- total_queries_search_in_context_error
- total_queries_bad_request_error
- total_queries_consistency_error
- total_queries_max_result_window_exceeded_error
- total_queries_partial_results_error
They are available from these FTS endpoints:
- GET /api/nsstats
- GET /_prometheusMetricsHigh
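
As a rough illustration of how these counters could feed an alert, here is a minimal Python sketch that compares two snapshots of GET /api/nsstats and reports which error counters went up. It rests on several assumptions: that the response is a flat JSON object, that the stats appear in it under exactly the names listed above, and that the endpoint is reachable without authentication. The function names `fetch_nsstats` and `error_deltas` are hypothetical helpers, not part of FTS.

```python
import json
import urllib.request

# Error counters copied from the list of published stats in this ticket.
ERROR_STATS = (
    "total_create_index_bad_request_error",
    "total_create_index_internal_server_error",
    "total_delete_index_bad_request_error",
    "total_delete_index_internal_server_error",
    "total_queries_search_in_context_error",
    "total_queries_bad_request_error",
    "total_queries_consistency_error",
    "total_queries_max_result_window_exceeded_error",
    "total_queries_partial_results_error",
)


def fetch_nsstats(base_url, timeout=5.0):
    """Fetch one stats snapshot from GET /api/nsstats.

    Assumption: the body is a JSON object and no credentials are needed;
    a real deployment will most likely require authentication.
    """
    url = base_url.rstrip("/") + "/api/nsstats"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def error_deltas(previous, current):
    """Return {stat_name: increase} for every error counter that grew.

    Counters missing from a snapshot are treated as 0. A decrease (for
    example after a process restart) is ignored rather than reported.
    """
    deltas = {}
    for name in ERROR_STATS:
        increase = current.get(name, 0) - previous.get(name, 0)
        if increase > 0:
            deltas[name] = increase
    return deltas
```

A poller would call `fetch_nsstats` at a fixed interval, keep the previous snapshot, and raise an alert whenever `error_deltas(previous, current)` is non-empty. Comparing deltas rather than absolute values fits the `total_` prefix, which suggests the stats are cumulative counters.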