[BP 7.2.0] Expose aggregated node level stats via prometheus

Description

The avg_resident_percent stat is needed for generating alerts. See

The following aggregate node-level stats are being added:

num_indexes
num_storage_instances
avg_resident_percent
avg_mutation_rate
avg_drain_rate
avg_disk_bps
total_data_size
total_disk_size
memory_used_storage
memory_total_storage
total_requests
total_rows_returned
total_rows_scanned

Please note that these stats are already reported in the log files. With this change, the same information will also be reported via the Prometheus REST endpoint.
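As a rough illustration of how these stats could be looked up on the REST endpoint, here is a minimal Python sketch that builds the query URL for each stat. The `index_` prefix and port 8091 come from the curl example in the verification comment on this ticket (`index_total_rows_scanned`); applying the same prefix to the other twelve stats is an assumption, and `metric_name` and `query_url` are hypothetical helper names.

```python
from urllib.parse import urlencode

# Aggregate node-level stats listed in this ticket.
NODE_STATS = [
    "num_indexes",
    "num_storage_instances",
    "avg_resident_percent",
    "avg_mutation_rate",
    "avg_drain_rate",
    "avg_disk_bps",
    "total_data_size",
    "total_disk_size",
    "memory_used_storage",
    "memory_total_storage",
    "total_requests",
    "total_rows_returned",
    "total_rows_scanned",
]


def metric_name(stat: str) -> str:
    """Assumed naming: the indexer stat name with an 'index_' prefix."""
    return f"index_{stat}"


def query_url(host: str, stat: str, port: int = 8091) -> str:
    """Build the instant-query URL for one stat on the given node."""
    query = urlencode({"query": metric_name(stat)})
    return f"http://{host}:{port}/_prometheus/api/v1/query?{query}"


if __name__ == "__main__":
    for stat in NODE_STATS:
        print(query_url("127.0.0.1", stat))
```

Each printed URL can be passed to curl with the cluster credentials, as in the repro steps.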

Environment

None

Release Notes Description

None

Activity

Chris Malarky February 22, 2023 at 2:45 PM

Hi, as discussed, we can meet next week for a session with the support team.

Amit Kulkarni February 21, 2023 at 2:52 PM

Hi, I have opened a ticket to remove the extra stats. Hoping to see a quick improvement on the mortimer tooling side.

Unfortunately, moving away from indexer_stats.log is not simple. Currently, Prometheus is configured to vary the stat collection frequency for high-cardinality stats, which does not work well when debugging indexing CBSEs. If it were simple, we would have taken it up in the 7.0 timeframe.

Chris Malarky February 21, 2023 at 1:19 PM

Hi, I am again requesting that you review the above stats and not add any new aggregated metrics to Prometheus if they can be derived from existing ones.

If you don't remove them now, they will most likely be deprecated when I do a metrics review later this year, causing disruption to anyone who has started to rely on them.

Yash Dodderi February 21, 2023 at 12:27 PM

Repro steps:

  1. Create a 2-node cluster with the n1ql, kv and index services.

  2. Load some data, create some indexes, and run queries on them.

  3. Use the endpoint to verify the stats: curl -u Administrator:password 'http://<your_ip>:8091/_prometheus/api/v1/query?query=index_total_rows_scanned'

Verify all of the stats below:

num_indexes
num_storage_instances
avg_resident_percent
avg_mutation_rate
avg_drain_rate
avg_disk_bps
total_data_size
total_disk_size
memory_used_storage
memory_total_storage
total_requests
total_rows_returned
total_rows_scanned

Verified on 7.2.0-5195.

Closing the ticket
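The verification steps above could be scripted along the following lines. This is only a sketch: it assumes the endpoint returns the standard Prometheus instant-query JSON (`status`, `data.result`, and a `[timestamp, "value"]` pair per series), and `fetch_query`, `sample_values` and `missing_stats` are hypothetical helper names, not part of any Couchbase tooling.

```python
import base64
import json
import urllib.request


def fetch_query(host, metric, user, password, port=8091):
    """Run one instant query against the node, using HTTP basic auth."""
    url = f"http://{host}:{port}/_prometheus/api/v1/query?query={metric}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def sample_values(payload):
    """Extract the numeric sample values from an instant-query response."""
    if payload.get("status") != "success":
        return []
    values = []
    for series in payload.get("data", {}).get("result", []):
        # Each instant-vector sample is [unix_timestamp, "value-as-string"].
        _, raw = series["value"]
        values.append(float(raw))
    return values


def missing_stats(payloads):
    """Return the metric names whose responses contained no samples."""
    return [name for name, payload in payloads.items() if not sample_values(payload)]
```

A run against a live node would call `fetch_query` once per metric, collect the responses in a dict keyed by metric name, and treat an empty result from `missing_stats` as a pass.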

CB robot February 10, 2023 at 1:30 PM

Build couchbase-server-7.2.0-5156 contains indexing commit aaa1143 with commit message:
: Expose aggregated node level stats via prometheus

Fixed

Created January 23, 2023 at 5:46 PM
Updated February 22, 2023 at 2:45 PM
Resolved February 10, 2023 at 3:03 PM