Details
- Type: Bug
- Resolution: Unresolved
- Priority: Major
Description
Both of these alerts have the same expression / PromQL query; the only difference is the threshold value. Alert names do not have to be unique in Alertmanager, and you do not want them to be unique for alerts of the same type, because Alertmanager can then inhibit rules of the same type that are already firing.
- alert: CB90055-metadataOverhead-Warning
  expr: |
    (kv_total_memory_overhead_bytes / kv_ep_max_size) > 0.5 < 0.9
  for: 0m
  labels:
    job: couchbase_prometheus
    kind: bucket
    health_check_id: CB90055
    health_check_name: metadataOverhead
    cluster: '{{ $labels.cluster }}'
    node: '{{ $labels.instance }}'
    bucket: '{{ $labels.bucket }}'
    severity: warning
  annotations:
    title: "Metadata Overhead Above 50% on Bucket: {{ $labels.bucket }}, Node: {{ $labels.instance }}"
    description: The percentage of memory that is taken up by metadata is over 50%
    remediation: Increase memory allocation for the bucket or change the evictionPolicy of the bucket from `Value-only` (be aware this will have an adverse effect on performance).

- alert: CB90055-metadataOverhead-Alert
  expr: |
    (kv_total_memory_overhead_bytes / kv_ep_max_size) >= 0.9
  for: 0m
  labels:
    job: couchbase_prometheus
    kind: bucket
    health_check_id: CB90055
    health_check_name: metadataOverhead
    cluster: '{{ $labels.cluster }}'
    node: '{{ $labels.instance }}'
    bucket: '{{ $labels.bucket }}'
    severity: critical
  annotations:
    title: "Metadata Overhead Above 90% on Bucket: {{ $labels.bucket }}, Node: {{ $labels.instance }}"
    description: The percentage of memory that is taken up by metadata is over 90%
    remediation: Increase memory allocation for the bucket or change the evictionPolicy of the bucket from `Value-only` (be aware this will have an adverse effect on performance).
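If the two rules were instead split into independent single-threshold rules sharing one alert name, they might look like the following. This is a hypothetical sketch to illustrate the proposal, not rules that currently ship:

```yaml
# Hypothetical single-threshold variants sharing one alert name, so that
# Alertmanager inhibition can silence the warning while the critical fires.
# Labels and annotations other than severity are omitted for brevity.
- alert: CB90055-metadataOverhead
  expr: |
    (kv_total_memory_overhead_bytes / kv_ep_max_size) > 0.5
  for: 0m
  labels:
    severity: warning
- alert: CB90055-metadataOverhead
  expr: |
    (kv_total_memory_overhead_bytes / kv_ep_max_size) >= 0.9
  for: 0m
  labels:
    severity: critical
```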
For example, if both of these alerts were named "CB90055-metadataOverhead" and each alert had only a single threshold rather than a bounded range, we could set up the following generic rule in Alertmanager:
inhibit_rules:
  - source_matchers:
      - severity="critical"
    target_matchers:
      - severity=~"warning|info"
    equal: [ alertname, cluster_name ]
This simply says: if a critical alert is firing, and an alert with the same name against the same cluster comes in with a severity of warning or info, ignore and silence the lower-severity alert, because one of higher priority is already firing.
This also makes changing thresholds easier: each rule can be independent and does not have to define both an upper and a lower bound.
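The inhibition behaviour described above can be sketched in Python. This is a simplified model of how an Alertmanager inhibit rule matches alerts, not its actual implementation, and the alert label sets are hypothetical:

```python
# Simplified model of an Alertmanager inhibit rule: a firing alert that
# satisfies the source matchers silences alerts that satisfy the target
# matchers, provided the labels listed in `equal` agree on both alerts.

def is_inhibited(alert, firing, source, target, equal):
    """Return True if `alert` should be silenced by some alert in `firing`."""
    if not all(alert.get(k) == v for k, v in target.items()):
        return False  # alert does not match the target matchers
    return any(
        all(other.get(k) == v for k, v in source.items())
        and all(other.get(k) == alert.get(k) for k in equal)
        for other in firing
    )

# Hypothetical alerts sharing one name and cluster, as proposed above.
critical = {"alertname": "CB90055-metadataOverhead",
            "cluster_name": "prod", "severity": "critical"}
warning = {"alertname": "CB90055-metadataOverhead",
           "cluster_name": "prod", "severity": "warning"}

# The warning is silenced while a matching critical is firing.
print(is_inhibited(warning, [critical],
                   source={"severity": "critical"},
                   target={"severity": "warning"},
                   equal=["alertname", "cluster_name"]))  # True
```

Because `equal` includes both `alertname` and `cluster_name`, a critical alert on one cluster never silences warnings on another, which is exactly why sharing one alert name across severities is desirable here.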