Details

Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: None
Component/s: cmos
Labels:

Description

Both of these alerts use the same expression (PromQL query); the only difference is the threshold value. Alert names do not have to be unique in AlertManager, and for alerts of the same type you do not want them to be unique, because AlertManager can inhibit alerts of the same type when one is already firing.

    - alert: CB90055-metadataOverhead-Warning
      expr: |
        (kv_total_memory_overhead_bytes / kv_ep_max_size) > 0.5 < 0.9
      for: 0m
      labels:
        job: couchbase_prometheus
        kind: bucket
        health_check_id: CB90055
        health_check_name: metadataOverhead
        cluster: '{{ $labels.cluster }}'
        node: '{{ $labels.instance }}'
        bucket: '{{ $labels.bucket }}'
        severity: warning
      annotations:
        title: "Metadata Overhead Above 50% on Bucket: {{ $labels.bucket }}, Node: {{ $labels.instance }}"
        description: The percentage of memory that is taken up by metadata is over 50%
        remediation: Increase memory allocation for bucket or change the evictionPolicy of the bucket from `Value-only` (be aware this will have an adverse effect on performance).
    - alert: CB90055-metadataOverhead-Alert
      expr: |
        (kv_total_memory_overhead_bytes / kv_ep_max_size) >= 0.9
      for: 0m
      labels:
        job: couchbase_prometheus
        kind: bucket
        health_check_id: CB90055
        health_check_name: metadataOverhead
        cluster: '{{ $labels.cluster }}'
        node: '{{ $labels.instance }}'
        bucket: '{{ $labels.bucket }}'
        severity: critical
      annotations:
        title: "Metadata Overhead Above 90% on Bucket: {{ $labels.bucket }}, Node: {{ $labels.instance }}"
        description: The percentage of memory that is taken up by metadata is over 90%
        remediation: Increase memory allocation for bucket or change the evictionPolicy of the bucket from `Value-only` (be aware this will have an adverse effect on performance).

For example, if both of these alerts were named "CB90055-metadataOverhead" and each had a single threshold rather than a range, we could set up the following generic rule in AlertManager:

inhibit_rules:
- source_matchers:
  - severity="critical"
  target_matchers:
  # Matchers in one list must all match, so warning and info are
  # combined into a single regex matcher.
  - severity=~"warning|info"
  equal: [ alertname, cluster_name ]

This says that if a critical alert is firing and an alert with the same name against the same cluster comes in with a severity of warning or info, the lower-severity alert is suppressed, because one of a higher priority is already firing.

This also makes changing thresholds easier: each rule can be independent and does not need both an upper and a lower bound.
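As a rough sketch of what the proposal could look like, the two rules above might be rewritten with a shared alert name and one threshold each. The group name is hypothetical, and the remaining labels and annotations are assumed to stay as in the original rules; they are left out here for brevity.

```yaml
groups:
  - name: couchbase-metadata-overhead   # hypothetical group name
    rules:
      # Warning: fires above 50%, with no upper bound.
      - alert: CB90055-metadataOverhead
        expr: |
          (kv_total_memory_overhead_bytes / kv_ep_max_size) > 0.5
        for: 0m
        labels:
          health_check_id: CB90055
          severity: warning
      # Critical: same name and metric, higher threshold.
      - alert: CB90055-metadataOverhead
        expr: |
          (kv_total_memory_overhead_bytes / kv_ep_max_size) >= 0.9
        for: 0m
        labels:
          health_check_id: CB90055
          severity: critical
```

With this shape, both alerts fire once overhead reaches 90%, and the inhibit rule is what hides the warning. One thing to check when adopting it: the labels listed under `equal` should exist on the alerts. The rules above set `cluster`, not `cluster_name`, and Alertmanager treats a label that is missing on both alerts as equal, so the `equal` list may need to name `cluster` instead.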


People

Assignee: Unassigned

Reporter: Aaron Benton (Inactive)

Votes: 0

Watchers: 2

Dates

Created: 16/Feb/22 11:18 AM

Updated: 08/Apr/22 5:41 AM


Alerts for the same value and different thresholds should share the same name
