  1. Couchbase Monitoring and Observability Stack
  2. CMOS-23

How-To: tune and provide customisation


Details

    • New Feature
    • Resolution: Done
    • Major
    • 0.1
    • None
    • cmos, documentation
    • None

    Description

      Migrated from https://github.com/couchbaselabs/observability/issues/7. 

      Demonstrate how someone could tune alerts and reporting, dashboards, etc. for a specific deployment.

      An example for the microlith is in place to show how to use custom Prometheus alerts with defaults: https://github.com/couchbaselabs/observability/tree/06b40dd3d36e743521a1d9bd76b73a895d5fca78

      This supports combining default and custom rules provided at runtime to Prometheus.

      • /etc/prometheus/alerting
        • couchbase <-- default rules
        • custom <-- empty by default, add custom rules here

      This lets us either override all the defaults completely or simply extend them, by mounting into these directories from the host, a config map, a volume, etc.

      We can also override only certain defaults if need be. This should encourage splitting the rules into granular, separate files, which makes it easier to target specific ones.

      Rather than overcomplicate things, for now we provide substitution via a template file for tuning, plus an example of how to disable a rule by overriding the file that provides it (removing the rule from that file).
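As a rough illustration of disabling a rule by overriding its file, the sketch below simulates the rule directories in a scratch location and replaces a default file with a copy that omits one rule. The file name bucket.yaml and the alert names are hypothetical, and the cp stands in for whatever mount mechanism (bind mount, config map, volume) a real deployment would use.

```shell
#!/bin/sh
# Simulate the rule directories in a scratch location; inside the container
# the real path is /etc/prometheus/alerting.
set -u
alerting_dir=$(mktemp -d)
mkdir -p "$alerting_dir/couchbase" "$alerting_dir/custom"

# A default rule file with two rules. File and alert names are hypothetical.
cat > "$alerting_dir/couchbase/bucket.yaml" <<'EOF'
groups:
  - name: bucket
    rules:
      - alert: ExampleResidentRatio
        expr: cbbucketstat_vbuckets_active_resident_items_ratio > 100
        for: 1m
      - alert: ExampleOtherRule
        expr: up == 0
EOF

# The override keeps the same file name but omits ExampleOtherRule.
override_file=$(mktemp)
cat > "$override_file" <<'EOF'
groups:
  - name: bucket
    rules:
      - alert: ExampleResidentRatio
        expr: cbbucketstat_vbuckets_active_resident_items_ratio > 100
        for: 1m
EOF

# Count the alert definitions in a rule file.
count_alerts() { grep -c 'alert:' "$1"; }

before=$(count_alerts "$alerting_dir/couchbase/bucket.yaml")
# Copying stands in for mounting the override over the default file.
cp "$override_file" "$alerting_dir/couchbase/bucket.yaml"
after=$(count_alerts "$alerting_dir/couchbase/bucket.yaml")

echo "alerts before override: $before, after: $after"
```

Keeping one rule (or a small group of rules) per file keeps such overrides small, since only the affected file has to be replaced.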

      This should then support the following:

      1. Default rules out of the box from Couchbase.
      2. Provide customer-specific rules on top of the defaults. Just mount them in via volumes, config maps, etc.
      3. Support tuning of any of these rules from defaults and custom by environment variable. Just provide the variables.
      4. Disable rules by overriding them: remove the rule from the default file that provides it, then mount the edited file in.

      If we decide we want more than this (and we may want to pick it up anyway), we can adopt an approach like: https://github.com/lablabs/prometheus-alert-overrider
      That approach pre-processes all rule files according to a known format: the tool essentially uses a separate YAML override file, matches its entries against any rules it finds, and updates them.

      Tuning is available via environment variables. An example default rule for the resident ratio:

      expr: cbbucketstat_vbuckets_active_resident_items_ratio > $COUCHBASE_ACTIVE_RESIDENT_RATIO_ALERT_THRESHOLD
      for: $COUCHBASE_ACTIVE_RESIDENT_RATIO_ALERT_DURATION  

      At startup we default them in the entrypoint for Prometheus:

      export COUCHBASE_ACTIVE_RESIDENT_RATIO_ALERT_THRESHOLD=${COUCHBASE_ACTIVE_RESIDENT_RATIO_ALERT_THRESHOLD:-100}
      export COUCHBASE_ACTIVE_RESIDENT_RATIO_ALERT_DURATION=${COUCHBASE_ACTIVE_RESIDENT_RATIO_ALERT_DURATION:-1m}  
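Putting the defaulting and the template substitution together, a minimal sketch of such an entrypoint step might look like the following. The render_rule helper is hypothetical and uses sed only to stay dependency-free; the actual entrypoint may use a different mechanism (a tool such as envsubst would be one option). It assumes the values contain no `|` or `&` characters.

```shell
#!/bin/sh
set -eu

# Apply the defaults only when the caller has not already set a value.
export COUCHBASE_ACTIVE_RESIDENT_RATIO_ALERT_THRESHOLD="${COUCHBASE_ACTIVE_RESIDENT_RATIO_ALERT_THRESHOLD:-100}"
export COUCHBASE_ACTIVE_RESIDENT_RATIO_ALERT_DURATION="${COUCHBASE_ACTIVE_RESIDENT_RATIO_ALERT_DURATION:-1m}"

# render_rule (hypothetical helper): reads a rule template on stdin and
# writes it to stdout with the two placeholders replaced by their values.
render_rule() {
  sed \
    -e 's|\$COUCHBASE_ACTIVE_RESIDENT_RATIO_ALERT_THRESHOLD|'"$COUCHBASE_ACTIVE_RESIDENT_RATIO_ALERT_THRESHOLD"'|g' \
    -e 's|\$COUCHBASE_ACTIVE_RESIDENT_RATIO_ALERT_DURATION|'"$COUCHBASE_ACTIVE_RESIDENT_RATIO_ALERT_DURATION"'|g'
}

# Render the two template lines from the example rule.
rendered=$(printf '%s\n' \
  'expr: cbbucketstat_vbuckets_active_resident_items_ratio > $COUCHBASE_ACTIVE_RESIDENT_RATIO_ALERT_THRESHOLD' \
  'for: $COUCHBASE_ACTIVE_RESIDENT_RATIO_ALERT_DURATION' | render_rule)
printf '%s\n' "$rendered"
```

Because the defaults use the `${VAR:-default}` form, any value already present in the environment is kept and only unset or empty variables fall back to the default.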

       

      For the example we override the default:

       - COUCHBASE_ACTIVE_RESIDENT_RATIO_ALERT_THRESHOLD=75 # default is 100  

      If we want anything more complex than this we need to consider an approach of pre-processing similar to the linked examples.


            People

              anil Anil Kumar (Inactive)
              marks.polakovs Marks Polakovs (Inactive)