Details

    • Improvement
    • Resolution: Unresolved
    • Major
    • None
    • None
    • cmos
    • None

    Description

      The default scrape interval for CMOS Prometheus is 30s. The local Prometheus that ships with Couchbase uses a config similar to the following:

      global:
        scrape_interval: '10s'
        scrape_timeout: '10s'
      rule_files:
        - '/opt/couchbase/var/lib/couchbase/config/prometheus_rules.yml'
      scrape_configs:
        - basic_auth:
            password_file: '/opt/couchbase/var/lib/couchbase/config/prometheus_token'
            username: '@prometheus'
          job_name: 'general'
          metric_relabel_configs:
            - source_labels:
                - '__name__'
              target_label: 'name'
          metrics_path: '/_prometheusMetrics'
          relabel_configs:
            - regex: '127\.0\.0\.1:8091'
              replacement: 'ns_server'
              source_labels:
                - '__address__'
              target_label: 'instance'
            - regex: '127\.0\.0\.1:9998'
              replacement: 'xdcr'
              source_labels:
                - '__address__'
              target_label: 'instance'
            - regex: '127\.0\.0\.1:8096'
              replacement: 'eventing'
              source_labels:
                - '__address__'
              target_label: 'instance'
            - regex: '127\.0\.0\.1:11280'
              replacement: 'kv'
              source_labels:
                - '__address__'
              target_label: 'instance'
          static_configs:
            - targets:
                - '127.0.0.1:8091'
                - '127.0.0.1:9998'
                - '127.0.0.1:8096'
                - '127.0.0.1:11280'
        - basic_auth:
            password_file: '/opt/couchbase/var/lib/couchbase/config/prometheus_token'
            username: '@prometheus'
          job_name: 'ns_server_high_cardinality'
          metric_relabel_configs:
            - source_labels:
                - '__name__'
              target_label: 'name'
          metrics_path: '/_prometheusMetricsHigh'
          relabel_configs:
            - regex: '127\.0\.0\.1:8091'
              replacement: 'ns_server'
              source_labels:
                - '__address__'
              target_label: 'instance'
          scrape_interval: '60s'
          scrape_timeout: '10s'
          static_configs:
            - targets:
                - '127.0.0.1:8091'    
        - basic_auth:
            password_file: '/opt/couchbase/var/lib/couchbase/config/prometheus_token'
            username: '@prometheus'
          job_name: 'kv_high_cardinality'
          metric_relabel_configs:
            - source_labels:
                - '__name__'
              target_label: 'name'
          metrics_path: '/_prometheusMetricsHigh'
          relabel_configs:
            - regex: '127\.0\.0\.1:11280'
              replacement: 'kv'
              source_labels:
                - '__address__'
              target_label: 'instance'
          scrape_interval: '180s'
          scrape_timeout: '10s'
          static_configs:
            - targets:
                - '127.0.0.1:11280'
        - basic_auth:
            password_file: '/opt/couchbase/var/lib/couchbase/config/prometheus_token'
            username: '@prometheus'
          job_name: 'eventing_high_cardinality'
          metric_relabel_configs:
            - source_labels:
                - '__name__'
              target_label: 'name'
          metrics_path: '/_prometheusMetricsHigh'
          relabel_configs:
            - regex: '127\.0\.0\.1:8096'
              replacement: 'eventing'
              source_labels:
                - '__address__'
              target_label: 'instance'
          scrape_interval: '180s'
          scrape_timeout: '10s'
          static_configs:
            - targets:
                - '127.0.0.1:8096'    

      While the local Prometheus's global default scrape interval is 10s, the high-cardinality jobs scrape far less often: 60s for ns_server and 180s for kv and eventing. As a general rule, I have always recommended that my customers scrape all exporters at a 1m interval. Because CMOS scrapes :8091/metrics, and at this point we cannot differentiate endpoints to pull only service-specific metrics, I recommend that we back off to a 1m scrape interval instead of 30s.
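
      A minimal sketch of what the proposed change could look like, assuming CMOS Prometheus scrapes each Couchbase node through a single static job. The job name, credentials, and target host below are placeholders for illustration, not the actual CMOS configuration; only the 1m interval and the :8091/metrics endpoint come from this ticket.

      global:
        scrape_interval: '1m'    # proposed default, backed off from the current 30s
        scrape_timeout: '10s'    # must not exceed scrape_interval
      scrape_configs:
        - job_name: 'couchbase-server'    # hypothetical job name
          metrics_path: '/metrics'        # endpoint CMOS scrapes on port 8091
          basic_auth:
            username: '<username>'        # placeholder credentials
            password: '<password>'
          static_configs:
            - targets:
                - '<couchbase-node>:8091' # placeholder host

      Setting the interval under global keeps every job at 1m unless a job overrides it, which mirrors how the local config above overrides the interval only for the high-cardinality jobs.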

          People

            Unassigned Unassigned
            aaron.benton Aaron Benton (Inactive)