  1. Couchbase Monitoring and Observability Stack
  2. CMOS-231

Dropped buckets show in the Data Service dashboard

Details

    • Bug
    • Status: To Do
    • Major
    • Resolution: Unresolved
    • None
    • 1.0
    • cmos
    • None

    Description

      Because of the use of 

      label_values(multimanager_bucket_checker_status) 

      to list buckets, any dropped buckets are still listed as they will never be removed from `multimanager_*_checker_status`.

      This likely applies to any other panel that uses some variation of the checker status metrics to list entities such as buckets, nodes, or clusters.

       


        Activity

          marks.polakovs Marks Polakovs added a comment - - edited

          The fundamental issue here is that variable definitions using the "label_values" query look at all time series that exist in Prometheus, whose retention IIRC defaults to 15 days, meaning that if a bucket ever existed in that time it will be offered as a candidate (and similarly for nodes, clusters...).

          I imagine we could switch to using "query_result" with $__range to constrain it to the time range selected in the dashboard, so that it'd only pick up time series that have existed in that time period. My concern is that this would increase load on Prometheus, because I imagine /api/v1/label/$label/values is considerably faster than parsing and executing PromQL. There's also the issue that this may lead to confusing cases of nodes becoming unavailable to select, e.g. if a node goes down for whatever reason and Prometheus doesn't scrape it during the $__range period.

          There are no good solutions here, just multiple bad ones, and we need to pick the least bad of those.
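
          As a rough illustration of the "query_result" idea in the comment above, the Python sketch below builds a range-scoped PromQL expression and extracts the distinct label values from a Prometheus instant-query response, which is roughly the job a Grafana "query_result" variable plus a regex would do. The label name "bucket", the helper names, and the sample response are assumptions made for illustration; they are not taken from the CMOS dashboards.

```python
from typing import Any


def range_scoped_query(metric: str, label: str, window: str = "$__range") -> str:
    """Build PromQL that yields one series per label value seen within the window.

    The default window is Grafana's $__range variable, which Grafana replaces
    with the dashboard time range before the query is sent.
    """
    return f"count by ({label}) (count_over_time({metric}[{window}]))"


def label_values_from_result(response: dict[str, Any], label: str) -> list[str]:
    """Return the sorted, distinct values of `label` in an instant-query response."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    series = response["data"]["result"]
    return sorted({s["metric"][label] for s in series if label in s["metric"]})


# Hypothetical response to the range-scoped query (assumed data): only buckets
# with samples inside the window are returned, so a bucket dropped before the
# window started would not appear.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"bucket": "travel-sample"}, "value": [1700000000, "120"]},
            {"metric": {"bucket": "beer-sample"}, "value": [1700000000, "118"]},
        ],
    },
}

print(range_scoped_query("multimanager_bucket_checker_status", "bucket"))
print(label_values_from_result(sample, "bucket"))
```

          The same trade-offs raised in the comment still apply: the expression has to be parsed and evaluated over the whole window, and a node or bucket that was not scraped during the window drops out of the list.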

          People

            Unassigned Unassigned
            douglas.sword Douglas