Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-54810

Multiple HdrHistograms (including opcode, "scheduler" & "runtimes") histograms do not record durations more than 60s

    XMLWordPrintable

Details

    • Untriaged
    • 0
    • Yes
    • KV 2023-2

    Description

      The KV-Engine scheduler & runtimes histograms record the duration of those phases of task execution. We use Hdr1sfMicroSecHistogram for each of them, and this can record values between 0us to 60s with 1 s.f. precision.

      However, if a schedule or runtime exceeds 60s, it is not recorded - while hdr_histogram itself will indicate if a sample is out of range when added via hdr_record_value, we don't check the return value:

      /**
       * Records a value in the histogram, will round this value of to a precision at or better
       * than the significant_figure specified at construction time.
       *
       * @param h "This" pointer
       * @param value Value to add to the histogram
       * @return false if the value is larger than the highest_trackable_value and can't be recorded,
       * true otherwise.
       */
      bool hdr_record_value(struct hdr_histogram* h, int64_t value);
      ...
      bool Hdr1sfMicroSecHistogram::add(std::chrono::microseconds v, size_t count = 1) {
              return addValueAndCount(static_cast<uint64_t>(v.count()),
                                      static_cast<uint64_t>(count));
          }
      };
      ...
          // ! Histograms of various task run times, one per Task.
          std::vector<Hdr1sfMicroSecHistogram> taskRuntimeHisto;
      ...
      void KVBucket::logRunTime(TaskId taskType,
                                const std::chrono::steady_clock::duration runTime) {
          auto ms = std::chrono::duration_cast<std::chrono::microseconds>(runTime);
          stats.taskRuntimeHisto[static_cast<int>(taskType)].add(ms);
      }
      

      Note the return value from Hdr1sfMicroSecHistogram::add() is ignored, so we don't know a value was omitted.

      I haven't checked other HdrHistograms, but it wouldn't surprise me if we have this problem elsewhere.

      While there's always a trade-off between what range / precision a histogram can track and the space it takes, I think we should at least add an "overflow" bucket (xxx -> inf) which allows us to see these samples have been omitted.

      Doing this "correctly" - i.e. the overflow bucket is part of the percentile calculation - is potentially tricky from outside hdr_histogram as it is the thing which calculates percentiles etc. We might have to look at modifying hdr_histogram to track the overflow bucket itself.

      Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          People

            drigby Dave Rigby (Inactive)
            drigby Dave Rigby (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              PagerDuty