Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-54810

Multiple HdrHistograms (including opcode, "scheduler" & "runtimes") histograms do not record durations more than 60s



    • Untriaged
    • 0
    • Yes
    • KV 2023-2


      The KV-Engine scheduler & runtimes histograms record the duration of those phases of task execution. We use Hdr1sfMicroSecHistogram for each of them, and this can record values between 0us to 60s with 1 s.f. precision.

      However, if a schedule or runtime exceeds 60s, it is not recorded - while hdr_histogram itself will indicate if a sample is out of range when added via hdr_record_value, we don't check the return value:

       * Records a value in the histogram, will round this value of to a precision at or better
       * than the significant_figure specified at construction time.
       * @param h "This" pointer
       * @param value Value to add to the histogram
       * @return false if the value is larger than the highest_trackable_value and can't be recorded,
       * true otherwise.
      bool hdr_record_value(struct hdr_histogram* h, int64_t value);
      bool Hdr1sfMicroSecHistogram::add(std::chrono::microseconds v, size_t count = 1) {
              return addValueAndCount(static_cast<uint64_t>(v.count()),
          // ! Histograms of various task run times, one per Task.
          std::vector<Hdr1sfMicroSecHistogram> taskRuntimeHisto;
      void KVBucket::logRunTime(TaskId taskType,
                                const std::chrono::steady_clock::duration runTime) {
          auto ms = std::chrono::duration_cast<std::chrono::microseconds>(runTime);

      Note the return value from Hdr1sfMicroSecHistogram::add() is ignored, so we don't know a value was omitted.

      I haven't checked other HdrHistograms, but it wouldn't surprise me if we have this problem elsewhere.

      While there's always a trade-off between what range / precision a histogram can track and the space it takes, I think we should at least add an "overflow" bucket (xxx -> inf) which allows us to see these samples have been omitted.

      Doing this "correctly" - i.e. the overflow bucket is part of the percentile calculation - is potentially tricky from outside hdr_histogram as it is the thing which calculates percentiles etc. We might have to look at modifying hdr_histogram to track the overflow bucket itself.


        For Gerrit Dashboard: MB-54810
        # Subject Branch Project Status CR V



            drigby Dave Rigby (Inactive)
            drigby Dave Rigby (Inactive)
            0 Vote for this issue
            3 Start watching this issue