Details
-
Bug
-
Resolution: Fixed
-
Critical
-
6.6.5, 6.6.6, 7.0.4, 7.1.2, 7.2.0
-
Untriaged
-
0
-
Yes
-
KV 2023-2
Description
The KV-Engine scheduler & runtimes histograms record the duration of those phases of task execution. We use Hdr1sfMicroSecHistogram for each of them, and this can record values between 0us to 60s with 1 s.f. precision.
However, if a schedule or runtime exceeds 60s, it is not recorded - while hdr_histogram itself will indicate if a sample is out of range when added via hdr_record_value, we don't check the return value:
/**
|
* Records a value in the histogram, will round this value of to a precision at or better
|
* than the significant_figure specified at construction time.
|
*
|
* @param h "This" pointer
|
* @param value Value to add to the histogram
|
* @return false if the value is larger than the highest_trackable_value and can't be recorded,
|
* true otherwise.
|
*/
|
bool hdr_record_value(struct hdr_histogram* h, int64_t value); |
...
|
bool Hdr1sfMicroSecHistogram::add(std::chrono::microseconds v, size_t count = 1) { |
return addValueAndCount(static_cast<uint64_t>(v.count()), |
static_cast<uint64_t>(count)); |
}
|
};
|
...
|
// ! Histograms of various task run times, one per Task. |
std::vector<Hdr1sfMicroSecHistogram> taskRuntimeHisto;
|
...
|
void KVBucket::logRunTime(TaskId taskType, |
const std::chrono::steady_clock::duration runTime) { |
auto ms = std::chrono::duration_cast<std::chrono::microseconds>(runTime);
|
stats.taskRuntimeHisto[static_cast<int>(taskType)].add(ms); |
}
|
Note the return value from Hdr1sfMicroSecHistogram::add() is ignored, so we don't know a value was omitted.
I haven't checked other HdrHistograms, but it wouldn't surprise me if we have this problem elsewhere.
—
While there's always a trade-off between what range / precision a histogram can track and the space it takes, I think we should at least add an "overflow" bucket (xxx -> inf) which allows us to see these samples have been omitted.
Doing this "correctly" - i.e. the overflow bucket is part of the percentile calculation - is potentially tricky from outside hdr_histogram as it is the thing which calculates percentiles etc. We might have to look at modifying hdr_histogram to track the overflow bucket itself.
Attachments
For Gerrit Dashboard: MB-54810 | ||||||
---|---|---|---|---|---|---|
# | Subject | Branch | Project | Status | CR | V |
186768,2 | MB-54810: Remove unused Hdr2sfMicroSecHistogram from sizes.cc | neo | kv_engine | Status: MERGED | +2 | +1 |
186769,8 | MB-54810: Remove unused Hdr2sfMicroSecHistogram | neo | platform | Status: MERGED | +2 | +1 |
186868,7 | MB-54810: HdrHistogram: track samples which overflow bucket range | neo | platform | Status: MERGED | +2 | +1 |
186886,5 | MB-54810: TimingsTest for values > max tracked | neo | kv_engine | Status: MERGED | +2 | +1 |
186887,5 | MB-54810: Remove unused Timings::generate() method | neo | kv_engine | Status: MERGED | +2 | +1 |
186889,6 | MB-54810: Include overflow samples in mctimings histograms | neo | kv_engine | Status: MERGED | +2 | +1 |
186947,9 | MB-54810: Include overflow samples in Prometheus histograms | neo | kv_engine | Status: MERGED | +2 | +1 |
186957,9 | MB-54810: Include overflow samples in cbstats histograms | neo | kv_engine | Status: MERGED | +2 | +1 |
187139,2 | MB-54810: HdrHistogram: track overflowed samples sum | neo | platform | Status: MERGED | +2 | +1 |
187140,3 | MB-54810: TimingsTest for overflowed_sum | neo | kv_engine | Status: MERGED | +2 | +1 |
190821,1 | Merge commit 'couchbase/neo~5' into master | master | platform | Status: ABANDONED | 0 | 0 |
190823,3 | Merge commit 'couchbase/neo~5' | master | platform | Status: ABANDONED | 0 | 0 |
191495,3 | Merge commit neo/99bf690ab into master | master | kv_engine | Status: MERGED | +2 | +1 |
191522,5 | Merge commit neo/6e8aec95d into master | master | kv_engine | Status: MERGED | +2 | +1 |
191753,3 | Merge commit neo/10b6ac8 into master | master | platform | Status: MERGED | +2 | +1 |
191792,1 | Merge commit neo/c1ca27a into master | master | platform | Status: MERGED | +2 | +1 |
191797,5 | Merge commit neo/99a149233 into master | master | kv_engine | Status: MERGED | +2 | +1 |