Details
- Improvement
- Resolution: Done
- Major
- 4.6.0
- Mad-Hatter Code Complete
Description
Background
Currently the various timing histograms used by KV-Engine suffer when values are extremely small, or (generally more of a problem) when values are extremely large.
For example, the histograms of ep-engine GlobalTask scheduler wait and run times max out at a "~17m -> Infinity" bucket - and while 17m is a long time, there's a big difference between that and forever.
Similar issues exist with the mctimings output, where we show the timings of specific binary protocol commands. In addition to the "very large" results, we also have discontinuities due to relatively naive bucketing - e.g. 10-microsecond-wide buckets up to 1ms, then 1-millisecond-wide buckets:
[ 980 - 989]us ( 89.83%)  145 | #
[ 990 - 999]us ( 90.05%)  128 | #
[   1 -   1]ms ( 96.62%) 3905 | ############################################
[   2 -   2]ms ( 97.60%)  584 | ######
In addition to potentially being misleading (do many more operations take 1 millisecond than take 990-999us?), this makes it harder to calculate percentiles - e.g. what is the 95th percentile above?
We should look to improve our timings:
- Can we unify on a single histogram / timing implementation? (we currently have at least two, one in memcached & one in ep-engine)
- Support a larger range of timings for GlobalTask scheduler and runtime histograms.
- Support a more continuous range of timings for commands, so we can easily calculate 95th, 99.7th, ... percentiles.
- Better export (e.g. rendering to a graph, import into other tools...)
Library to evaluate (others may be available):
- HDR Histogram. Edit: as of Vulcan we are using this for HiFi_MRU, so it is already available in the build.
- Folly's Histogram and TimeseriesHistogram