Details
- Bug
- Resolution: Fixed
- Critical
- 6.5.1, 6.6.0, 6.5.0
- Untriaged
- 1
- Yes
- KV Sprint 2020-July
Description
Background
A customer running `cbstats timings` found that, after some time, the `get_cmd` histogram appears to overflow and is no longer reported. The stat stays missing until the customer manually resets the stats.
Steps to reproduce
- Create a Couchbase Server cluster
- Create a sample ephemeral bucket (`emph` in the commands below); the customer hit this issue on a Couchbase bucket
- Run this pillowfight command to generate only GET operations at about 230k ops/sec:
/opt/couchbase/bin/cbc pillowfight -U "couchbase://localhost/emph" -u Administrator -P password -I 10000 -t 2 -r 0 -m 10 -M 10
Expected Result
We should see all the stats for `get_cmd`
ubuntu@ip-172-31-87-80:/opt/couchbase/bin$ /opt/couchbase/bin/cbstats -u Administrator -p password -b emph localhost:11210 timings | grep "get_cmd"
get_cmd (1077612593 total)
Actual Result
After a few hours, as the total approached ~2.1 billion GETs, the stats for `get_cmd` were no longer shown:
ubuntu@ip-172-31-87-80:/opt/couchbase/bin$ /opt/couchbase/bin/cbstats -u Administrator -p password -b emph localhost:11210 timings | grep "get_cmd"
get_cmd (1077612593 total)
ubuntu@ip-172-31-87-80:/opt/couchbase/bin$ /opt/couchbase/bin/cbstats -u Administrator -p password -b emph localhost:11210 timings | grep "get_cmd"
get_cmd (1501318447 total)
ubuntu@ip-172-31-87-80:/opt/couchbase/bin$ /opt/couchbase/bin/cbstats -u Administrator -p password -b emph localhost:11210 timings | grep "get_cmd"
get_cmd (1501964517 total)
ubuntu@ip-172-31-87-80:/opt/couchbase/bin$ /opt/couchbase/bin/cbstats -u Administrator -p password -b emph localhost:11210 timings | grep "get_cmd"
get_cmd (1508510808 total)
ubuntu@ip-172-31-87-80:/opt/couchbase/bin$ /opt/couchbase/bin/cbstats -u Administrator -p password -b emph localhost:11210 timings | grep "get_cmd"
get_cmd (1508731255 total)
ubuntu@ip-172-31-87-80:/opt/couchbase/bin$ /opt/couchbase/bin/cbstats -u Administrator -p password -b emph localhost:11210 timings | grep "get_cmd"
get_cmd (1508911340 total)
ubuntu@ip-172-31-87-80:/opt/couchbase/bin$ /opt/couchbase/bin/cbstats -u Administrator -p password -b emph localhost:11210 timings | grep "get_cmd"
get_cmd (1509083148 total)
ubuntu@ip-172-31-87-80:/opt/couchbase/bin$ /opt/couchbase/bin/cbstats -u Administrator -p password -b emph localhost:11210 timings | grep "get_cmd"
get_cmd (1509271515 total)
ubuntu@ip-172-31-87-80:/opt/couchbase/bin$ /opt/couchbase/bin/cbstats -u Administrator -p password -b emph localhost:11210 timings | grep "get_cmd"
get_cmd (1561740278 total)
ubuntu@ip-172-31-87-80:/opt/couchbase/bin$ /opt/couchbase/bin/cbstats -u Administrator -p password -b emph localhost:11210 timings | grep "get_cmd"
get_cmd (1764262595 total)
ubuntu@ip-172-31-87-80:/opt/couchbase/bin$ /opt/couchbase/bin/cbstats -u Administrator -p password -b emph localhost:11210 timings | grep "get_cmd"
ubuntu@ip-172-31-87-80:/opt/couchbase/bin$ /opt/couchbase/bin/cbstats -u Administrator -p password -b emph localhost:11210 timings | grep "get_cmd"
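As a rough sanity check on the "few hours" timescale, the totals in the transcript above can be compared against 2^31. The sketch below is only illustrative: the helper names `parse_total` and `hours_until_limit` are hypothetical, and it assumes a constant rate of 230k ops/sec, which a real workload will not hold exactly.

```python
import re

# Threshold named in this ticket: histograms stop reporting past 2^31.
COUNT_LIMIT = 2**31


def parse_total(line):
    """Return N from a 'get_cmd (N total)' line, or None if the line has no total."""
    match = re.search(r"get_cmd \((\d+) total\)", line)
    return int(match.group(1)) if match else None


def hours_until_limit(total, ops_per_sec):
    """Hours until `total` reaches 2^31, assuming a constant operation rate."""
    return (COUNT_LIMIT - total) / ops_per_sec / 3600


# Last total visible in the transcript before the stat disappeared.
last_seen = parse_total("get_cmd (1764262595 total)")

print("hours from zero to 2^31:", round(hours_until_limit(0, 230_000), 2))
print("minutes left after last sample:", round(hours_until_limit(last_seen, 230_000) * 60, 1))
```

At the assumed rate a freshly reset counter would reach 2^31 in roughly 2.6 hours, which is in line with the stat disappearing after a few hours of the pillowfight workload.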
Analysis
This bug affects every histogram implemented with the HDRHistogram library, which covers the vast majority of histograms reported by `cbstats` as well as the `mctimings` histograms. Any of these histograms stops being reported once its total count exceeds 2^31 (2,147,483,648), which matches the ~2.1 billion GETs at which `get_cmd` disappeared.
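The 2^31 threshold is what one would expect if the total count passes through a signed 32-bit integer at some point. This ticket does not say where that narrowing happens, so the snippet below is an assumption-laden illustration of the arithmetic, not a reproduction of the HDRHistogram or `cbstats` code. The helper name `as_int32` is hypothetical.

```python
import ctypes


def as_int32(count):
    """Reinterpret a count as a signed 32-bit integer (wraps instead of raising)."""
    return ctypes.c_int32(count).value


# The largest count that still fits in a signed 32-bit integer.
largest_ok = 2**31 - 1

for count in (1764262595, largest_ok, largest_ok + 1):
    print(count, "->", as_int32(count))
```

Under this assumption, a count of 2^31 wraps to a negative value, which a consumer could plausibly treat as invalid and drop rather than display.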