Couchbase Server / MB-40967

All histograms implemented with HDRHistogram stop reporting when total samples exceed 2^31


Details

    • Untriaged
    • 1
    • Yes
    • KV Sprint 2020-July

    Description

      Background

       A customer running `cbstats timings` found that, after some time, the get_cmd histogram appears to hit some kind of overflow. The stat is then no longer shown until the customer manually resets the stats.

       Steps to reproduce

      1. Create a Couchbase Server cluster.
      2. Create a sample ephemeral bucket (the customer hit this issue on a Couchbase bucket).
      3. Run this pillowfight command to generate only GET operations at roughly 230k ops/sec:

        /opt/couchbase/bin/cbc pillowfight -U "couchbase://localhost/emph" -u Administrator -P password -I 10000 -t 2 -r 0 -m 10 -M 10

      Expected Result

      We should see all the stats for `get_cmd`:

      ubuntu@ip-172-31-87-80:/opt/couchbase/bin$ /opt/couchbase/bin/cbstats -u Administrator -p password -b emph localhost:11210 timings | grep "get_cmd"
       get_cmd (1077612593 total)
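
      To track how close a histogram is to the 2^31 threshold, the summary line can be parsed. The following is a minimal sketch, assuming the `name (N total)` line format shown in the output above. `parse_total` and `headroom` are hypothetical helpers for illustration, not part of cbstats.

```python
import re

# Matches a summary line such as " get_cmd (1077612593 total)".
SUMMARY_RE = re.compile(r"^\s*(\S+)\s+\((\d+) total\)")

def parse_total(line):
    """Return (histogram_name, total_samples), or None if the line does not match."""
    m = SUMMARY_RE.match(line)
    if m is None:
        return None
    return m.group(1), int(m.group(2))

def headroom(total, limit=2**31):
    """Samples remaining before the total reaches the 2^31 threshold named in this ticket."""
    return limit - total

name, total = parse_total(" get_cmd (1077612593 total)")
print(name, total, headroom(total))
```

      An empty grep result yields no line to parse, which is the symptom described under Actual Result.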
      

      Actual Result

      After a few hours, once the total reached roughly 2.1 billion GETs, the stats for get_cmd were no longer shown:
       

      ubuntu@ip-172-31-87-80:/opt/couchbase/bin$ /opt/couchbase/bin/cbstats -u Administrator -p password -b emph localhost:11210 timings | grep "get_cmd" 
       get_cmd (1077612593 total) 
      ubuntu@ip-172-31-87-80:/opt/couchbase/bin$ /opt/couchbase/bin/cbstats -u Administrator -p password -b emph localhost:11210 timings | grep "get_cmd" 
       get_cmd (1501318447 total) 
      ubuntu@ip-172-31-87-80:/opt/couchbase/bin$ /opt/couchbase/bin/cbstats -u Administrator -p password -b emph localhost:11210 timings | grep "get_cmd" 
       get_cmd (1501964517 total) 
      ubuntu@ip-172-31-87-80:/opt/couchbase/bin$ /opt/couchbase/bin/cbstats -u Administrator -p password -b emph localhost:11210 timings | grep "get_cmd" 
       get_cmd (1508510808 total) 
      ubuntu@ip-172-31-87-80:/opt/couchbase/bin$ /opt/couchbase/bin/cbstats -u Administrator -p password -b emph localhost:11210 timings | grep "get_cmd" 
       get_cmd (1508731255 total) 
      ubuntu@ip-172-31-87-80:/opt/couchbase/bin$ /opt/couchbase/bin/cbstats -u Administrator -p password -b emph localhost:11210 timings | grep "get_cmd" 
       get_cmd (1508911340 total) 
      ubuntu@ip-172-31-87-80:/opt/couchbase/bin$ /opt/couchbase/bin/cbstats -u Administrator -p password -b emph localhost:11210 timings | grep "get_cmd" 
       get_cmd (1509083148 total) 
      ubuntu@ip-172-31-87-80:/opt/couchbase/bin$ /opt/couchbase/bin/cbstats -u Administrator -p password -b emph localhost:11210 timings | grep "get_cmd" 
       get_cmd (1509271515 total) 
      ubuntu@ip-172-31-87-80:/opt/couchbase/bin$ /opt/couchbase/bin/cbstats -u Administrator -p password -b emph localhost:11210 timings | grep "get_cmd" 
       get_cmd (1561740278 total) 
      ubuntu@ip-172-31-87-80:/opt/couchbase/bin$ /opt/couchbase/bin/cbstats -u Administrator -p password -b emph localhost:11210 timings | grep "get_cmd" 
       get_cmd (1764262595 total) 
      ubuntu@ip-172-31-87-80:/opt/couchbase/bin$ /opt/couchbase/bin/cbstats -u Administrator -p password -b emph localhost:11210 timings | grep "get_cmd" 
      ubuntu@ip-172-31-87-80:/opt/couchbase/bin$ /opt/couchbase/bin/cbstats -u Administrator -p password -b emph localhost:11210 timings | grep "get_cmd"
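
      As a rough sanity check on the "few hours" figure, the time to reach 2^31 samples can be estimated from the approximate rate in the reproduction steps. This is a sketch that assumes a steady 230k ops/sec, which the real workload may not have sustained.

```python
RATE_OPS_PER_SEC = 230_000   # approximate GET rate from the reproduction steps
THRESHOLD = 2**31            # 2,147,483,648 samples

def hours_to_reach(threshold, rate_ops_per_sec):
    """Hours needed to accumulate `threshold` samples at a constant rate."""
    return threshold / rate_ops_per_sec / 3600

print(f"{hours_to_reach(THRESHOLD, RATE_OPS_PER_SEC):.1f} hours")
```

      At that rate the threshold is reached in under three hours, which is consistent with the stat disappearing after a few hours of load.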
      

      Analysis

      This bug affects every histogram implemented with the HDRHistogram library, which covers the vast majority of histograms reported by `cbstats` as well as the `mctimings` histograms. Any of these histograms will stop reporting once its sample count exceeds 2^31.
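
      The 2^31 boundary is where a signed 32-bit integer runs out of range. This ticket does not state the root cause, so the sketch below only illustrates the assumption that the total count is narrowed to a signed 32-bit value somewhere on the reporting path. It is not taken from the Couchbase or HDRHistogram source.

```python
import ctypes

INT32_MAX = 2**31 - 1  # 2,147,483,647

def as_int32(total):
    """Reinterpret `total` as a signed 32-bit integer, as a narrowing cast would."""
    return ctypes.c_int32(total).value

# The last total seen in the log, the largest int32, and two values past it.
for total in (1_764_262_595, INT32_MAX, INT32_MAX + 1, 2_200_000_000):
    print(f"{total:>13} -> {as_int32(total):>13}")
```

      Under that assumption, totals up to 2^31 - 1 survive unchanged, while anything larger becomes negative, which a reporting loop could easily treat as an empty or invalid histogram.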

            People

              ashwin.govindarajulu Ashwin Govindarajulu
              tin.tran Tin Tran (Inactive)