  1. Couchbase Server
  2. MB-22005

Use high dynamic range histograms in KV-Engine

Details

    • Mad-Hatter Code Complete

    Description

      Background

      Currently the various timing histograms used by KV-Engine handle extreme values poorly: they lose resolution when values are extremely small, and (generally more of a problem) when values are extremely large.

      For example, the ep-engine GlobalTask scheduler (wait time) and runtime histograms max out at a "~17m -> Infinity" bucket. While 17m is a long time, there's a big difference between that and forever.

      Similar issues exist with the mctimings output, where we show the timings of specific binary protocol commands. In addition to the "very large" results, we also have discontinuities because we use relatively naive bucketing - e.g. 10 microsecond buckets, then 1 millisecond buckets:

      [ 980 -  989]us ( 89.83%)   145 | #
      [ 990 -  999]us ( 90.05%)   128 | #
      [   1 -    1]ms ( 96.62%)  3905 | ############################################
      [   2 -    2]ms ( 97.60%)   584 | ######
      

      In addition to potentially being misleading (do many more operations take 1 millisecond than take 990-999us?), it makes it harder to calculate percentiles - e.g. what is the 95th percentile above?
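
      To make the percentile problem concrete, here is a minimal Python sketch that looks up the 95th percentile from the four rows shown above. It assumes the "[1 - 1]ms" row covers 1000..1999us and the "[2 - 2]ms" row covers 2000..2999us; `BUCKETS` and `bucket_for_percentile` are hypothetical names for illustration, not part of mctimings.

```python
# Rows copied from the mctimings excerpt: (low_us, high_us, cumulative_percent).
# Assumption: "[1 - 1]ms" means 1000..1999us and "[2 - 2]ms" means 2000..2999us.
BUCKETS = [
    (980, 989, 89.83),
    (990, 999, 90.05),
    (1000, 1999, 96.62),
    (2000, 2999, 97.60),
]


def bucket_for_percentile(buckets, pct):
    """Return (low, high) of the first bucket whose cumulative % reaches pct."""
    for low, high, cumulative in buckets:
        if cumulative >= pct:
            return low, high
    return None  # pct lies beyond the rows we have


low, high = bucket_for_percentile(BUCKETS, 95.0)
print(f"p95 lies somewhere in [{low}, {high}]us ({high - low + 1}us wide)")
```

      Under those assumptions the 95th percentile can only be placed somewhere within a 1000us-wide bucket, whereas the 90th percentile is known to within 10us.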

      We should look to improve our timings:

      • Can we unify on a single histogram / timing implementation? (we currently have at least two, one in memcached & one in ep-engine)
      • Support a larger range of timings for GlobalTask scheduler and runtime histograms.
      • Support a more continuous range of timings for commands, so we can easily calculate percentiles (95th, 99.7th, ...).
      • Better export (e.g. rendering to a graph, import into other tools...)
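
      As a rough illustration of what "high dynamic range" bucketing could look like, the Python sketch below splits every power-of-two range into a fixed number of equal-width sub-buckets, so bucket width grows with the value and the relative error stays bounded. This is only a sketch of the general log-linear idea: `SUB_BUCKET_BITS`, `bucket_bounds` and `LogLinearHistogram` are hypothetical names, and this is not the API of any particular library nor the KV-Engine implementation.

```python
from collections import Counter

# Assumption: 32 sub-buckets per power of two, i.e. roughly 3% relative error.
SUB_BUCKET_BITS = 5
SUB_BUCKETS = 1 << SUB_BUCKET_BITS


def bucket_bounds(value):
    """Return the inclusive (low, high) bucket for a non-negative integer."""
    if value < SUB_BUCKETS:
        return value, value  # small values are recorded exactly
    # Keep the top SUB_BUCKET_BITS + 1 significant bits, drop the rest.
    shift = value.bit_length() - 1 - SUB_BUCKET_BITS
    low = (value >> shift) << shift
    return low, low + (1 << shift) - 1


class LogLinearHistogram:
    """Counts values (e.g. microseconds) in log-linear buckets."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def record(self, value):
        self.counts[bucket_bounds(value)] += 1
        self.total += 1

    def percentile(self, pct):
        """Upper bound of the bucket where the cumulative count reaches pct."""
        if self.total == 0:
            raise ValueError("histogram is empty")
        target = self.total * pct / 100.0
        seen = 0
        for (low, high), count in sorted(self.counts.items()):
            seen += count
            if seen >= target:
                return high
        return max(self.counts)[1]  # guard against rounding for pct near 100


hist = LogLinearHistogram()
for value_us in range(1, 1001):  # synthetic data: 1us .. 1000us
    hist.record(value_us)
print(hist.percentile(95))

# 17 minutes expressed in microseconds still lands in a finite bucket.
print(bucket_bounds(17 * 60 * 1_000_000))
```

      With this scheme there is no jump in resolution at 1ms and no catch-all "Infinity" bucket; the trade-off is that absolute bucket width grows with the value (about 16us wide around 1ms with the setting above).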

      Library to evaluate (others may be available):

            People

              drigby Dave Rigby (Inactive)