Couchbase Server / MB-25632

Allow slow op threshold to be customised

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.5.1, 5.0.0
    • Fix Version/s: 5.5.0
    • Component/s: memcached

    Description

      During a slow GET investigation for a customer, it was useful to correlate our reporting of slow operations in the memcached log to their application reporting slow requests:

      2017-06-28T05:45:34.914126Z WARNING 485: Slow GET operation on connection: 578 ms ([ xxx.xxx.114.151:41590 - xxx.xxx.114.133:11210 ])
      

      However, the threshold in the customer environment differs from the memcached threshold: they report ops as slow at 100ms, compared to the fixed threshold of 500ms in memcached. We should allow the threshold to be configured so that it can match different SLAs.
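To correlate these entries with application-side timings, the warning can be parsed into its parts. The following is a minimal sketch that assumes every slow-op warning has exactly the shape of the single log line quoted above; `parse_slow_op` is a hypothetical helper, not part of any Couchbase tool.

```python
import re

# Pattern derived from the one log line quoted in this ticket (an assumption:
# other memcached versions may format the warning differently).
_SLOW_OP = re.compile(
    r"(?P<timestamp>\S+) WARNING \d+: "
    r"Slow (?P<opcode>\S+) operation on connection: (?P<duration_ms>\d+) ms"
)


def parse_slow_op(line):
    """Return (timestamp, opcode, duration_ms), or None if the line does not match."""
    match = _SLOW_OP.search(line)
    if match is None:
        return None
    return (
        match.group("timestamp"),
        match.group("opcode"),
        int(match.group("duration_ms")),
    )
```

Because memcached only logs operations above its own threshold, entries between 100ms and 500ms never reach the log in the first place, which is why parsing alone cannot close the gap described here.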

          Activity

            drigby Dave Rigby added a comment -

            We probably want this configurable via mcctl or similar (for easy tuning of a single memcached process), and also via the memcached config file so ns_server can push down config to the whole cluster.

            trond Trond Norbye added a comment -

            I guess we should drop a new config file at install time into `etc/couchbase/kv/opcode-attributes.json` with something like the following syntax:

            {
              "version": 1,
              "default": {
                "slow": 500
              },
              "get": {
                "slow": 100
              },
              "compact_db": {
                "slow": "30 min"
              }
            }
            

            Where `version` indicates the format of the file, `default` is the entry we use unless there is an explicit entry for the opcode, and every other key is expected to be the name of an opcode to modify. For now we'll just add the attribute `slow`, which sets the threshold above which an operation is considered slow. If the value is specified as a number, it is the number of milliseconds. If specified as a string, it may include a unit specifier such as us, ms, sec, min (and possibly h, d, y).
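The two value forms for `slow` can be illustrated with a small parser. This is a sketch only: the exact grammar memcached accepts is not spelled out here, so the pattern (digits, optional whitespace, then a unit) and the helper name `slow_threshold` are assumptions.

```python
import re
from datetime import timedelta

# Map the unit suffixes named above onto timedelta keyword arguments.
# "h", "d" and "y" were only floated as possibilities, so they are left out.
_UNITS = {
    "us": "microseconds",
    "ms": "milliseconds",
    "sec": "seconds",
    "min": "minutes",
}
_PATTERN = re.compile(r"^\s*(\d+)\s*([a-z]+)\s*$")


def slow_threshold(value):
    """Convert a `slow` attribute value into a timedelta."""
    if isinstance(value, bool):
        raise TypeError("a boolean is not a valid threshold")
    if isinstance(value, (int, float)):
        return timedelta(milliseconds=value)  # bare numbers are milliseconds
    match = _PATTERN.match(value)
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"unrecognised threshold: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})
```

With this sketch, `slow_threshold(500)` and `slow_threshold("500 ms")` give the same result, and `slow_threshold("30 min")` covers the `compact_db` entry from the example file.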

            memcached reads the default from this file (no hardcoded defaults anymore). After parsing that file it'll scan `etc/couchbase/kv/opcode-attributes.d` and apply each entry found there in alphabetical order. These files would normally contain a single entry. (This allows ns_server to drop in one entry per command, and the user could even override that by creating `Z-final.json`.)
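The load order described above (base file first, then each file in the `.d` directory) could look roughly like the sketch below. Several details are assumptions rather than anything stated in this ticket: only `*.json` files are read, entries are merged per attribute, and names are sorted case-insensitively so that `Z-final.json` really is applied last (a plain byte-wise sort would place uppercase names before lowercase ones). `load_opcode_attributes` is a hypothetical name.

```python
import json
from pathlib import Path


def load_opcode_attributes(base_file, override_dir):
    """Merge the base file with every *.json file found in override_dir."""
    merged = json.loads(Path(base_file).read_text())
    directory = Path(override_dir)
    if directory.is_dir():
        # Assumption: case-insensitive alphabetical order, later files win.
        for path in sorted(directory.glob("*.json"), key=lambda p: p.name.lower()):
            for opcode, attributes in json.loads(path.read_text()).items():
                if opcode == "version":
                    continue  # format marker, not an opcode entry
                merged.setdefault(opcode, {}).update(attributes)
    return merged
```

Given a `get.json` that sets `"slow": 100` and a `Z-final.json` that sets `"slow": 50` for the same opcode, the merged result under these assumptions carries the value 50.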

            A new config key (`opcode-attribute-file`) in `memcached.json` is needed in order to allow for easy unit testing. 

            To allow for easy checks we should create a new command to mcctl: `opcode-attributes`

             

            Ex:

            $ mcctl set opcode-attributes '{
              "version": 1,
              "default": {
                "slow": 500
              },
              "get": {
                "slow": 100
              },
              "compact_db": {
                "slow": "30 min"
              }
            }'

             

            One question is: Do we want to be able to specify these on a per bucket level...

             

            drigby Dave Rigby added a comment -

            One question is: Do we want to be able to specify these on a per bucket level...

            I think initially, a global setting is fine.

            I'd also add that while the fine-grained config file proposal sounds good going forward, we probably want something quicker to implement for watson - say just a single configurable value for the baseline (500ms) threshold.


            build-team Couchbase Build Team added a comment -

            Build 5.1.0-1142 contains kv_engine commit f773f430c2c011912ebe5a41a55438b05c984f0d with commit message:
            MB-25632: Allow slow op threshold to be customised
            https://github.com/couchbase/kv_engine/commit/f773f430c2c011912ebe5a41a55438b05c984f0d

            dhaikney David Haikney added a comment -

            Hey Trond Norbye, I saw the commit message with the syntax of the file. I want to make sure we have a clear set of instructions should anyone want to do this, e.g.

            • Should we have a diag/eval that can propagate this change to all nodes consistently?
            • Does it require memcached to be restarted to pick up the change (and if so, could we make it dynamic)?
            • How about setting these values via cbepctl (or mcctl)?
            • Is this safe for an end-user to do (in which case we should probably document it) or is it something we would need support to hand-hold the customer through?

            trond Trond Norbye added a comment -

            David Haikney - In order to set it through diag/eval, ns_server needs to add support for dumping it to memcached.json (there is currently no way to automatically have ns_server add sections into that file without explicitly adding support for each token). That must be handled by the ns_server team.

             

            memcached does not need to be restarted, and all changes take effect immediately. If one does specify a new default value, you may get some false entries in the log files during the reconfiguration (we first set all values to the new default, then override whatever is specified).
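The two-phase update described above, and the window in which false entries can appear, can be modelled in a few lines. This is an illustrative model of the described ordering only, not the kv_engine code; thresholds are plain millisecond numbers and `apply_spec` is a hypothetical name.

```python
def apply_spec(thresholds, spec):
    """Update `thresholds` in place, yielding a snapshot after each phase."""
    default = spec.get("default", {}).get("slow")
    if default is not None:
        # Phase 1: every opcode is reset to the new default.
        for opcode in thresholds:
            thresholds[opcode] = default
        yield dict(thresholds)
    # Phase 2: explicit per-opcode entries override the default.
    for opcode, attributes in spec.items():
        if opcode in ("version", "default"):
            continue
        if "slow" in attributes:
            thresholds[opcode] = attributes["slow"]
    yield dict(thresholds)


current = {"get": 100, "compact_db": 1_800_000}
spec = {"version": 1, "default": {"slow": 500}, "compact_db": {"slow": 1_800_000}}
phases = list(apply_spec(current, spec))
```

In this model the first snapshot has `compact_db` at 500ms, so a compaction finishing in that window would be reported as slow even though its configured threshold, restored in the second snapshot, is 30 minutes.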

             

            New values may be specified through mcctl by using:

             

            mcctl -u Administrator -P password set sla '{ the new spec to set }'

             

            Everyone may set these values to whatever they want and it is completely safe to do, but you might spam your log files with warnings if you set the threshold too low for the underlying hardware.
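When the spec is generated by a script, building the `mcctl ... set sla` invocation above as an argument list avoids shell quoting problems with the JSON. The sketch below only constructs and prints the command; the credentials are the placeholders from the example, and `mcctl_set_sla_argv` is a hypothetical helper.

```python
import json
import shlex


def mcctl_set_sla_argv(spec, user="Administrator", password="password"):
    """Return the argument vector for the `mcctl ... set sla` call shown above."""
    return [
        "mcctl", "-u", user, "-P", password,
        "set", "sla",
        json.dumps(spec, separators=(",", ":")),  # compact JSON, one argument
    ]


spec = {"version": 1, "default": {"slow": 500}, "get": {"slow": 100}}
argv = mcctl_set_sla_argv(spec)
print(shlex.join(argv))  # shell-quoted form, suitable for copy and paste
```

The list form can be handed to `subprocess.run(argv, check=True)` on a node where `mcctl` is on the path.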

             

            trond Trond Norbye added a comment -

            Filed MB-25860 to track changes needed in ns_server to distribute the values to all nodes


            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.2.0-1029 contains kv_engine commit c86ff24 with commit message:
            MB-25632: Opcode attributes: document dynamic changing via mcctl

            People

              trond Trond Norbye
              drigby Dave Rigby