Couchbase Server / MB-31114

cbcollect robustness - retry if stats collections fail


Details

    Description

      There are two improvements we wish to make when gathering stats as part of cbcollect:

      1. If we cannot connect to the server on port 11209, retry on port 11210. Motivated by CBSE-5616: although it was not the root cause, we saw that memcached was not listening on port 11210, so the inverse case (memcached not listening on port 11209) is plausible.
      2. If a stats task fails, retry it up to 5 times (an arbitrary limit). Motivated by CBSE-5659, in which one of the worker threads hung. A stats task can be assigned to any worker thread, so in that CBSE many stats were missing because their tasks had been assigned to the hung worker. If a stats task times out, we should retry it in the hope that it is served by a different worker thread.
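A minimal sketch of both improvements, written in Python on the assumption that it would live alongside the cbcollect collection code. The names `connect_with_fallback`, `run_with_retries`, `MEMCACHED_PORTS` and `MAX_RETRIES`, and the injectable `connect` parameter, are hypothetical and not taken from the actual cbcollect source. The sketch also assumes a stats task reports a timeout by raising an `OSError` (for example `socket.timeout`); each retry issues a fresh request, which memcached may hand to a different worker thread.

```python
import socket
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")

# Hypothetical defaults: the issue names ports 11209 and 11210 and a limit of 5 retries.
MEMCACHED_PORTS = (11209, 11210)
MAX_RETRIES = 5


def connect_with_fallback(host: str,
                          ports: Iterable[int] = MEMCACHED_PORTS,
                          timeout: float = 5.0,
                          connect: Callable[..., socket.socket] = socket.create_connection):
    """Try each port in order and return the first connection that succeeds."""
    ports = list(ports)  # materialise so the error message can list every port tried
    last_error = None
    for port in ports:
        try:
            return connect((host, port), timeout=timeout)
        except OSError as err:  # e.g. connection refused or connect timeout
            last_error = err
    raise ConnectionError(f"could not connect to {host} on any of {ports}") from last_error


def run_with_retries(task: Callable[[], T], max_retries: int = MAX_RETRIES) -> T:
    """Run task once, then retry it up to max_retries more times if it raises OSError.

    Each attempt is a new stats request, so (per the issue) it may be served
    by a different, non-hung worker thread.
    """
    last_error = None
    for _ in range(max_retries + 1):
        try:
            return task()
        except OSError as err:  # socket.timeout / TimeoutError are OSError subclasses
            last_error = err
    if last_error is None:
        raise ValueError("max_retries must be >= 0")
    raise last_error
```

Passing `connect` as a parameter keeps the port-fallback logic testable without a running memcached; in real use the default `socket.create_connection` would be used.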

      People

        Dave Finlay (dfinlay)
        Ben Huddleston (ben.huddleston)