Couchbase Server / MB-55026

ExpiryPager runs causing spikes in p100 SyncWrite latency

    Description

      While measuring SyncWrite latency on modestly sized nodes (EC2 r5.2xlarge, 8 CPU cores), we observed periodic jumps in the worst-case (p100) SyncWrite latency every 10 minutes.

      Looking at the tasks which run every 10 minutes, there is a very direct correlation with when the ExpiryPager is scheduled to run (for the 7 buckets on this cluster): when the ExpiryPager starts to run for a bucket, the maximum SyncWrite latency suffers.

      This appears to be due to contention on the NonIO thread pool: on an 8-core system we create 2 NonIO threads, and the ExpiryPager runs 2 tasks per bucket.

      Indeed, the latency increase is (almost) entirely eliminated if the number of NonIO threads is increased from 2 to 3, so that there is still a "spare" NonIO thread while the ExpiryPager tasks are running. (In the original graph, the thread count was changed at the dotted blue line.)
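
The contention argument above can be illustrated with a small, deterministic model. This is only a sketch under stated assumptions, not Couchbase code: it assumes a non-preemptive FIFO pool, that all long-running tasks (standing in for the 2 ExpiryPager tasks of one bucket) are scheduled at time 0, and that some SyncWrite-related work needs a thread from the same pool shortly afterwards. The function name `queue_delay` and the 100 ms task duration are hypothetical, chosen purely for illustration.

```python
import heapq


def queue_delay(num_threads, long_tasks, long_task_duration, arrival_time):
    """Time a short task arriving at `arrival_time` waits for a free thread.

    Hypothetical model: `long_tasks` tasks of `long_task_duration` each are
    queued at time 0 on a non-preemptive pool of `num_threads` threads.
    """
    # Each heap entry is the time at which one thread next becomes free.
    free_at = [0.0] * num_threads
    heapq.heapify(free_at)

    # Assign every long task to whichever thread frees up first.
    for _ in range(long_tasks):
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + long_task_duration)

    # The short task runs as soon as any thread is free.
    earliest_free = free_at[0]
    return max(0.0, earliest_free - arrival_time)


# 2 threads, 2 long tasks: both threads are busy, so the short task waits.
delay_two_threads = queue_delay(2, 2, 100.0, 1.0)
# 3 threads, 2 long tasks: one thread stays spare, so there is no wait.
delay_three_threads = queue_delay(3, 2, 100.0, 1.0)
print(delay_two_threads, delay_three_threads)
```

In this model the short task queues behind the long tasks whenever their count reaches the thread count, and the delay disappears once one thread is left spare. This matches the direction of the observation above, although the real task durations and scheduling policy are not stated in this issue.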

            People

              owend Daniel Owen
              drigby Dave Rigby (Inactive)
