Couchbase Server / MB-20482

dcp observe replication regression between 4.5.1-2801 and 4.5.1-2802


Details

    • Triage: Untriaged
    • Operating System: Centos 64-bit
    • Is this a Regression?: Yes

    Description

      This has been identified as part of the daily sanity:
      python -u perfSanity/scripts/perf_regression_runner_alpha.py -e -v 4.5.1-2802 -r 2016-07-14:13:18 -q "testName='dcp_observe'" -n -e

      This test measures replication latency. With build 2801 the 95th percentile replication latency is 5 msec; with build 2802 it is 800 msec. A number of ep-engine-related changes went into this build.
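      For reference, the 95th percentile here is simply the value below which 95% of the per-operation latency samples fall. A minimal sketch of deriving such a figure from raw samples (the sample data and names below are illustrative, not perfSanity's actual reporting code):

          import numpy as np

          # Placeholder latency samples; not taken from this test run.
          latencies_ms = np.random.lognormal(mean=1.0, sigma=0.5, size=10_000)

          p95 = np.percentile(latencies_ms, 95)
          print(f"95th percentile latency: {p95:.2f} ms")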

      This issue is not seen in the weekly crank; it may be showing up in the daily sanity because those servers have less RAM and fewer cores (though this is a supported configuration).

      Initially assigning to Jim because he has worked on similar issues in the past.

      Jim, if this does indeed belong to you, let me know a good time to meet to discuss it.

      I will attach logs shortly.

      Attachments


        Activity

          The following playbook is used to initialize test machines (Ubuntu 16.04 is usually used):

          https://github.com/couchbase/perfrunner/blob/master/playbooks/clients.yml

          Obviously, tools such as htop are installed for debugging purposes.

          "make build" creates virtual environment and installs all required python packages.

          I believe the same process should work for "perfSanity" as well.

          pavelpaulau Pavel Paulau (Inactive) added a comment
          pavelpaulau Pavel Paulau (Inactive) added a comment - edited

          If I understand correctly, this test configuration is used to reproduce the regression:

          https://github.com/couchbase/perfrunner/blob/master/perfSanity/tests/kv_observe_4M_repl.test

          It looks like the test attempts to perform 40K SETs/sec:

          reads = 50
          updates = 50
          throughput = 80000
          

          The throughput is quite high for a small setup though.
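
          As a quick sanity check of the 40K SETs/sec figure, the target SET rate follows directly from the settings above (the variable names mirror the test file; the arithmetic itself is mine):

              reads = 50          # % of operations that are GETs
              updates = 50        # % of operations that are SETs
              throughput = 80000  # total target ops/sec

              set_rate = throughput * updates / (reads + updates)
              print(f"target SET rate: {set_rate:.0f} ops/sec")  # 40000 ops/sec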

          Nevertheless, I tried to run this configuration/workload using one of our "weekly" clusters.

          I can see that persistTo=1 latency is consistently less than 5 ms in both 4.1.1-5914 and 4.1.2-6087.
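
          For context, this kind of persistTo=1 latency can be sampled with an observe-based durability call. A rough sketch in the style of the 2.x Python SDK (the connection string, bucket and key are placeholders, and this is not the perfrunner workload code):

              import time
              from couchbase.bucket import Bucket

              # Placeholder connection string and bucket name.
              bucket = Bucket('couchbase://127.0.0.1/bucket-1')

              t0 = time.time()
              # persist_to=1 waits until the mutation is persisted on one node
              # (observe-based durability in the 2.x SDK).
              bucket.upsert('example-key', {'field': 'value'}, persist_to=1)
              latency_ms = (time.time() - t0) * 1000
              print(f"persist_to=1 latency: {latency_ms:.1f} ms")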

          Nor can I see any difference in CPU utilization; it is 600% in both cases.

          Please note, Eric's servers have only 4 relatively slow cores.

          jwalker Jim Walker added a comment - edited

          This issue is simply how performance lands with the fair-scheduling fix of MB-18453.

          As you may know, ep-engine has a multi-threaded tasking model: a fixed number of threads are created and assigned to one of four task types (reader/writer/nonio/auxio), and tasks can then be created and scheduled to run on them.

          We’ve updated the documentation of the tasking model in the ep-engine README.md, which has more details: https://github.com/couchbase/ep-engine/README.md

          With MB-18453 we addressed a problem where the scheduler's "wake" function put the woken task straight into the "readyQueue". The "readyQueue" is the queue that running threads take work from (they pop it), and it is ordered by task priority, so a high-priority task enqueued onto the "readyQueue" goes ahead of any lower-priority tasks already waiting there.

          This was the trigger of the now well-known "NONIO task waiting" problem that has caused many rebalances to fail. In those instances, two DCP-related tasks (ConnNotifier and Processor) typically get woken (via the broken wake function) as traffic arrives on a node. These tasks are high priority (Processor has the highest priority available) and jump ahead of other tasks. Every mutation landing on the node via DCP can cause them to jump to the front of the queue, with the result that some low-priority tasks critical to rebalance are held in the queue for a long time; in fact they have been seen to be held up for hours.

          The fix added in MB-18453 was to never enqueue tasks directly into the "readyQueue"; we always enqueue them into the "futureQueue", which is ordered by the time each task should execute. The worker threads draining the "readyQueue" only refill it when it is empty: at that point they look at the "futureQueue" and move over all tasks that are due to run. Thus we never get into the starvation problem, as every task gets a fair go.
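
          To make the before/after behaviour concrete, here is a heavily simplified sketch of the two queues (in Python for brevity rather than ep-engine's actual C++; the class, method names and priority values are illustrative, not the real ep-engine identifiers):

              import heapq
              import itertools
              import time

              _seq = itertools.count()  # tie-breaker so heap entries never compare task objects

              class TaskQueues:
                  def __init__(self):
                      self.ready_queue = []   # ordered by task priority (lower value = higher priority here)
                      self.future_queue = []  # ordered by the time each task should execute

                  def schedule(self, task, waketime):
                      # Post-MB-18453: every schedule/wake goes via the futureQueue.
                      heapq.heappush(self.future_queue, (waketime, next(_seq), task))

                  def wake(self, task):
                      # The old, broken wake() pushed straight into the readyQueue,
                      # letting high-priority DCP tasks starve lower-priority ones.
                      self.schedule(task, time.time())

                  def next_task(self):
                      # Workers only refill the readyQueue once it is empty; at that
                      # point all tasks whose waketime has passed move over in one go.
                      # Tasks are expected to expose a numeric .priority attribute.
                      if not self.ready_queue:
                          now = time.time()
                          while self.future_queue and self.future_queue[0][0] <= now:
                              _, _, task = heapq.heappop(self.future_queue)
                              heapq.heappush(self.ready_queue, (task.priority, next(_seq), task))
                      if self.ready_queue:
                          return heapq.heappop(self.ready_queue)[2]
                      return None

          The key point is that wake() no longer lets a task jump an already-populated "readyQueue"; it only affects when the task becomes eligible to be moved across from the "futureQueue".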

          But…

          In this fairer world DCP sometimes has to wait its turn, so the 95th percentile of observe has gone up, and the change is more noticeable on low-core-count systems. On systems with more cores, if say the NONIO tasks are very busy (because DCP is running hard), we are able to drain the queues faster because the NONIO threads can all be on-CPU concurrently, so the impact of the fairer scheduling is less obvious.

          I hope this makes sense, any questions welcome.


          Updating the test case with the new expected value.

          ericcooper Eric Cooper (Inactive) added a comment

          This test was created by me largely in isolation, and it is possible that it does not represent a real-world scenario.

          I see two possible conclusions:

          • this is not a real-world scenario (as the test was created very much in isolation, this is a possibility)
          • this is a real-world scenario, but the impact is such that a customer would not notice it
          ericcooper Eric Cooper (Inactive) added a comment

          People

            Assignee: ericcooper Eric Cooper (Inactive)
            Reporter: ericcooper Eric Cooper (Inactive)
