MB-39618 (Couchbase Server)

Memcached is CPU hungry when HPET clock source used

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 6.5.0, 6.5.1
    • Fix Version/s: 6.6.0
    • Component/s: couchbase-bucket
    • Environment:
      Inside Docker, macOS as a host, using HPET clock source for Docker VM.
    • Story Points:
      1

      Description

      This occurs with our official Docker image, which is built on top of Ubuntu 16.04 LTS.

      A user has reported this in https://github.com/testcontainers/testcontainers-java/issues/2802 and I've been able to reproduce this locally.

      If you create 3-4 buckets, memcached CPU usage climbs to 100-150% inside the container. This happens with 6.5.0 and 6.5.1. I also tried 6.0.3, and there the CPU usage is at 13%.

      I triaged a bit with Dave Rigby, and the current suspicion is that the sync durability monitor consumes far more CPU in this setup than it was tuned for on non-virtual Linux hosts (less than 1%). A couple of percent would probably be fine, but this keeps 1-2 cores busy all the time, even when there are no ops or traffic going through.

      If you are not able to reproduce this, I can provide the Java program to run it, but all testcontainers-java does is spin up the Docker container, configure it with alternate addresses to expose the ports, and then create 3 buckets.
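When reproducing, it may help to first confirm which clock source the kernel inside the Docker VM is actually using. The following is a minimal sketch, assuming a Linux guest that exposes the standard sysfs clocksource entry; the helper names `readClockSource` and `isHpet` are hypothetical and not part of Couchbase or testcontainers-java.

```cpp
#include <cassert>
#include <fstream>
#include <optional>
#include <string>

// Standard Linux sysfs entry naming the clock source currently in use.
constexpr const char* kClockSourcePath =
        "/sys/devices/system/clocksource/clocksource0/current_clocksource";

// Hypothetical helper: returns the first line of the file at `path`
// (e.g. "tsc" or "hpet"), or std::nullopt if the file cannot be read.
std::optional<std::string> readClockSource(
        const std::string& path = kClockSourcePath) {
    assert(!path.empty());
    std::ifstream in(path);
    std::string name;
    if (!in || !std::getline(in, name)) {
        return std::nullopt;
    }
    return name;
}

// Hypothetical helper: true if the reported clock source is HPET, the
// configuration described in this ticket's Environment field.
bool isHpet(const std::string& name) {
    return name == "hpet";
}
```

Calling `readClockSource()` with no argument inside the container reads the live value; passing a path makes the helper usable against a captured copy of the file.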


      Some more info from the initial triage:

      top output:

        329 couchba+  20   0 2038928 105812  12564 S 108.7  5.2   7:18.83 memcached
      

      points to the NonIO threads:

      498 couchba+  20   0 2038928 104720  12464 S 51.8  5.1   2:14.51 mc:nonIO_1
      497 couchba+  20   0 2038928 104720  12464 S 49.2  5.1   2:12.78 mc:nonIO_0
      

      mpstat shows we're spending some time in sys:

      09:14:34     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
      09:14:35     all    3.98    0.00   19.38    0.00    0.00    0.00    0.00    0.00    0.00   76.64
      09:14:35       0    3.16    0.00   11.58    0.00    0.00    0.00    0.00    0.00    0.00   85.26
      09:14:35       1    0.00    0.00    2.13    0.00    0.00    0.00    0.00    0.00    0.00   97.87
      09:14:35       2    1.04    0.00    6.25    0.00    0.00    0.00    0.00    0.00    0.00   92.71
      09:14:35       3    6.25    0.00   40.62    0.00    0.00    0.00    0.00    0.00    0.00   53.12
      09:14:35       4    7.29    0.00   37.50    0.00    0.00    0.00    0.00    0.00    0.00   55.21
      09:14:35       5    5.15    0.00   16.49    0.00    0.00    0.00    0.00    0.00    0.00   78.35
      

            Activity

            drigby Dave Rigby added a comment (edited)

            Fix is up on Gerrit: http://review.couchbase.org/c/kv_engine/+/129936. Ended up taking a slightly different (and simpler) approach: use CLOCK_MONOTONIC_COARSE instead of CLOCK_MONOTONIC for DurabilityTimeoutTask's check of when to pause. This clock only has 1ms resolution, but crucially it can always be handled in the userspace vDSO on current Linux kernels [1]. Given that the duration we are trying to measure is 25ms, a 1ms-resolution clock is perfectly adequate.

            Idle CPU on local test machine (mancouch) with 5 empty buckets is back to the same numbers seen with the tsc clock source:

            • clocksource=tsc

                PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
               
              31389 daver     20   0 1785744 134848  13268 S  6.8  0.1   0:21.41 mc:nonIO_1
              31388 daver     20   0 1785744 134848  13268 S  6.4  0.1   0:21.23 mc:nonIO_0
              

            • clocksource=hpet

                PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
               
              31389 daver     20   0 1785744 135992  13268 S 20.0  0.1   0:37.37 mc:nonIO_1
              31388 daver     20   0 1785744 135992  13268 S 19.4  0.1   0:36.84 mc:nonIO_0
              

            • clocksource=hpet, folly::chrono::coarse_steady_clock

                PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
               
              40603 daver     20   0 1748892  86508  13268 S  6.3  0.1   0:01.27 mc:nonIO_0
              40604 daver     20   0 1748892  86508  13268 S  5.3  0.1   0:01.06 mc:nonIO_1
              

            [1]: https://elixir.bootlin.com/linux/v4.19.76/source/arch/x86/entry/vdso/vclock_gettime.c#L282
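To illustrate the approach described in this comment, here is a minimal sketch of a duration-capped loop that checks its deadline against CLOCK_MONOTONIC_COARSE. It is not the kv_engine change itself: `CoarseSteadyClock` and `visitWithBudget` are hypothetical names invented for this example (the ticket's measurements used `folly::chrono::coarse_steady_clock`), and the code assumes Linux, where `clock_gettime` accepts this clock ID.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <time.h>

// Hypothetical std::chrono-style clock backed by CLOCK_MONOTONIC_COARSE.
// Linux only; a stand-in for illustration, not folly's implementation.
struct CoarseSteadyClock {
    using duration = std::chrono::nanoseconds;
    using rep = duration::rep;
    using period = duration::period;
    using time_point = std::chrono::time_point<CoarseSteadyClock>;
    static constexpr bool is_steady = true;

    static time_point now() noexcept {
        timespec ts{};
        clock_gettime(CLOCK_MONOTONIC_COARSE, &ts);
        return time_point(std::chrono::seconds(ts.tv_sec) +
                          std::chrono::nanoseconds(ts.tv_nsec));
    }
};

// Hypothetical capped-duration visitor: visits items [start, count) until
// either all are visited or `budget` has elapsed on Clock. The deadline is
// checked after every visit, so the clock is read once per item.
// Returns the index of the first item NOT visited, so a caller can resume.
template <typename Clock>
std::size_t visitWithBudget(std::size_t start,
                            std::size_t count,
                            std::chrono::milliseconds budget,
                            const std::function<void(std::size_t)>& visit) {
    assert(budget.count() >= 0);
    const auto deadline = Clock::now() + budget;
    std::size_t i = start;
    while (i < count) {
        visit(i++);
        if (Clock::now() >= deadline) {
            break; // Budget used up; pause here and let the caller resume.
        }
    }
    return i;
}
```

A caller would invoke something like `visitWithBudget<CoarseSteadyClock>(0, n, std::chrono::milliseconds(25), fn)` and resume later from the returned index. Because the clock is read after every item, the per-call cost of the clock source is paid on each iteration, which is why the choice of clock matters for this kind of loop.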

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.6.0-7787 contains kv_engine commit 7a0ce25 with commit message:
            MB-39618: Use coarse clock for CappedDurationVBucketVisitor

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-2347 contains kv_engine commit 7a0ce25 with commit message:
            MB-39618: Use coarse clock for CappedDurationVBucketVisitor

            ritam.sharma Ritam Sharma added a comment -

            Sumedh Basarkod - can you please validate this defect.

            sumedh.basarkod Sumedh Basarkod added a comment -

            Verified on 6.6.0-7854 build by spinning up the docker container and checking the idle CPU % usage. Closing the ticket.


              People

              Assignee:
              sumedh.basarkod Sumedh Basarkod
              Reporter:
              daschl Michael Nitschinger
              Votes:
              0
              Watchers:
              7
