Memcached is CPU hungry when the HPET clock source is used

Description

This is using our official Docker image, which is built on top of Ubuntu 16.04 LTS.

A user has reported this in https://github.com/testcontainers/testcontainers-java/issues/2802 and I've been able to reproduce this locally.

If you create 3-4 buckets, memcached CPU usage goes up to 100-150% inside the container. This happens with 6.5.0 and 6.5.1. I also tried with 6.0.3, and there the CPU is at 13%.

I triaged this a bit, and the current suspicion is that the sync durability monitor is consuming far more CPU in this setup than it was tuned for on non-virtual Linux hosts (less than 1%). A couple of percent would probably be fine, but this is keeping 1-2 cores busy all the time, even when there are no ops/traffic going through.

If you are not able to reproduce this I can provide the Java program to run it, but all testcontainers-java does is spin up the Docker container, configure it with alternate addresses to expose the ports, and then create 3 buckets.


Some more info from the initial triage:

top output:

points to the NonIO threads:

mpstat shows we're spending some time in sys:

Environment

Inside Docker, macOS as a host, using HPET clock source for Docker VM.
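
To check which clock source the kernel inside the Docker VM is actually using, a minimal C++ sketch could read the sysfs entry that Linux exposes for it. The helper names `read_clocksource` and `is_hpet` are hypothetical and are not part of Couchbase or kv_engine; the sketch assumes a Linux guest with sysfs mounted.

```cpp
#include <fstream>
#include <string>

// sysfs entry where Linux reports the active clock source (e.g. "tsc", "hpet").
const char* const kCurrentClocksourcePath =
    "/sys/devices/system/clocksource/clocksource0/current_clocksource";

// Hypothetical helper: returns the first line of the given file,
// or an empty string if the file cannot be opened.
inline std::string read_clocksource(
        const std::string& path = kCurrentClocksourcePath) {
    std::ifstream in(path);
    std::string name;
    if (in) {
        std::getline(in, name);
    }
    return name;
}

// Hypothetical helper: true if the reported clock source is HPET.
inline bool is_hpet(const std::string& name) {
    return name == "hpet";
}
```

Calling `is_hpet(read_clocksource())` inside the container would be expected to return true in the environment described here, and false on a host using the tsc clock source.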

Link to Log File, atop/blg, CBCollectInfo, Core dump

None

Release Notes Description

None

Activity

Sumedh Basarkod July 7, 2020 at 6:11 AM

Verified on 6.6.0-7854 build by spinning up the docker container and checking the idle CPU % usage. Closing the ticket.

Ritam Sharma June 29, 2020 at 4:25 AM

Can you please validate this defect?

CB robot June 12, 2020 at 9:52 AM

Build couchbase-server-7.0.0-2347 contains kv_engine commit 7a0ce25 with commit message:
: Use coarse clock for CappedDurationVBucketVisitor

CB robot June 9, 2020 at 9:17 AM

Build couchbase-server-6.6.0-7787 contains kv_engine commit 7a0ce25 with commit message:
: Use coarse clock for CappedDurationVBucketVisitor

Dave Rigby June 5, 2020 at 3:35 PM

Fix is up on Gerrit: http://review.couchbase.org/c/kv_engine/+/129936. I ended up taking a slightly different (and simpler) approach: use CLOCK_MONOTONIC_COARSE instead of CLOCK_MONOTONIC when DurabilityTimeoutTask checks whether it is time to pause. This clock only has 1ms resolution, but crucially it can always be handled in the userspace vDSO on current Linux kernels[1]. Given that the duration we are trying to measure is 25ms, a 1ms resolution clock is perfectly adequate.
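
As an illustration of the idea (not the actual kv_engine change), the following sketch shows a loop that visits items until a time budget is spent, checking a coarse monotonic clock after each item. The names `read_clock`, `kCoarseClock` and `visit_with_budget` are hypothetical; the fallback to CLOCK_MONOTONIC on platforms without CLOCK_MONOTONIC_COARSE is an assumption added only so the sketch compiles outside Linux.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <ctime>
#include <functional>

// CLOCK_MONOTONIC_COARSE is Linux-specific; fall back to CLOCK_MONOTONIC
// elsewhere (assumption made for portability of this sketch only).
#ifdef CLOCK_MONOTONIC_COARSE
constexpr clockid_t kCoarseClock = CLOCK_MONOTONIC_COARSE;
#else
constexpr clockid_t kCoarseClock = CLOCK_MONOTONIC;
#endif

// Hypothetical helper: read a POSIX clock as a std::chrono duration.
inline std::chrono::nanoseconds read_clock(clockid_t id) {
    timespec ts{};
    clock_gettime(id, &ts);
    return std::chrono::seconds(ts.tv_sec) +
           std::chrono::nanoseconds(ts.tv_nsec);
}

// Hypothetical sketch: visit up to item_count items, stopping early once
// the budget has elapsed according to the coarse clock. At least one item
// is visited per call. Returns the number of items visited.
inline std::size_t visit_with_budget(
        std::size_t item_count,
        std::chrono::milliseconds budget,
        const std::function<void(std::size_t)>& visit) {
    assert(static_cast<bool>(visit)); // callback must be set
    const auto deadline = read_clock(kCoarseClock) + budget;
    std::size_t visited = 0;
    while (visited < item_count) {
        visit(visited);
        ++visited;
        // Cheap deadline check: the coarse clock trades resolution
        // for not needing a hardware clock source read.
        if (read_clock(kCoarseClock) >= deadline) {
            break;
        }
    }
    return visited;
}
```

With a 25ms budget, a clock that is only accurate to about 1ms can make the loop stop slightly early or late, which is why the coarse clock is acceptable for this kind of check.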

Idle CPU on local test machine (mancouch) with 5 empty buckets is back to the same numbers seen with the tsc clock source:

  • clocksource=tsc

  • clocksource=hpet

  • clocksource=hpet, folly::chrono::coarse_steady_clock

[1]: https://elixir.bootlin.com/linux/v4.19.76/source/arch/x86/entry/vdso/vclock_gettime.c#L282

Fixed
Created May 28, 2020 at 1:48 PM
Updated July 7, 2020 at 6:12 AM
Resolved June 9, 2020 at 8:39 AM