Details
- Improvement
- Resolution: Done
- Major
- 6.5.1, 6.6.0, 6.5.0
- 1
- KV Sprint 2020-July, KV Sprint 2020-Oct, KV-Engine-Sept-21, KV 2021-Oct-21
Description
Investigation of MB-39618 highlighted that the current busy-polling implementation of SyncWrite timeout checking is very costly, at approximately 3.5% CPU per bucket. A 10-bucket node with zero op/s consumes 35% CPU in the memcached process:
top - 13:35:30 up 80 days, 22:37, 9 users, load average: 0.81, 0.64, 0.43
Tasks: 471 total, 1 running, 300 sleeping, 6 stopped, 2 zombie
%Cpu(s): 2.8 us, 0.9 sy, 0.0 ni, 96.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 13197644+total, 15423808 free, 3459340 used, 11309330+buff/cache
KiB Swap: 13416857+total, 13416651+free, 2060 used. 12749011+avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1654 daver 20 0 1978268 260788 13232 S 35.1 0.2 600:52.51 memcached
The issue is that we currently schedule (and run) one DurabilityTimeoutTask per bucket every 25ms. Each run has to iterate through all vBuckets, checking for any SyncWrites which have timed out. This happens even if the cluster has zero SyncWrites in progress (!)
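To put a number on that polling cost, here is a small back-of-the-envelope sketch. The 25ms interval is from the description above; the figure of 1024 vBuckets per bucket is an assumption (it matches the "10 buckets, up to 10240 tasks" estimate in this ticket), and the function names are purely illustrative.

```cpp
#include <cstdint>

// From the description: one DurabilityTimeoutTask per bucket, run every 25 ms.
constexpr std::uint64_t pollIntervalMs = 25;
// Assumption: 1024 vBuckets per bucket (10 buckets -> 10240 vBuckets).
constexpr std::uint64_t vbucketsPerBucket = 1024;

// How many times the per-bucket task runs each second.
constexpr std::uint64_t taskRunsPerSecond() {
    return 1000 / pollIntervalMs;
}

// vBuckets visited per second on a node with `buckets` buckets,
// whether or not any SyncWrite is actually in flight.
constexpr std::uint64_t vbucketVisitsPerSecond(std::uint64_t buckets) {
    return buckets * vbucketsPerBucket * taskRunsPerSecond();
}

static_assert(taskRunsPerSecond() == 40, "40 runs per second per bucket");
static_assert(vbucketVisitsPerSecond(1) == 40960, "one bucket");
static_assert(vbucketVisitsPerSecond(10) == 409600, "ten buckets");
```

Under these assumptions an idle 10-bucket node performs roughly 400,000 vBucket checks per second, all of which find nothing to do.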
A more efficient solution would be a DurabilityTimeoutTask per vBucket, scheduled to run when the next SyncWrite in that vBucket will expire. If no SyncWrites are outstanding on that vBucket then no task would be scheduled (and nothing would need to wake up). This should reduce the idle CPU to close to zero (or at least stop it being a function of the number of buckets).
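A minimal sketch of that idea follows, assuming each vBucket keeps its outstanding SyncWrite deadlines in sorted order. `VBucketTimeoutTracker`, `track`, `nextWakeup` and `expire` are hypothetical names chosen for illustration; this is not the kv_engine implementation.

```cpp
#include <chrono>
#include <cstddef>
#include <iterator>
#include <optional>
#include <set>

using Clock = std::chrono::steady_clock;

// Hypothetical per-vBucket tracker of SyncWrite deadlines.
class VBucketTimeoutTracker {
public:
    // Record a SyncWrite that must complete by `deadline`.
    void track(Clock::time_point deadline) {
        deadlines.insert(deadline);
    }

    // When the timeout task for this vBucket should next run.
    // Returns std::nullopt when nothing is outstanding, i.e. no task
    // needs to be scheduled at all.
    std::optional<Clock::time_point> nextWakeup() const {
        if (deadlines.empty()) {
            return std::nullopt;
        }
        return *deadlines.begin();
    }

    // Drop every SyncWrite whose deadline is at or before `now` and
    // return how many were timed out.
    std::size_t expire(Clock::time_point now) {
        const auto end = deadlines.upper_bound(now);
        const auto count = static_cast<std::size_t>(
                std::distance(deadlines.begin(), end));
        deadlines.erase(deadlines.begin(), end);
        return count;
    }

private:
    // Sorted by deadline, so the earliest expiry is always at begin().
    std::multiset<Clock::time_point> deadlines;
};
```

A scheduler built on this would sleep until `nextWakeup()`, call `expire()`, and then reschedule only if another deadline remains, so an idle vBucket costs no wakeups.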
However, such a scheme isn't feasible with our current Executor / scheduler implementation, as I don't believe it would scale to tens of thousands of tasks (10 buckets would require up to 10,240 tasks, and that's just for DurabilityTimeoutTask).
Facebook's Folly library does have an Executor which claims to scale to this level (using a hashed hierarchical wheel timer). We should investigate whether that is suitable.
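For context on why a wheel timer copes with many timers, below is a deliberately simplified single-level hashed timing wheel. It is not Folly's code (Folly's timer is hierarchical and far more elaborate); the class and method names are hypothetical, and it assumes a non-zero slot count and delays of at least one tick.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <vector>

// Hypothetical single-level hashed timing wheel, for illustration only.
class TimingWheel {
public:
    // Assumption: slots > 0.
    explicit TimingWheel(std::size_t slots) : wheel(slots) {}

    // Schedule timer `id` to fire `ticks` ticks from now (ticks >= 1).
    // Insertion is O(1): the due time is hashed to a slot.
    void schedule(std::uint64_t id, std::uint64_t ticks) {
        const std::uint64_t due = now + ticks;
        wheel[due % wheel.size()].push_back({id, due});
    }

    // Advance one tick and return the ids of the timers now due.
    // Only the current slot is examined, so the cost of a tick depends
    // on that slot's contents rather than on the total timer count.
    std::vector<std::uint64_t> tick() {
        ++now;
        std::vector<std::uint64_t> fired;
        auto& slot = wheel[now % wheel.size()];
        for (auto it = slot.begin(); it != slot.end();) {
            if (it->due == now) {
                fired.push_back(it->id);
                it = slot.erase(it);
            } else {
                ++it; // due on a later revolution of the wheel
            }
        }
        return fired;
    }

private:
    struct Entry {
        std::uint64_t id;
        std::uint64_t due;
    };
    std::vector<std::list<Entry>> wheel;
    std::uint64_t now = 0;
};
```

A hierarchical wheel stacks several such wheels at coarser granularities so that long delays do not pile up in the same slots as short ones.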
Issue Links
- depends on
  - MB-36956 Migrate to Facebook Folly executors for CPU & IO background tasks (Closed)
- relates to
  - MB-47920 Cost Effective Low-End Clusters improve resource utilization (Open)
  - MB-42346 CC: Very high CPU usage while under low load (memcached) (Resolved)
- split from
  - MB-39618 Memcached is CPU hungry when HPET clock source used (Closed)
For Gerrit Dashboard: MB-39815

# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
130419,27 | MB-39815: Add event-driven SyncWrite timeout handling | master | kv_engine | MERGED | +2 | +1 |
162085,13 | MB-39815: Add basic SyncWrite timeout test (ep_testsuite) | master | kv_engine | MERGED | +2 | +1 |
162102,23 | MB-39815: Change durability_timeout_mode to event-driven | master | kv_engine | MERGED | +2 | +1 |
163571,4 | MB-39815: Tighten argument checks in PDM::addSyncWrite | master | kv_engine | MERGED | +2 | +1 |
163596,2 | MB-39815: Fix typos / missing @param documentation | master | kv_engine | MERGED | +2 | +1 |
165826,2 | MB-39815: Adjust VBucketSyncWriteTimeoutTask expected duration | master | kv_engine | MERGED | +2 | +1 |
179209,2 | Cleanup: remove 'polling' durability timeout mode | master | kv_engine | MERGED | +2 | +1 |
198493,3 | MB-59022: Set engine correctly for VBucketSyncWriteTimeoutTask | master | kv_engine | MERGED | +2 | +1 |