Details
Type: Bug
Resolution: Fixed
Priority: Critical
Affects Version/s: 6.6.5, 6.6.6, 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.1.0, 7.1.1, 7.1.2, 7.1.3
Triage: Untriaged
Operating System: Linux x86_64
0
No
Description
Summary
In some Linux environments with Transparent Huge Pages (THP) disabled and a large amount of RAM or bucket quota, gathering some of cbcollect_info's diagnostic information can pause each Couchbase process for several seconds while its /proc/<PID>/smaps file is read. Pauses of up to 20 s have been seen in some instances.
During this time the process is effectively stopped: it cannot service requests, which may therefore time out.
This affects both requests from end-user applications and internal requests, such as those from the Query Service.
Workaround
(A) Avoid collecting logs while the cluster is busy; the kernel issue is triggered by log collection itself.
(B) If log collection is necessary, run cbcollect_info directly from the command line and add a --task-regexp argument that skips the tasks reading the problematic files:
/opt/couchbase/bin/cbcollect_info --task-regexp='^(?!Relevant proc data)' cbcollect.zip
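The pattern `^(?!Relevant proc data)` is a negative lookahead: it matches any task name that does not begin with "Relevant proc data", so only that task is skipped. The sketch below shows the filtering behaviour with GNU `grep -P` on a few task names; apart from the "Relevant proc data" prefix, the names are hypothetical, and it assumes cbcollect_info tests the pattern against the start of each task's name.

```shell
#!/usr/bin/env bash
# Hypothetical task names, one per line. Only the first one mimics the
# "Relevant proc data" task that reads /proc/<PID>/smaps and numa_maps.
tasks='Relevant proc data (memcached)
uname
Process list snapshot'

# -P enables Perl-compatible regexes, which support the (?!...) negative
# lookahead; the pattern keeps every line that does NOT start with the prefix.
printf '%s\n' "$tasks" | grep -P '^(?!Relevant proc data)'
```

Only "uname" and "Process list snapshot" are printed; every other task still runs as normal.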
Details
The Linux kernel exposes a number of files under /proc/<PID> that can be used to introspect the memory state of a process, including the Couchbase Server processes of interest. The cbcollect_info script, which is used for Couchbase log collection, reads some of these files as part of its normal diagnostic capture. As of 7.0.4, the captured files are:
- /proc/<PID>/status
- /proc/<PID>/limits
- /proc/<PID>/smaps
- /proc/<PID>/numa_maps
These are captured for the following processes: moxi, memcached, beam.smp, couch_compact, godu, sigar_port, cbq-engine, indexer, projector, goxdcr, cbft, eventing-producer, eventing-consumer.
The files in /proc are not "real" files: they are generated on demand by the kernel when a user reads them. Some of them (smaps and numa_maps) can take the kernel a significant amount of time to generate for processes that have a large number of entries in their page table, that is, a large virtual address space. While the kernel is generating the file, it can block userspace processes from being scheduled, particularly the process whose memory state is being examined.
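Since the cost of generating smaps and numa_maps grows with the size of a process's page table, one way to see in advance which processes are most likely to be affected is to read the VmPTE field (memory used by page tables, in kB) from the comparatively cheap /proc/<PID>/status file. The sketch below is a hypothetical helper, `pte_report`, that is not part of cbcollect_info; treating VmPTE as a proxy for smaps generation cost is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of cbcollect_info): for each PID given,
# print its name, resident set size and page-table size, read only from
# /proc/<PID>/status so the expensive smaps/numa_maps files are never touched.
pte_report() {
  local pid name rss pte
  for pid in "$@"; do
    # Skip PIDs that have exited or whose status file is not readable.
    [ -r "/proc/$pid/status" ] || continue
    name=$(awk '$1 == "Name:" {print $2}' "/proc/$pid/status")
    rss=$(awk '$1 == "VmRSS:" {print $2}' "/proc/$pid/status")
    pte=$(awk '$1 == "VmPTE:" {print $2}' "/proc/$pid/status")
    # Both fields are in kB; kernel threads report neither, hence "n/a".
    printf '%s\t%s\tVmRSS_kB=%s\tVmPTE_kB=%s\n' \
      "$pid" "$name" "${rss:-n/a}" "${pte:-n/a}"
  done
}
```

For example, `pte_report $(pgrep -x memcached)` would report on the memcached process, assuming pgrep from procps is installed. A VmPTE in the hundreds of MB suggests that reading that process's smaps could stall it noticeably.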
In the case of memcached processes with a Data Service quota of 500 GB or more, we have observed pauses in excess of 20 seconds.
Note how:
- The NonIO and WriterPool tasks, which are normally scheduled continuously, are disrupted for a ~20 s period.
- Virtually all frontend worker threads experience slow operations and very long mutex hold times.
There is an excellent write-up of this phenomenon at https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/10966#note_410194443, which goes into the Linux kernel specifics of the problem.
Transparent Hugepages
Note that the issue appears to be significantly worse when Transparent Huge Pages is disabled. For example, on a node with a ~500 GB bucket quota and THP set to "never" (as recommended for Couchbase production deployments), we observe pauses of ~20 s. With THP set to "always" (the out-of-the-box default), no pause is observed.
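A rough worked estimate suggests why THP makes such a difference. This assumes x86_64 with 4 KiB base pages and 2 MiB huge pages, treats the 500 GB quota as 500 GiB that is fully resident and mapped, and assumes that, to first order, smaps generation time scales with the number of page-table entries that map that memory:

$$N_{4\,\text{KiB}} = \frac{500 \times 2^{30}\ \text{B}}{2^{12}\ \text{B}} = 500 \times 2^{18} = 131{,}072{,}000$$

$$N_{2\,\text{MiB}} = \frac{500 \times 2^{30}\ \text{B}}{2^{21}\ \text{B}} = 500 \times 2^{9} = 256{,}000$$

$$\frac{N_{4\,\text{KiB}}}{N_{2\,\text{MiB}}} = 2^{9} = 512$$

At 8 bytes per entry, the 4 KiB case alone implies about 1,000 MiB of last-level page tables for the kernel to walk, compared with roughly 512 times fewer entries when the same memory is backed by huge pages.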