Couchbase Server / MB-54674

cbcollect_info reading /proc/<PID>/smaps causes 20+ second pauses in CB processes


Details

    • Untriaged
    • Linux x86_64
    • 0
    • No

    Description

      Summary

      As observed in some Linux environments with Transparent HugePages disabled and large amounts of RAM / bucket quota, some of the diagnostic information gathered by cbcollect_info can result in multi-second pauses of each Couchbase process as the /proc/<PID>/smaps information is gathered - up to 20s in some instances.

      During this time the process is essentially stopped - requests cannot be serviced, resulting in them potentially timing out.

      This includes both requests from end-user applications, and internal requests such as the Query Service.

      Workaround

      (A) Avoid collecting logs while the cluster is serving load - the kernel issue is triggered when logs are collected.

      (B) If log collection is necessary, run it directly from the command line and add a --task-regexp argument to exclude the problematic files:

      /opt/couchbase/bin/cbcollect_info --task-regexp='^(?!Relevant proc data)' cbcollect.zip
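
The negative lookahead in the expression above can be checked in isolation. The following Python sketch assumes, without confirmation from this report, that cbcollect_info matches the expression against the start of each task's description. Every task name other than "Relevant proc data" is a made-up placeholder.

```python
import re

# Pattern from the workaround: matches any description that does NOT start
# with "Relevant proc data".
TASK_REGEXP = re.compile(r'^(?!Relevant proc data)')


def select_tasks(descriptions):
    """Return the descriptions the pattern would keep.

    Assumption: tasks are selected with re.match against their description.
    """
    return [d for d in descriptions if TASK_REGEXP.match(d)]


# Hypothetical task descriptions, for illustration only.
tasks = [
    "Relevant proc data",
    "Relevant proc data (hypothetical suffix)",
    "Hypothetical task A",
    "Hypothetical task B",
]
kept = select_tasks(tasks)
print(kept)
```

Only the two placeholder tasks survive the filter. Anything whose description begins with "Relevant proc data" is skipped, which is what keeps the /proc reads out of the collection.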
      

      Details

      The Linux kernel exposes a number of files under /proc/<PID> to introspect the memory state of a process. The cbcollect_info script used for Couchbase log collection reads the contents of some of these files for the interesting Couchbase Server processes as part of normal diagnostic capture. As of 7.0.4, the captured files are:

      1. /proc/<PID>/status
      2. /proc/<PID>/limits
      3. /proc/<PID>/smaps
      4. /proc/<PID>/numa_maps

      These are captured for the following processes: moxi memcached beam.smp couch_compact godu sigar_port cbq-engine indexer projector goxdcr cbft eventing-producer eventing-consumer
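
To see which paths this covers on a given node, the files and process names above can be expanded into concrete /proc paths. This is a minimal sketch, not the cbcollect_info implementation: how the script locates PIDs is not stated here, so matching on /proc/<PID>/comm is an assumption, and the helper names find_pids and proc_paths are hypothetical. The sketch only builds paths; it does not read smaps or numa_maps.

```python
import os

# File names taken from the list above.
PROC_FILES = ("status", "limits", "smaps", "numa_maps")

# Process names taken from the list above.
PROCESS_NAMES = (
    "moxi memcached beam.smp couch_compact godu sigar_port cbq-engine "
    "indexer projector goxdcr cbft eventing-producer eventing-consumer"
).split()

# The kernel truncates /proc/<PID>/comm to 15 characters, so longer names
# such as "eventing-producer" have to be compared in truncated form.
COMM_MAX = 15


def find_pids(names, proc_root="/proc"):
    """Return {pid: comm} for processes whose comm matches one of `names`."""
    wanted = {name[:COMM_MAX] for name in names}
    found = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                comm = f.read().rstrip("\n")
        except OSError:
            continue  # process exited, or comm is not readable
        if comm in wanted:
            found[int(entry)] = comm
    return found


def proc_paths(pid, proc_root="/proc"):
    """Return the four per-process paths listed above for `pid`."""
    return [os.path.join(proc_root, str(pid), name) for name in PROC_FILES]


if os.path.isdir("/proc"):
    for pid, comm in sorted(find_pids(PROCESS_NAMES).items()):
        print(comm, proc_paths(pid))
```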

      The files in /proc are not "real" files - they are typically generated on demand by Linux when the user attempts to read them. Some of these files (smaps and numa_maps) can take a significant amount of time for the kernel to generate for processes which have a large number of entries in their page table - i.e. a large virtual address space. While the kernel is generating the file it can block userspace processes from being scheduled - particularly the process having its memory state examined.
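
Because the content is generated at read time, the cost can be measured by timing a full read of the file. The sketch below is illustrative only; time_read is a hypothetical helper, and the demo reads the script's own smaps, which is small and harmless. Pointing it at the smaps file of a large memcached process would be expected to trigger the very pause described here, so that should only be done on a non-production node.

```python
import os
import time


def time_read(path, chunk_size=1 << 20):
    """Read `path` to the end and return (bytes_read, elapsed_seconds)."""
    total = 0
    start = time.monotonic()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.monotonic() - start


# Demo against this process's own smaps (Linux only).
SELF_SMAPS = "/proc/self/smaps"
if os.path.exists(SELF_SMAPS):
    size, seconds = time_read(SELF_SMAPS)
    print(f"{SELF_SMAPS}: {size} bytes in {seconds:.4f}s")
```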

      In the case of memcached processes with a Data Service quota of 500GB+, we have observed pauses in excess of 20 seconds:

      Note how:

      • The NonIO / WriterPool tasks that are normally scheduled constantly are disrupted for a ~20s period.
      • Virtually all frontend worker threads experience slow operations and very long mutex-held periods.

      There's an excellent write-up on this phenomenon at https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/10966#note_410194443 which goes into Linux kernel specifics of what the problem is.

      Transparent Hugepages

      Note that the issue appears to be significantly worse when Transparent Huge Pages (THP) is disabled. For example, on a node with ~500GB bucket quota and THP set to "never" (as recommended for Couchbase production deployments) we observe pauses of ~20s. With THP set to "always" (the out-of-the-box default) no observable pause is seen.
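
A rough page-count comparison suggests why THP could make such a difference. The sketch below rests on several assumptions that this report does not state: "~500GB" is taken as 500 GiB, page sizes are the x86_64 values (4 KiB base pages, 2 MiB huge pages), "always" is treated as the best case in which all memory is backed by huge pages, and the time to generate smaps is assumed to grow with the number of pages mapped. The thp_mode helper is hypothetical; it parses text in the format the kernel uses for /sys/kernel/mm/transparent_hugepage/enabled, e.g. "always madvise [never]".

```python
import re

BASE_PAGE = 4 * 1024          # 4 KiB base page on x86_64
HUGE_PAGE = 2 * 1024 * 1024   # 2 MiB transparent huge page on x86_64


def pages_needed(mem_bytes, page_size):
    """Number of pages of `page_size` needed to map `mem_bytes`, rounded up."""
    return -(-mem_bytes // page_size)


def thp_mode(text):
    """Extract the active mode from text such as 'always madvise [never]'."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


quota = 500 * 2**30  # assumption: "~500GB" taken as 500 GiB
small_pages = pages_needed(quota, BASE_PAGE)
huge_pages = pages_needed(quota, HUGE_PAGE)
print(small_pages, huge_pages, small_pages // huge_pages)
print(thp_mode("always madvise [never]"))
```

Under these assumptions the quota maps to 131,072,000 base pages versus 256,000 huge pages, a factor of 512. In practice THP set to "always" does not guarantee that every region is huge-page backed, so the real ratio may be smaller.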


          People

            gilad.kalchheim Gilad Kalchheim
            drigby Dave Rigby (Inactive)