Couchbase Server / MB-39815

Avoid busy-polling for SyncWrite timeout checking

Details

    • 1
    • KV Sprint 2020-July, KV Sprint 2020-Oct, KV-Engine-Sept-21, KV 2021-Oct-21

    Description

      Investigation of MB-39618 highlighted that the current busy-polling implementation of SyncWrite timeout checking is very costly: approximately 3.5% CPU per bucket. An idle 10-bucket node (zero op/s) consumes 35% CPU in the memcached process:

      top - 13:35:30 up 80 days, 22:37,  9 users,  load average: 0.81, 0.64, 0.43
      Tasks: 471 total,   1 running, 300 sleeping,   6 stopped,   2 zombie
      %Cpu(s):  2.8 us,  0.9 sy,  0.0 ni, 96.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
      KiB Mem : 13197644+total, 15423808 free,  3459340 used, 11309330+buff/cache
      KiB Swap: 13416857+total, 13416651+free,     2060 used. 12749011+avail Mem 
       
        PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
       1654 daver     20   0 1978268 260788  13232 S  35.1  0.2 600:52.51 memcached
      

      The issue is that we currently schedule (and run) a DurabilityTimeoutTask per bucket every 25ms. Each time it runs, it must iterate through all VBuckets, checking for any SyncWrites which have timed out. This happens even if the cluster has zero SyncWrites in progress (!).
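To put a rough number on the polling work described above, here is a back-of-the-envelope sketch. It assumes 1024 vBuckets per bucket (inferred from the 10240-task figure for 10 buckets quoted in this ticket); the function name is hypothetical and this is not kv_engine code.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Assumption: 1024 vBuckets per bucket (10 buckets -> 10240, as quoted in the ticket).
constexpr std::uint64_t kVBucketsPerBucket = 1024;

// Number of vBucket visits per second made purely by timeout polling,
// regardless of whether any SyncWrite is in flight.
std::uint64_t idleVBucketVisitsPerSecond(
        std::uint64_t buckets,
        std::chrono::milliseconds interval = std::chrono::milliseconds{25}) {
    assert(interval.count() > 0);
    // One task run per bucket every `interval`: 1s / 25ms = 40 runs per second.
    const auto runsPerSecond =
            static_cast<std::uint64_t>(std::chrono::seconds{1} / interval);
    // Every run walks every vBucket in the bucket.
    return buckets * runsPerSecond * kVBucketsPerBucket;
}
```

Under these assumptions a single idle bucket performs 40,960 vBucket checks per second, and the figure grows linearly with the number of buckets, which is consistent with the per-bucket CPU cost reported above.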

      A more efficient solution would be a DurabilityTimeoutTask per VBucket, scheduled to run when the next SyncWrite in that vBucket will expire. If no SyncWrites are outstanding on that vBucket then no task would be scheduled (and nothing would need to wake up). This should reduce the idle CPU to close to zero (or at least make it no longer a function of the number of Buckets).
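The deadline-driven idea above can be sketched as a small per-vBucket tracker that reports when, if ever, its timeout task next needs to run. The class and method names below are hypothetical, chosen for illustration only; this is a minimal sketch of the scheduling decision, not the kv_engine implementation.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <iterator>
#include <optional>
#include <set>

using Clock = std::chrono::steady_clock;

// Hypothetical per-vBucket record of outstanding SyncWrite deadlines.
class SyncWriteDeadlines {
public:
    // Register the deadline of a newly accepted SyncWrite.
    void add(Clock::time_point deadline) {
        deadlines.insert(deadline);
    }

    // When the timeout task should next run. An empty result means no
    // SyncWrites are outstanding, so no task needs to be scheduled at all.
    std::optional<Clock::time_point> nextWakeup() const {
        if (deadlines.empty()) {
            return std::nullopt;
        }
        return *deadlines.begin();
    }

    // Drop every deadline that is <= now (a real task would abort the
    // corresponding SyncWrites here). Returns how many expired.
    std::size_t expire(Clock::time_point now) {
        const auto end = deadlines.upper_bound(now);
        const auto count = static_cast<std::size_t>(
                std::distance(deadlines.begin(), end));
        deadlines.erase(deadlines.begin(), end);
        // Postcondition: whatever remains expires strictly in the future.
        assert(deadlines.empty() || *deadlines.begin() > now);
        return count;
    }

private:
    // Ordered so the earliest deadline is always at begin().
    std::multiset<Clock::time_point> deadlines;
};
```

After each call to `add()` or `expire()`, a scheduler would re-arm the vBucket's task for `nextWakeup()`, or leave it unscheduled when the result is empty, so an idle vBucket costs no wakeups.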

      However, such a scheme isn't feasible with our current Executor / scheduler implementation, as I don't believe it would scale to tens of thousands of tasks (10 buckets would require up to 10240 tasks, one per vBucket, and that's just for DurabilityTimeoutTask).

      Facebook's Folly library does have an Executor which claims to scale to this level (using a hashed hierarchical wheel timer). We should investigate whether it is suitable.
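For readers unfamiliar with timing wheels, the following is a minimal single-level hashed wheel, a deliberate simplification of the hierarchical variant mentioned above. It is not Folly's implementation and does not use Folly's API; the class name and tick-based interface are assumptions made for illustration. The point it shows is that scheduling a timer is O(1) and each tick only inspects one slot, so cost does not grow with the total number of pending timers in the way a full scan does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <list>
#include <utility>
#include <vector>

// Hypothetical single-level hashed timing wheel (illustrative only).
class HashedWheel {
public:
    explicit HashedWheel(std::size_t slots) : wheel(slots) {
        assert(slots > 0);
    }

    // Schedule `cb` to fire `ticks` ticks from now (minimum 1). O(1):
    // the expiry tick is hashed (modulo) into a slot.
    void schedule(std::uint64_t ticks, std::function<void()> cb) {
        if (ticks == 0) {
            ticks = 1;
        }
        const std::uint64_t expiry = now + ticks;
        wheel[expiry % wheel.size()].push_back({expiry, std::move(cb)});
    }

    // Advance time by one tick and fire the due entries in the current
    // slot. Entries in the same slot that belong to a later revolution
    // of the wheel are left in place. Returns the number fired.
    std::size_t tick() {
        ++now;
        auto& slot = wheel[now % wheel.size()];
        std::size_t fired = 0;
        for (auto it = slot.begin(); it != slot.end();) {
            if (it->expiry <= now) {
                auto cb = std::move(it->callback);
                it = slot.erase(it);
                cb();
                ++fired;
            } else {
                ++it;
            }
        }
        return fired;
    }

private:
    struct Entry {
        std::uint64_t expiry;
        std::function<void()> callback;
    };

    std::uint64_t now = 0;
    std::vector<std::list<Entry>> wheel;
};
```

A hierarchical wheel layers several such wheels at coarser granularities so that far-future timers do not sit in the fine-grained slots; whether Folly's variant suits this use case is exactly what the ticket proposes to investigate.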

            People

              drigby Dave Rigby (Inactive)