Couchbase Server / MB-33157

DcpProducer::streams cache contention


Details

    • Untriaged
    • Unknown

    Description

  The StreamsMap in the DcpProducer is a custom AtomicUnorderedMap implementation: essentially a wrapper around an std::unordered_map that guards every operation with a cb::RWLock. That RWLock is a single bottleneck for every DCP item we send (we acquire the read lock in DcpProducer::getNextItem()), and for every front-end operation that produces a new seqno (set/replace/delete etc. via DcpProducer::notifySeqnoAvailable(...)). This causes cache contention: to acquire the lock the RWLock implementation must write to the readers field of the underlying read-write lock, and write to it again to release it, so even read-only accesses from different threads all modify the same cache line.
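      The pattern described above can be sketched as follows, using std::shared_mutex as a stand-in for cb::RWLock (class and member names here are illustrative, not the real kv_engine AtomicUnorderedMap):

      ```cpp
      #include <memory>
      #include <mutex>
      #include <shared_mutex>
      #include <unordered_map>

      // Minimal sketch of a map that guards every operation with a
      // read-write lock. Even a pure lookup acquires the shared lock,
      // and acquiring/releasing it atomically updates the lock's reader
      // count - i.e. every reader *writes* to the same cache line,
      // which is the contention this ticket describes.
      template <class Key, class Value>
      class LockedUnorderedMap {
      public:
          Value find(const Key& key) const {
              std::shared_lock<std::shared_mutex> rl(mutex);
              auto it = map.find(key);
              return it != map.end() ? it->second : Value{};
          }

          void insert(const Key& key, Value value) {
              std::unique_lock<std::shared_mutex> wl(mutex);
              map[key] = std::move(value);
          }

      private:
          mutable std::shared_mutex mutex;
          std::unordered_map<Key, Value> map;
      };
      ```

      With every getNextItem() and notifySeqnoAvailable() funnelled through one such lock, all front-end and DCP threads bounce the same cache line between cores regardless of whether they conflict logically.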

      https://issues.couchbase.com/browse/MB-32107?focusedCommentId=316320&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-316320
      https://issues.couchbase.com/browse/MB-32107?focusedCommentId=317274&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-317274

      This could be fixed in a couple of different ways: by creating a sparse map and using the StreamContainer lock (which would bloat the size of the object), or by creating a sharded map with multiple RWLocks (which would be non-trivial). Folly has a ConcurrentHashMap class that uses hazard pointers to make reads completely lock-free, and shards the map. The sharding should significantly reduce cache contention on a producer.
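      The sharded-map idea can be sketched as below. This is only an illustration of the technique using std::shared_mutex shards, not folly's ConcurrentHashMap (which additionally makes reads lock-free via hazard pointers); the shard count and hashing are arbitrary choices:

      ```cpp
      #include <array>
      #include <functional>
      #include <mutex>
      #include <shared_mutex>
      #include <unordered_map>

      // Sketch of a sharded map: the key space is partitioned across
      // NumShards independently-locked sub-maps, so readers of keys in
      // different shards touch different locks (and cache lines),
      // spreading out the contention of a single global RWLock.
      template <class Key, class Value, size_t NumShards = 16>
      class ShardedMap {
      public:
          Value find(const Key& key) const {
              auto& s = shard(key);
              std::shared_lock<std::shared_mutex> rl(s.mutex);
              auto it = s.map.find(key);
              return it != s.map.end() ? it->second : Value{};
          }

          void insert(const Key& key, Value value) {
              auto& s = shard(key);
              std::unique_lock<std::shared_mutex> wl(s.mutex);
              s.map[key] = std::move(value);
          }

      private:
          struct Shard {
              mutable std::shared_mutex mutex;
              std::unordered_map<Key, Value> map;
          };

          Shard& shard(const Key& key) const {
              return shards[std::hash<Key>{}(key) % NumShards];
          }

          mutable std::array<Shard, NumShards> shards;
      };
      ```

      Getting resize, iteration and erase-while-iterating right across shards is where the "non-trivial" cost lies, which is the argument for reusing folly's battle-tested implementation instead.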

      As folly would be useful in many other places in kv engine and there is no trivial solution to this performance issue, we should fix this by using folly's ConcurrentHashMap.


          Activity

            drigby Dave Rigby added a comment -

            As folly would be useful in many other places in kv engine and there is no trivial solution to this performance issue, we should fix this by using folly's ConcurrentHashMap.

            Also take a look at folly's AtomicHashMap, which might be a better choice given we know the upper bound on the number of elements in the map (1024, the number of vBuckets).

            Another option would be a simple std::vector (fixed at size 1024) of https://github.com/facebook/folly/blob/master/folly/concurrency/AtomicSharedPtr.h - this has the downside of a fixed memory overhead (8KB) per DCP Producer, but is simpler overall. Suggest you compare with the memory sizes of DcpProducer and ActiveStream at some typical expected number of streams per node.
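            The fixed-size-vector option amounts to one atomically-swappable shared_ptr slot per vBucket. The sketch below uses the std::atomic_load/std::atomic_store free functions on shared_ptr as a stand-in for folly::AtomicSharedPtr (note the std versions are typically implemented with an internal spinlock pool, unlike folly's truly lock-free pointer); "Stream" is a placeholder type, not the real ActiveStream:

            ```cpp
            #include <cstdint>
            #include <memory>
            #include <vector>

            // Placeholder for the per-vBucket stream object.
            struct Stream {
                explicit Stream(uint16_t vb) : vbid(vb) {}
                uint16_t vbid;
            };

            // One slot per vBucket: 1024 x 8-byte pointers = the fixed
            // 8KB overhead per DCP Producer mentioned above. Lookups
            // take no reader lock, so they don't all write to one
            // shared cache line.
            class StreamSlots {
            public:
                StreamSlots() : slots(1024) {}

                std::shared_ptr<Stream> get(uint16_t vbid) const {
                    return std::atomic_load(&slots[vbid]);
                }

                void set(uint16_t vbid, std::shared_ptr<Stream> s) {
                    std::atomic_store(&slots[vbid], std::move(s));
                }

            private:
                std::vector<std::shared_ptr<Stream>> slots;
            };
            ```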


            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.5.0-2908 contains kv_engine commit 3d03132 with commit message:
            MB-33157: Use folly AtomicHashMap in DcpProducer

            ben.huddleston Ben Huddleston added a comment -

            We should average around 2,020,000 ops/s but we've got another regression since my previous perf runs.

            People

              ben.huddleston Ben Huddleston
              ben.huddleston Ben Huddleston
              Votes:
              0
              Watchers:
              2


