The StreamsMap in the DcpProducer is a custom AtomicUnorderedMap implementation: essentially a wrapper around a std::unordered_map that guards every operation with a cb::RWLock. This single RWLock is a bottleneck for every DCP item we send (we acquire the read lock in DcpProducer::getNextItem()) and for every front-end operation that results in a new seqno (set/replace/delete etc., via DcpProducer::notifySeqnoAvailable(...)). Even acquiring the lock for reading causes cache contention, because the RWLock implementation has to write to the readers field of the underlying read-write lock to acquire it, and again to release it.
This could be fixed in a couple of different ways: by creating a sparse map and using the StreamContainer lock, which would bloat the size of the object, or by creating a sharded map with multiple RWLocks, which would be non-trivial. Folly has a ConcurrentHashMap class that shards the map and uses hazard pointers to ensure reads are completely lock-free. The sharding should significantly reduce cache contention on a producer.
As folly would be useful in many other places in kv_engine, and there is no trivial solution to this performance issue, we should fix this by using folly's ConcurrentHashMap.
|For Gerrit Dashboard: MB-33157|
| 106007,4 | MB-33157: Use folly SharedLock in atomic unordered map | master | kv_engine | Status: ABANDONED | 0 | -1 |
| 106272,9 | MB-33157: Use folly AtomicHashMap in DcpProducer | master | kv_engine | Status: ABANDONED | 0 | -1 |
| 106634,5 | MB-33157: Use folly AtomicHashMap in DcpProducer | master | kv_engine | Status: ABANDONED | 0 | -1 |
| 107434,10 | MB-33157: Use folly AtomicHashMap in DcpProducer | master | kv_engine | Status: MERGED | +2 | +1 |