This patch added a UMPMCQueue which is modified while allocations/deallocations are being tracked against a bucket. There are other existing uses of the queue type in "no-bucket" situations (e.g., inside the ExecutorPool).
folly::UMPMCQueue internally uses folly hazard pointers to protect Segments, the internal objects used to store the queued items. Hazptr-protected objects which are no longer needed, but may still be being accessed by other threads, may be retire()-d; their destruction is then delayed until some later time when no hazard pointer references them.
retire() transfers ownership of the object to a hazptr_domain; by default a single global domain is used. Once the number of retired objects in a domain exceeds a certain threshold, the thread which pushed the count over the threshold scans all retired items and reclaims any which are no longer referenced. By default, they are destroyed inline by that thread.
Removing items from a UMPMCQueue may allow Segments to be retired. This may trigger destruction of objects which have been retired into the same domain by any hazard pointer user, including other UMPMCQueues. Thus, memory which was allocated under "no-bucket" by one queue may be freed and accounted against a bucket while manipulating a different queue, leading to mem_used becoming lower than the true value.
UMPMCQueue does not currently support providing a custom domain (internally it uses a cohort, which doesn't support this either). If this were supported in the future, a hazptr_domain per bucket would be an ideal solution. This could be worked around now without folly changes as noted in this comment, but that would not be a robust solution, and would likely break with future folly releases.
Making changes to avoid using UMPMCQueue while memory usage is tracked against a bucket would be an expedient solution.
1. Run 6.6.5 longevity test for 5-6 days.
2. Online upgrade to 7.1 using swap rebalance and graceful failover/recovery strategies.
3. Ran a bunch of rebalances post-upgrade.
UI : http://172.23.106.134:8091/ui/index.html#/buckets?commonBucket=ORDERS&scenarioZoom=minute&scenario=d26rq56l9
Buckets before upgrade:
Buckets after upgrade:
Wonder if this would affect our ejection criteria or if it's just a UI issue.
cbcollect_info attached. This is the first time we are running system test upgrade to 7.1.
|For Gerrit Dashboard: MB-50546|
|169818,4||Test adding per-bucket default hazptr||master||kv_engine||Status: NEW||0||-1|
|169844,2||MB-50546: Restore AtomicQueue to replace folly::UMPMCQueue||master||kv_engine||Status: ABANDONED||+1||-1|
|169845,2||MB-50546: Move ConnMap from folly::UMPMCQueue to AtomicQueue||master||kv_engine||Status: ABANDONED||0||-1|
|169918,2||MB-50546: Revert "MB-36996: Replace remaining uses of AtomicQueue with folly Queue classes"||master||kv_engine||Status: MERGED||+2||+1|
|170336,6||MB-50647: Remove AtomicQueue||master||kv_engine||Status: MERGED||+2||+1|