On a large cluster, we observed many disconnected TAP producers with large disk backfill queues after several rebalance attempts failed. Those producers are removed from memory incrementally, 1000 items at a time from their backfill queues. However, this incremental cleanup caused heavy memory usage, which resulted in massive item evictions within a very short period. To prevent this, disconnected TAP producers should be removed as soon as possible by the non-IO dispatcher. Because the non-IO dispatcher is not involved in sending notifications to pending memcached connections, this change should not significantly affect frontend performance.
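The difference between the two cleanup strategies can be sketched roughly as follows. This is an illustration only, not the ep-engine code: `TapProducer`, `purgeBatch`, `NonIODispatcher`, and `scheduleImmediateCleanup` are hypothetical names, the backfill queue is modeled as a plain `std::deque<int>`, and the dispatcher is modeled as a single-threaded task queue.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <queue>
#include <utility>

// Hypothetical stand-in for a disconnected TAP producer.
struct TapProducer {
    std::deque<int> backfillQueue;  // item payloads are elided
};

// Incremental cleanup: drop at most `batch` items per call.
// Returns true once the queue is empty. Until then, the remaining
// items stay resident in memory.
bool purgeBatch(TapProducer& p, std::size_t batch = 1000) {
    const std::size_t n = std::min(batch, p.backfillQueue.size());
    p.backfillQueue.erase(
        p.backfillQueue.begin(),
        p.backfillQueue.begin() + static_cast<std::ptrdiff_t>(n));
    return p.backfillQueue.empty();
}

// Hypothetical non-IO dispatcher, modeled as a FIFO of tasks.
class NonIODispatcher {
public:
    void schedule(std::function<void()> task) {
        tasks_.push(std::move(task));
    }

    // Runs every queued task and returns how many were executed.
    std::size_t runPending() {
        std::size_t ran = 0;
        while (!tasks_.empty()) {
            std::function<void()> task = std::move(tasks_.front());
            tasks_.pop();
            task();
            ++ran;
        }
        return ran;
    }

private:
    std::queue<std::function<void()>> tasks_;
};

// Immediate cleanup: hand the producer to a single dispatcher task.
// When the task drops the last reference, the producer and its whole
// backfill queue are freed in one step.
void scheduleImmediateCleanup(NonIODispatcher& d,
                              std::shared_ptr<TapProducer> p) {
    d.schedule([p = std::move(p)]() mutable { p.reset(); });
}
```

With the incremental approach, a producer holding N queued items needs roughly N/1000 passes before its memory is fully released. In the sketch, the immediate approach releases everything in the single task run by the dispatcher.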
|For Gerrit Dashboard: &For+MB-5371=message:MB-5371|
|16358,1||MB-5371 Clean up disconnected TAP producers immediately.||ep-engine||Status: MERGED||+2||+1|
|16384,1||MB-5371 Clean up disconnected TAP producers immediately.||ep-engine||Status: MERGED||+2||+1|
|16387,1||Merge branch 'branch-18'||ep-engine||Status: MERGED||+2||+1|