Details
-
Improvement
-
Resolution: Fixed
-
Major
-
2.1.2
-
Security Level: Public
-
None
Description
- Picture a scenario where there's a very large list of sequences pending, possibly as a result of tweaking the max_*_pending settings.
- Eventually, the magic doc we've all been waiting for comes along, and everything files neatly out of the pending list and into the cache.
- That process seems to get slower and slower, and we're spending most of our time in a call to base.Set.Union(), and within that, Set.Copy().
- The trouble is that as we iterate over everything in the pending list (which we expect to be reasonably sized), we union each entry's channels with everything else we've seen on this iteration of _addPendingLogs(). Because each union copies the set accumulated so far, the cost grows with every entry: if we're "unpending" millions of docs with millions of channels, those union calls get really costly.
Now, obviously, as mentioned, we don't really expect the list to be that big on each iteration, but I suppose the same could happen with a more modest number of docs that each have hundreds or thousands of unique channels (in my case, it's 3 channels per doc, 1 of them unique).
It looks like (credit to Adam Fraser for this bit!) Set.Union() does much more here than we actually need: for this use case we don't need the Copy() within it.
Rather than refactor Set itself, for this particular use in _addPendingLogs() we might be able to simply accumulate the channels in a map and then wrap that in a Set to return (or potentially use some Golang magic that's beyond me!).
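To make the idea concrete, here's a rough sketch of the two approaches side by side. It is not the actual Sync Gateway code: the Set type below is a hypothetical stand-in for base.Set (assumed to be a map keyed by channel name), and copyingUnion, accumulateWithUnion and accumulateInPlace are made-up names for illustration only.

```go
package main

import "fmt"

// Set is a hypothetical stand-in for base.Set: a set of channel names.
type Set map[string]struct{}

// copyingUnion mimics a Union that leaves both inputs untouched by copying
// them into a new set. Each call costs roughly len(s) + len(other).
func (s Set) copyingUnion(other Set) Set {
	result := make(Set, len(s)+len(other))
	for k := range s {
		result[k] = struct{}{}
	}
	for k := range other {
		result[k] = struct{}{}
	}
	return result
}

// accumulateWithUnion is the costly pattern: the accumulated set is copied
// again for every pending entry, so total work grows quadratically when
// most entries contribute a new channel.
func accumulateWithUnion(perDoc []Set) Set {
	all := Set{}
	for _, chans := range perDoc {
		all = all.copyingUnion(chans)
	}
	return all
}

// accumulateInPlace adds each entry's channels to a single map, so total
// work is proportional to the number of channel references seen.
func accumulateInPlace(perDoc []Set) Set {
	all := Set{}
	for _, chans := range perDoc {
		for k := range chans {
			all[k] = struct{}{}
		}
	}
	return all
}

func main() {
	// 3 channels per doc, 1 of them unique, as in the case described above.
	perDoc := make([]Set, 0, 1000)
	for i := 0; i < 1000; i++ {
		perDoc = append(perDoc, Set{"a": {}, "b": {}, fmt.Sprintf("unique-%d", i): {}})
	}
	// Both approaches produce the same set of 1002 channels.
	fmt.Println(len(accumulateWithUnion(perDoc)), len(accumulateInPlace(perDoc)))
}
```

Both functions return the same set; the only difference is that the in-place version never re-copies what it has already collected. That's safe here only on the assumption that the accumulated set is local to the loop, so nobody else can observe it being mutated.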