There is a bug in the getRandomKey functionality where it is possible that we get stuck and never return a response to the consumer.
The bug has existed since 2013; however, with the introduction of the query workbench it is appearing more often, as the workbench uses the getRandomKey functionality.
The bug only appears where we have replica vbuckets and/or all the active vbuckets are empty.
Furthermore, the starting vbucket to search for a random key must be zero.
The main user-visible symptom of this is one (or more) memcached front-end threads spinning at 100% CPU utilisation.
The actual bug is in KVBucket::getRandomKey.
The bug is that we check for curr == start before checking if curr == max.
The issue shows up when we have an active but empty vbucket and start == 0.
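The effect of the check ordering can be reproduced with a small stand-alone model. This is only a sketch, not the kv_engine source: scanBuggy, scanFixed and stepLimit are hypothetical names, and the model assumes that no vbucket ever yields a key, so the scan can only end through the curr == start check.

```cpp
// Simplified model of the vbucket scan (assumption: not the real
// KVBucket::getRandomKey). No vbucket ever yields a key, so the only
// way out of the loop is the "curr == start" check.
// Returns the number of vbucket lookups performed before giving up,
// or -1 if stepLimit lookups happen without the scan terminating.

int scanBuggy(int start, int max, int stepLimit) {
    int curr = start;
    int lookups = 0;
    ++curr; // models getVBucket(curr++)
    ++lookups;
    while (lookups < stepLimit) {
        if (curr == start) { // checked before the wrap: buggy order
            return lookups;
        }
        if (curr == max) {
            curr = 0; // wrap, but curr is incremented again just below
        }
        ++curr; // models getVBucket(curr++)
        ++lookups;
    }
    return -1; // never observed curr == start
}

int scanFixed(int start, int max, int stepLimit) {
    int curr = start;
    int lookups = 0;
    ++curr; // models getVBucket(curr++)
    ++lookups;
    while (lookups < stepLimit) {
        if (curr == max) {
            curr = 0; // wrap first ...
        }
        if (curr == start) { // ... then compare against start
            return lookups;
        }
        ++curr; // models getVBucket(curr++)
        ++lookups;
    }
    return -1;
}
```

With max == 1024 and a limit of 10000 lookups, scanBuggy(0, 1024, 10000) hits the limit, whereas scanFixed(0, 1024, 10000) gives up after visiting each of the 1024 vbuckets once. For a non-zero start, both orderings terminate in this model.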
Focusing on the inner while loop:
Recall that the bucket is empty. The first time we enter the while loop
curr == 1. curr != start and curr != max, and so we call
getVBucket and increment curr to 2. We repeat until curr == 1024.
On going round the while loop again curr != start, but curr == max,
and so we set curr = 0.
We then call getVBucket and increment curr to 1. On going round the
while loop again curr != start (as it is 1), and hence we loop