Description
I found that when QueryService tries to send a query to a server that has been shut down, it creates a huge number of endpoints, saturating system resources.
In detail, here's what happens:
The N1QL query enters QueryService::send.
QueryService::send checks 'endpoints'.size() + pendingRequests; both are 0, so it decides to open an endpoint and calls maybeOpenAndSend.
maybeOpenAndSend creates an endpoint, but doesn't add it to 'endpoints' — it will only do that once the endpoint successfully connects. It also increments pendingRequests, which is what tracks endpoints that are not yet connected.
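The accounting described above can be sketched roughly as follows. This is an illustrative simplification, not the real SDK code: the class and field bodies are assumptions based on the description, and only the gating check and the counter increment match what is described.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of QueryService's endpoint accounting as described above.
class QueryServiceSketch {
    final List<Object> endpoints = new ArrayList<>(); // connected endpoints only
    int pendingRequests = 0;                          // endpoints still connecting
    int endpointsOpened = 0;                          // for illustration: total endpoints created

    void send() {
        // Open a new endpoint only when nothing is connected or in flight.
        if (endpoints.size() + pendingRequests == 0) {
            maybeOpenAndSend();
        }
    }

    void maybeOpenAndSend() {
        pendingRequests++;   // tracks the not-yet-connected endpoint
        endpointsOpened++;   // the endpoint itself is NOT added to 'endpoints';
                             // it only moves into 'endpoints' on a successful connect
    }
}
```

While pendingRequests stays at 1, a second call to send correctly declines to open another endpoint — the accounting is sound as long as the counter is not decremented prematurely.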
The created endpoint then times out after 32 msecs, as the node is down. In AbstractEndpoint::doConnect it logs "Could not connect to remote socket", sets the state to disconnected, and calls the observable's onError. It then goes on to retry with an exponential backoff. So each individual endpoint works as expected.
The trouble is that the endpoint's onError is set up to call QueryService::unsubscribeAndRetry, which decrements pendingRequests back to 0. So now QueryService thinks it has no endpoints in progress — but in fact the endpoint still exists and is still trying to make the request.
The next N1QL query comes into QueryService and we go through the loop once more. So we end up spawning many endpoints that are tracked in neither 'endpoints' nor pendingRequests.
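The resulting leak can be reproduced with a small self-contained simulation. Again, this is a hedged sketch, not the SDK code: the names and the synchronous onError call are assumptions made purely to show how the premature decrement lets each incoming query spawn a fresh, untracked endpoint.

```java
import java.util.ArrayList;
import java.util.List;

// Simulates the leak: onError decrements pendingRequests even though the
// endpoint object lives on and keeps retrying in the background.
public class EndpointLeakSim {
    static final List<Object> trackedEndpoints = new ArrayList<>(); // stays empty: nothing ever connects
    static final List<Object> allEndpoints = new ArrayList<>();     // every endpoint actually created
    static int pendingRequests = 0;

    // QueryService::send — opens an endpoint when it believes none are in flight
    static void send() {
        if (trackedEndpoints.size() + pendingRequests == 0) {
            maybeOpenAndSend();
        }
    }

    static void maybeOpenAndSend() {
        pendingRequests++;
        allEndpoints.add(new Object()); // endpoint exists and will keep retrying...
        onConnectError();               // ...but the connect fails (node is down)
    }

    // unsubscribeAndRetry's effect: the counter drops, the endpoint does not die
    static void onConnectError() {
        pendingRequests--;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            send(); // five incoming queries
        }
        // One leaked, untracked endpoint per query.
        System.out.println(allEndpoints.size()); // prints 5
    }
}
```

With the decrement in place, every query sees pendingRequests == 0 and opens another endpoint; removing the decrement (or also destroying the endpoint in onError) would keep the count at one.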
Eventually a separate 300-second timeout fires, which stops the endpoints from retrying indefinitely and cleans up the existing ones.
From code inspection, it appears that versions 1.4.2 through 1.5.8 (the current release as of this writing) are affected.