Java Couchbase JVM Core / JVMCBC-534

PooledService creates excessive endpoints on sending to downed node

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: 1.6.0, 1.5.9
    • Affects Version/s: 1.4.2, 1.5.5, 1.5.6, 1.5.7, 1.5.8
    • Component/s: Core
    • None
    • 1

    Description

      I found that when QueryService tries to send a query to a server that has been shut down, a huge number of endpoints is created, saturating system resources.

      In detail, here's what happens:

      The N1QL query enters QueryService::send.

      QueryService::send checks its 'endpoints'.size() + pendingRequests, both of which are 0, so it decides to open an endpoint and calls maybeOpenAndSend.

      maybeOpenAndSend creates an endpoint, but doesn't add it to 'endpoints'. It only does that once the endpoint successfully connects. It also increments pendingRequests - this is what keeps track of endpoints that are not yet connected.

      The created endpoint then times out after 32 msecs, as the node is down. In AbstractEndpoint::doConnect it logs "Could not connect to remote socket", sets the state to disconnected, and calls the observable's onError. It then goes on to try again with an exponential backoff. So each individual endpoint works as expected.

      The trouble is, the endpoint's onError is set up to call QueryService::unsubscribeAndRetry. This decrements pendingRequests back to 0. So now we have a problem where QueryService thinks it has no endpoints in progress - but in fact the endpoint still exists and is still trying to connect.

      The N1QL query comes into QueryService again and we go through the loop once more. So we end up spawning many endpoints without tracking them in either 'endpoints' or pendingRequests.
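
The accounting problem described above can be reproduced in a small stand-alone model. This is only a sketch, not the actual core-io code: the class EndpointLeakSketch, the Endpoint stub, the 'untracked' list and the onConnectError method are hypothetical names introduced for illustration, the send condition is reduced to a check for zero, and the connect failure happens synchronously instead of after a timeout.

```java
import java.util.ArrayList;
import java.util.List;

public class EndpointLeakSketch {

    /** Hypothetical stand-in for an endpoint that keeps reconnecting on its own. */
    static final class Endpoint {
    }

    // The two pieces of bookkeeping named in the description.
    final List<Endpoint> endpoints = new ArrayList<>();
    int pendingRequests = 0;

    // Not part of the real service: records endpoints that are alive but untracked.
    final List<Endpoint> untracked = new ArrayList<>();

    /** Simplified send(): opens an endpoint when nothing appears to be in flight. */
    void send() {
        if (endpoints.size() + pendingRequests == 0) {
            maybeOpenAndSend();
        }
    }

    /** Creates an endpoint and counts it as pending until it connects. */
    void maybeOpenAndSend() {
        Endpoint endpoint = new Endpoint();
        pendingRequests++;
        // Assumption: the node is down, so the connect attempt always fails.
        onConnectError(endpoint);
    }

    /** Models the onError path: the counter is decremented, the endpoint lives on. */
    void onConnectError(Endpoint endpoint) {
        pendingRequests--;        // the service forgets about the endpoint...
        untracked.add(endpoint);  // ...but it still exists and keeps retrying
    }

    public static void main(String[] args) {
        EndpointLeakSketch service = new EndpointLeakSketch();
        for (int i = 0; i < 5; i++) {
            service.send(); // the same query being retried against the downed node
        }
        System.out.println("tracked=" + service.endpoints.size()
                + " pending=" + service.pendingRequests
                + " untracked=" + service.untracked.size());
    }
}
```

In this model every call to send() sees zero tracked endpoints and zero pending requests, so five sends leave five live endpoints that neither 'endpoints' nor pendingRequests accounts for.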

      Eventually a separate 300-second timeout fires, which stops the endpoints from retrying indefinitely and cleans up the existing endpoints.

      From code inspection, it appears 1.4.2 through 1.5.8 (current as of this writing) are affected.

          People

            daschl Michael Nitschinger
            graham.pople Graham Pople
            Votes: 0
            Watchers: 4
