Couchbase .NET client library / NCBC-2526

requests wait forever while cluster is unreachable


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.1
    • Fix Version/s: 3.0.2
    • Component/s: library
    • None
    • 1

    Description

      During a network failure, connections are cleaned up and all requests sit in the send queue.

      The time a request spends in the send queue is not counted against its timeout, so a request from client code, e.g. await collection.GetAsync(...), will wait forever while the cluster is unreachable.
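To illustrate the failure mode, here is a minimal Python asyncio analogue. It is not the Couchbase .NET SDK; Request, send_queue and demo_hang are hypothetical names. A request's completion future is only resolved by a connection that dequeues it, so when no connection is consuming the queue the future stays pending indefinitely:

```python
import asyncio


class Request:
    """Hypothetical stand-in for a queued operation (not an SDK type)."""

    def __init__(self, key: str) -> None:
        self.key = key
        # Resolved only by a connection that dequeues and sends the request.
        self.completed: asyncio.Future = asyncio.get_running_loop().create_future()


async def demo_hang(wait_seconds: float) -> bool:
    """Return True if the request completed within wait_seconds."""
    send_queue: asyncio.Queue = asyncio.Queue()
    # Simulated network failure: connections were cleaned up, so no task
    # is consuming send_queue.
    request = Request("doc-1")
    await send_queue.put(request)

    # Awaiting request.completed directly would never return; the wait is
    # bounded here only so the demo itself terminates.
    done, _pending = await asyncio.wait({request.completed}, timeout=wait_seconds)
    return request.completed in done


if __name__ == "__main__":
    print(asyncio.run(demo_hang(0.1)))  # False: nothing completed the request
```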


        Activity

          Jeff Morris (jmorris) added a comment:

          Tommy Jakobsen - Isn't the token timeout respected?

          Tommy Jakobsen (tommyja) added a comment (edited):

          As far as I can tell, the timeout has no effect while the request sits in the send queue. If no connections are consuming requests from the queue, nothing inspects the request to determine whether it has timed out.

          In addition, if the CancellationToken has timed out, the request is not re-queued. It's just dropped, and operationRequest.CompletionTask is never completed, even when the connection comes back up. Code needs to be added to ensure requests are completed in this case.

          Also, it looks like there has been an attempt to fix the reconnection issue by calling CleanupDeadConnectionsAsync() from DataFlowConnectionPool.SendAsync. I suspect it returns the wrong task in this case, but I am still trying to understand exactly what happens.
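One way to keep expired requests from being silently dropped, sketched as a hypothetical Python asyncio analogue rather than the change that was actually made in the SDK: give each request an absolute deadline when it is created, and have the consumer fail expired requests (completing their futures with an error) instead of discarding them. Request, send_loop and demo are illustrative names, and the "send" is a placeholder:

```python
import asyncio
import time


class Request:
    """Hypothetical queued operation carrying an absolute deadline."""

    def __init__(self, key: str, timeout: float) -> None:
        self.key = key
        # The deadline is fixed at creation, so time spent queued counts.
        self.deadline = time.monotonic() + timeout
        self.completed: asyncio.Future = asyncio.get_running_loop().create_future()


async def send_loop(queue: asyncio.Queue) -> None:
    """Hypothetical connection consumer that never drops a request silently."""
    while True:
        request: Request = await queue.get()
        try:
            if request.completed.done():
                continue  # the caller already gave up on this request
            if time.monotonic() >= request.deadline:
                # Fail the expired request instead of discarding it,
                # so whoever awaits request.completed wakes up.
                request.completed.set_exception(
                    TimeoutError(f"{request.key}: timed out in send queue"))
                continue
            # Placeholder for the real network write and response handling.
            request.completed.set_result(f"value of {request.key}")
        finally:
            queue.task_done()


async def demo() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    stale = Request("stale", timeout=0.01)
    fresh = Request("fresh", timeout=5.0)
    await queue.put(stale)
    await queue.put(fresh)
    await asyncio.sleep(0.05)  # connection down: nothing consumes the queue
    consumer = asyncio.create_task(send_loop(queue))  # connection comes back
    results = await asyncio.gather(stale.completed, fresh.completed,
                                   return_exceptions=True)
    consumer.cancel()
    return results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Fixing the deadline when the request is created, rather than when it is dequeued, is what makes time spent in the queue count against the timeout.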

          Tommy Jakobsen (tommyja) added a comment (edited):

          After some more digging:

          ClusterNode.ExecuteOp will attempt to queue a request by calling 

          await sender(op, state, token)

          It then immediately awaits the op.Completed task. That task is only completed by op.SendAsync, which is supposed to be called from request.SendAsync, and request.SendAsync is never called if the request has been dropped from the queue due to timeout.

          The result is that await op.Completed waits forever, even after the timeout has elapsed and the connection has been re-established.
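A caller-side safeguard for this await-forever pattern, sketched as a hypothetical Python asyncio analogue of ClusterNode.ExecuteOp (not its real code; Operation, execute_op and demo are illustrative names): after enqueueing, bound the wait on the completion future by the operation's remaining time, so the caller gets a timeout even if no connection ever dequeues the request.

```python
import asyncio
import time
from typing import Awaitable, Callable


class Operation:
    """Hypothetical operation; completed plays the role of op.Completed."""

    def __init__(self, key: str, timeout: float) -> None:
        self.key = key
        self.deadline = time.monotonic() + timeout
        self.completed: asyncio.Future = asyncio.get_running_loop().create_future()


async def execute_op(sender: Callable[[Operation], Awaitable[None]],
                     op: Operation) -> str:
    """Hypothetical analogue of ClusterNode.ExecuteOp with a bounded wait."""
    await sender(op)  # enqueue only; this does not wait for the send
    remaining = max(0.0, op.deadline - time.monotonic())
    try:
        # On timeout, wait_for cancels op.completed, which also marks the
        # operation as abandoned for any consumer that dequeues it later.
        return await asyncio.wait_for(op.completed, timeout=remaining)
    except asyncio.TimeoutError:
        raise TimeoutError(
            f"{op.key}: no response within the operation timeout") from None


async def demo() -> tuple:
    queue: asyncio.Queue = asyncio.Queue()  # no consumer: cluster unreachable
    op = Operation("doc-1", timeout=0.05)
    try:
        await execute_op(queue.put, op)
        outcome = "completed"
    except TimeoutError as exc:
        outcome = str(exc)
    return outcome, op.completed.cancelled(), queue.qsize()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because asyncio.wait_for cancels the future on timeout, a consumer that later dequeues the operation can check op.completed.done() and skip it rather than sending a request nobody is waiting for.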


          People

            Jeff Morris (jmorris)
            Tommy Jakobsen (tommyja)
            Votes: 0
            Watchers: 2

