  1. Couchbase Server
  2. MB-51498

TLS connection handshake does not yield to event loop until complete


Details

    • Untriaged
    • 1
    • Unknown

    Description

      Note: This issue has been spun out of MB-51077 / MB-26887 for tracking purposes - it has already been fixed in version 7.0.0 via the adoption of the libevent bufferevent API.

      Prior to 7.0.0, KV-Engine had a simplistic handler for the initial TLS handshake - if SSL_accept returns a temporary error (needs to read / write more data) then it simply drains both input / output pipes and retries without any yield:

      int Connection::sslAcceptWithRetry() {
          while (true) {
              int r = ssl.accept();
              if (r == 1) {
                  // handshake completed.
                  return r;
              }
       
              auto sslError = ssl.getError(r);
              if (sslError == SSL_ERROR_WANT_READ ||
                  sslError == SSL_ERROR_WANT_WRITE) {
                  // Drain send and receive pipes.
                  ssl.drainBioSendPipe(socketDescriptor);
                  if (ssl.hasError()) {
                      cb::net::set_econnreset();
                      return -1;
                  }
                  ssl.drainBioRecvPipe(socketDescriptor);
                  if (ssl.hasError()) {
                      cb::net::set_econnreset();
                      return -1;
                  }
                  // Continue SSL accept handshake.
                  continue;
              } else {
                  logSslErrorInfo("SSL_accept", r);
                  cb::net::set_econnreset();
                  return -1;
              }
          }
          folly::assume_unreachable();
      }
      

      Note how this loops calling ssl.accept() (which is just a thin wrapper around SSL_accept), exiting the loop when the handshake is successful. If the handshake did not complete and ssl.getError(r) instead reports SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE, then we attempt to fulfil that request by draining the send and recv pipes (to and from the underlying TCP/IP send / recv buffers); on any other error code we give up.

      The problem is that after draining the send / recv pipes, the code immediately retries the loop. SSL_accept might still be waiting for more data to transfer over the network: while we have pushed data down to the underlying TCP/IP socket, the expected response may not have arrived yet. In effect we have a non-blocking socket but we are using it in a blocking manner, by busy-waiting for data to be sent over.

      In theory the front-end thread could be blocked for an arbitrarily long period of time, for as long as the underlying connection has no data to read/write on it and the TCP/IP connection is still established. In practice we have only observed the thread being blocked on the order of hundreds of milliseconds.
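
      For contrast with the busy-wait loop above, here is a minimal sketch of a handshake step that yields to its caller. It is not the KV-Engine or libevent implementation: HandshakeStatus, FakeTlsSession, tryAccept and runEventLoop are all hypothetical names, and the TLS peer is simulated so that the example needs no OpenSSL. The point it illustrates is that a "want read / write" result is returned to the event loop, which retries only after a readiness notification.

```cpp
#include <cassert>

// All names below are hypothetical stand-ins for illustration only; they are
// not KV-Engine, OpenSSL or libevent types.
enum class HandshakeStatus { Done, WantIo, Failed };

// Simulates a TLS session whose handshake completes only after the socket
// has become readable `roundTripsNeeded` times.
class FakeTlsSession {
public:
    explicit FakeTlsSession(int roundTripsNeeded)
        : remaining(roundTripsNeeded) {
        assert(roundTripsNeeded >= 0);
    }

    // Analogous to a single non-blocking SSL_accept() attempt.
    HandshakeStatus acceptStep() {
        ++acceptCalls;
        return remaining == 0 ? HandshakeStatus::Done
                              : HandshakeStatus::WantIo;
    }

    // Invoked by the event loop when the peer's next flight has arrived.
    void onSocketReadable() {
        if (remaining > 0) {
            --remaining;
        }
    }

    int acceptCalls = 0;

private:
    int remaining;
};

// Yielding variant: make one attempt, then return to the caller instead of
// retrying in place. WantIo means "call me again after a readiness event".
HandshakeStatus tryAccept(FakeTlsSession& session) {
    return session.acceptStep();
}

// Toy event loop: counts how often control returned to the loop while the
// handshake was still in progress. A real loop would service other
// connections and tasks at that point.
int runEventLoop(FakeTlsSession& session) {
    int yields = 0;
    while (tryAccept(session) == HandshakeStatus::WantIo) {
        ++yields;
        session.onSocketReadable(); // simulated readiness notification
    }
    return yields;
}
```

      With a simulated handshake that needs three round trips, the accept step runs four times and control returns to the loop three times in between, rather than the thread spinning inside a single call.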

      This issue can cause other tasks in the engine to block for the duration of the SSL accept. Those (known) tasks are:
      1) DCP connection manager task
      2) DCP connection notifier task

      This code was added in 6.6.2 (see MB-42607) to handle cases where the complete TLS handshake was not completed in a single TCP/IP send/receive.

      This code was removed in 7.0.0 and is absent from all later versions - we restructured the entire connection management to use libevent's bufferevent API to support out-of-order responses - see MB-26887.

            People

              ashwin.govindarajulu Ashwin Govindarajulu
              drigby Dave Rigby (Inactive)