  Couchbase Server / MB-51077

TLS connection establishment uses blocking SSL_accept call on front-end threads


Details

    • Improvement
    • Resolution: Fixed
    • Major
    • 7.0.0
    • 6.6.0, 6.6.1, 6.6.2, 6.6.3, 6.6.4, 6.6.5
    • memcached
    • None
    • 1

    Description

      Summary

      When a new TLS connection is established in KV-Engine, a non-negligible increase in service time (on the order of 1 millisecond) has been observed for client connections that share the same front-end thread as the incoming TLS connection. This is because we make blocking calls into OpenSSL on that front-end thread.

      We should investigate the overhead this adds, and whether the blocking OpenSSL calls can be moved off the front-end worker thread(s).
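      One possible way to move the blocking step off the front-end thread, sketched under assumptions: the worker hands the handshake to a dedicated handshake thread and receives the result back as a task on its own run queue, so connections already queued behind the new one keep being served. TaskQueue, blockingHandshake and runDemo are hypothetical names, the 1 ms sleep merely stands in for the SSL_accept()/SSL_do_handshake() cost, and no OpenSSL calls are made. This illustrates the design question only; it is not necessarily how the issue was resolved.

```cpp
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical FIFO task queue; each thread drains exactly one of these.
class TaskQueue {
public:
    void push(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push_back(std::move(task));
        }
        cv_.notify_one();
    }
    // Waits for the next task; returns false once closed and empty.
    bool pop(std::function<void()>& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return closed_ || !tasks_.empty(); });
        if (tasks_.empty()) {
            return false;
        }
        out = std::move(tasks_.front());
        tasks_.pop_front();
        return true;
    }
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> tasks_;
    bool closed_ = false;
};

// Hypothetical stand-in for the blocking SSL_accept()/SSL_do_handshake()
// work: it only sleeps for ~1 ms and reports success.
bool blockingHandshake(int /*connId*/) {
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
    return true;
}

// The calling thread plays the front-end worker; one extra thread plays a
// dedicated handshake pool. Returns the order in which the worker did things.
std::vector<std::string> runDemo() {
    TaskQueue workerQueue;
    TaskQueue handshakeQueue;
    std::vector<std::string> log;  // only touched on the worker thread

    std::thread handshakeThread([&] {
        std::function<void()> task;
        while (handshakeQueue.pop(task)) {
            task();
        }
    });

    // New TLS connection 7: hand the blocking step off instead of running it.
    workerQueue.push([&] {
        log.push_back("offload handshake conn 7");
        handshakeQueue.push([&] {
            bool ok = blockingHandshake(7);  // runs on the handshake thread
            // Post the result back so the worker itself resumes connection 7.
            workerQueue.push([&, ok] {
                log.push_back(ok ? "handshake done conn 7" : "handshake failed conn 7");
                workerQueue.close();  // the demo ends after the completion
            });
        });
    });
    // Connection 3's GET, queued behind the new connection, is not delayed.
    workerQueue.push([&] { log.push_back("serve GET conn 3"); });

    std::function<void()> task;
    while (workerQueue.pop(task)) {  // the front-end worker's run loop
        task();
    }
    handshakeQueue.close();
    handshakeThread.join();
    return log;
}
```

      Because the completion is posted back to the worker's own queue, all per-connection state is still touched only by the owning front-end thread; the cost is an extra cross-thread hand-off per new TLS connection.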

      Details

      Establishing a new TLS connection requires one or more blocking calls into OpenSSL (SSL_accept) to perform the server side of the handshake. This handshake is done on one of the front-end worker threads, after the TCP/IP connection has been accepted (via accept()) by the single listener thread.

      Recall that all incoming connections are distributed round-robin across the fixed pool of front-end worker threads, so in any non-trivial deployment multiple MCBP connections will share the same worker thread. In general we try to minimise the latency impact on other connections assigned to the same worker thread by using non-blocking network I/O, and by only making non-blocking (or very short-running) calls into the underlying engine. For example, if a GET request is made for a non-resident document, the engine_get call schedules a background task to perform the disk fetch and returns an EWOULDBLOCK status to the thread's run loop, which stops servicing that blocked connection and moves on to the next one.
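      The two mechanisms described above, round-robin assignment and parking a connection on a would-block status, can be modelled in a few lines. This is a simplified sketch, not KV-Engine code: Dispatcher, EngineStatus, engineGet, Request, PassResult and runLoopOnce are hypothetical names, and engineGet models only the return value (nothing is actually scheduled in the background).

```cpp
#include <cstddef>
#include <vector>

// Round-robin assignment of accepted connections to a fixed pool of
// front-end workers (hypothetical helper mirroring the description).
class Dispatcher {
public:
    explicit Dispatcher(std::size_t numWorkers) : numWorkers_(numWorkers) {}
    // Returns the index of the worker that owns the next accepted connection.
    std::size_t assign() { return next_++ % numWorkers_; }

private:
    std::size_t numWorkers_;
    std::size_t next_ = 0;
};

// Hypothetical status: WouldBlock means "come back when notified".
enum class EngineStatus { Success, WouldBlock };

// Models only the return value of a GET: the real engine would schedule a
// background disk fetch for a non-resident document; nothing is scheduled here.
EngineStatus engineGet(bool resident) {
    return resident ? EngineStatus::Success : EngineStatus::WouldBlock;
}

struct Request {
    int connId;
    bool resident;
};

struct PassResult {
    std::vector<int> served;  // connections whose request completed
    std::vector<int> parked;  // connections waiting on a background fetch
};

// One pass of a worker's run loop over its ready connections.
PassResult runLoopOnce(const std::vector<Request>& ready) {
    PassResult result;
    for (const Request& req : ready) {
        if (engineGet(req.resident) == EngineStatus::WouldBlock) {
            // Park the connection; it would be resumed once the fetch completes.
            result.parked.push_back(req.connId);
            continue;  // move straight on to the next connection
        }
        result.served.push_back(req.connId);
    }
    return result;
}
```

      With four workers, the 2nd, 6th and 10th accepted connections all land on worker 1. If the middle one misses the cache, runLoopOnce parks it and still serves the other two in the same pass. A blocking SSL_accept has no such escape hatch: the loop cannot move on until the call returns.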

      However, the individual steps of the SSL handshake (calls to SSL_accept) have been observed to take a non-trivial amount of time: about 1 ms in the worst case on a local cluster_run, using libcouchbase, for a connection without a client certificate. As this runs on the front-end worker thread, request processing for every other connection assigned to that thread is delayed by the same amount.
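      The delay is simple arithmetic: a single worker thread runs its work back-to-back, so each item finishes only after everything ahead of it. The sketch below assumes an illustrative 10 us cost per GET (not a measured figure) and uses 1000 us for the handshake step to match the ~1 ms observation; WorkItem and completionTimes are hypothetical names.

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct WorkItem {
    std::string what;
    std::int64_t costMicros;  // time spent running on the worker thread
};

// Completion time (microseconds from the start of the pass) of each item,
// assuming one worker thread runs them back-to-back with no overlap.
std::vector<std::int64_t> completionTimes(const std::vector<WorkItem>& items) {
    std::vector<std::int64_t> done;
    std::int64_t clockMicros = 0;
    for (const WorkItem& item : items) {
        clockMicros += item.costMicros;  // everything earlier delays this item
        done.push_back(clockMicros);
    }
    return done;
}
```

      With a 10 us GET on either side of a 1000 us handshake step, the second GET completes at 1020 us instead of 20 us: the full handshake cost is added to its service time.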

      Note that the problem was initially observed on version 6.6.2, where KV-Engine handles the TLS handshake directly using SSL_accept (see Connection::sslAcceptWithRetry()). However, in Neo we have delegated that call to libevent (via bufferevent_openssl_socket_new), which uses a slightly different function that performs similar work: SSL_do_handshake.


            People

              Trond Norbye
              Dave Rigby (Inactive)
              Votes: 0
              Watchers: 4
