Description
Summary
When a new TLS connection is established in KV-Engine, a non-negligible impact on service time (of the order of 1 millisecond) has been observed for client connections which share the same front-end thread as the incoming TLS connection. The cause is that we make blocking calls into OpenSSL on the front-end thread.
We should investigate the overhead added, and whether the blocking OpenSSL calls can be moved off the front-end worker thread(s).
Details
Establishing a new TLS connection requires making one or more blocking call(s) into OpenSSL (SSL_accept) to perform the server side of the handshake. This handshake is done on one of the front-end worker threads, after the TCP/IP connection has been accepted by the single listener thread.
Recall that all incoming connections are distributed round-robin to the fixed pool of front-end worker threads, hence in any non-trivial deployment multiple mcbp connections will share the same worker thread. In general we attempt to minimise any latency impact on other connections assigned to the same worker thread by using non-blocking network IO, and by only making non-blocking (or very short-running) calls into the underlying engine. For example, if a GET request is made for a non-resident document, the engine_get call schedules a background thread to perform the disk fetch and returns ewouldblock status back to the thread runloop, which takes the blocked connection out of the set of registered connections and moves on to the next connection.
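The ewouldblock pattern described above can be illustrated with a minimal, single-threaded Python sketch. This is not KV-Engine code: the `Engine` and `Worker` classes, the `notify_io_complete` method name and the in-process "background fetch" queue are all hypothetical stand-ins chosen for illustration.

```python
from collections import deque
from enum import Enum, auto


class Status(Enum):
    SUCCESS = auto()
    EWOULDBLOCK = auto()


class Engine:
    """Hypothetical engine: resident keys are served at once, others need a disk fetch."""

    def __init__(self, resident, on_disk):
        self.resident = dict(resident)
        self.on_disk = dict(on_disk)
        self.pending_fetches = deque()  # stands in for the background fetch thread

    def get(self, conn, key):
        if key in self.resident:
            return Status.SUCCESS, self.resident[key]
        # Non-resident: schedule the fetch and return immediately, never block.
        self.pending_fetches.append((conn, key))
        return Status.EWOULDBLOCK, None

    def run_background_fetches(self, worker):
        # Simulates the background thread completing its disk reads.
        while self.pending_fetches:
            conn, key = self.pending_fetches.popleft()
            self.resident[key] = self.on_disk[key]
            worker.notify_io_complete(conn)


class Worker:
    """Hypothetical front-end worker servicing several connections in turn."""

    def __init__(self, engine):
        self.engine = engine
        self.runnable = deque()  # (conn, key) pairs ready to be serviced
        self.blocked = {}        # conn -> key, parked until the fetch completes
        self.log = []

    def submit(self, conn, key):
        self.runnable.append((conn, key))

    def notify_io_complete(self, conn):
        # Re-register the parked connection so its request is retried.
        self.runnable.append((conn, self.blocked.pop(conn)))

    def run_once(self):
        for _ in range(len(self.runnable)):
            conn, key = self.runnable.popleft()
            status, value = self.engine.get(conn, key)
            if status is Status.EWOULDBLOCK:
                self.blocked[conn] = key  # park it and move on to the next connection
                self.log.append((conn, "blocked"))
            else:
                self.log.append((conn, value))


def demo():
    engine = Engine(resident={"a": "A"}, on_disk={"b": "B"})
    worker = Worker(engine)
    worker.submit("conn1", "b")  # non-resident document
    worker.submit("conn2", "a")  # resident document
    worker.run_once()
    engine.run_background_fetches(worker)
    worker.run_once()
    return worker.log
```

In `demo()`, conn2 is served in the same pass in which conn1 is parked, which is the behaviour a blocking call on the worker thread would prevent.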
However, the individual steps of the SSL handshake (calls to SSL_accept) have been observed to take a non-trivial amount of time: ~1ms worst case on a local cluster_run using libcouchbase, for a connection without a client cert. As the handshake runs on the front-end worker thread, any other connections assigned to that thread will have their request processing delayed by the same amount.
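To make the delay concrete, here is a small deterministic model of a single worker thread that runs each piece of work to completion in arrival order. Only the ~1ms handshake cost comes from the observation above; the 10us GET cost and the arrival times are assumptions made up for the example, and `service_latencies` is a hypothetical helper, not part of KV-Engine.

```python
def service_latencies(events):
    """Model one worker thread processing work FIFO, run-to-completion.

    events: list of (arrival_us, cost_us, label), sorted by arrival time.
    Returns {label: latency_us}, where latency = completion time - arrival time.
    """
    clock = 0
    latencies = {}
    for arrival, cost, label in events:
        start = max(clock, arrival)  # wait until the thread is free
        clock = start + cost
        latencies[label] = clock - arrival
    return latencies


# Assumed figures: two GETs costing 10us each, arriving at t=100us and t=500us.
baseline = service_latencies([
    (100, 10, "get1"),
    (500, 10, "get2"),
])

# Same GETs, but a 1000us blocking handshake step starts on the thread at t=0.
with_tls = service_latencies([
    (0, 1000, "tls_handshake"),
    (100, 10, "get1"),
    (500, 10, "get2"),
])
```

Under these assumptions each GET takes 10us in `baseline`, whereas in `with_tls` get1 takes 910us and get2 takes 520us, because both have to wait for the handshake step to finish.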
Note that the problem was initially observed on version 6.6.2, where KV-Engine directly handles the TLS handshake using SSL_accept - see Connection::sslAcceptWithRetry(). However, in Neo that call has been delegated to libevent (via bufferevent_openssl_socket_new), which uses a slightly different function that performs similar work: SSL_do_handshake.
Issue Links
- relates to MB-51498 TLS connection handshake does not yield to event loop until complete (Closed)