Description
We have previously seen cases where problems with ns_server/LDAP left memcached with a large number of pending external authentications.
External authentication is managed by ExternalAuthManagerThread. ns_server will establish a connection to memcached and offer to be an external auth provider. Then, any authentications which cannot be done locally will be proxied to ns_server, and ns_server will speak to LDAP.
The pending responses are in ExternalAuthManagerThread::requestMap. We should expose Prometheus metrics with the number of entries added and removed from that map, to be able to more easily detect when we're waiting on ns_server.
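A minimal sketch of the counting idea, not the actual kv_engine change: the class name `PendingAuthRequests`, its methods, and the use of a plain `std::string` payload are all hypothetical stand-ins for `ExternalAuthManagerThread::requestMap`. It only illustrates keeping two monotonic counters next to the map, so that a monitoring system can derive the number of outstanding requests as added minus removed.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the map of pending external auth requests.
class PendingAuthRequests {
public:
    // Record a request forwarded to the external auth provider.
    // The counter is bumped only if the id was not already present.
    void add(uint32_t id, std::string user) {
        std::lock_guard<std::mutex> guard(mutex);
        if (requests.emplace(id, std::move(user)).second) {
            ++added;
        }
    }

    // Drop a request once its response arrives.
    // Returns false (and counts nothing) if the id is unknown.
    bool remove(uint32_t id) {
        std::lock_guard<std::mutex> guard(mutex);
        if (requests.erase(id) == 0) {
            return false;
        }
        ++removed;
        return true;
    }

    // Monotonic counters, suitable for export as Prometheus-style counters.
    uint64_t getAdded() const { return added.load(); }
    uint64_t getRemoved() const { return removed.load(); }

    // Read `removed` before `added` so that a concurrent update can never
    // make the difference underflow.
    uint64_t getPending() const {
        const auto r = removed.load();
        const auto a = added.load();
        return a - r;
    }

private:
    std::mutex mutex;
    std::unordered_map<uint32_t, std::string> requests;
    std::atomic<uint64_t> added{0};
    std::atomic<uint64_t> removed{0};
};
```

If the added counter keeps growing while the removed counter stalls, requests are being sent to ns_server but responses are not coming back, which is the situation this ticket wants to make visible.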
For Gerrit Dashboard: MB-60497
# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
204976,13 | MB-60497: Expose number of external auth requests | master | kv_engine | Status: MERGED | +2 | +1 |
205030,16 | MB-60497: Log warning when external auth is longer than 5s | master | kv_engine | Status: MERGED | +2 | +1 |
205232,5 | MB-60497: Add external_auth_slow_duration config parameter | master | kv_engine | Status: MERGED | +2 | +1 |
205468,3 | MB-60497: Add external_auth_delayed_response config parameter | master | kv_engine | Status: ABANDONED | 0 | -1 |
205498,4 | MB-60497: Add external_auth_response_timeout config parameter | master | kv_engine | Status: MERGED | +2 | +1 |
205584,10 | MB-60497: Timeout external auth request with no response | master | kv_engine | Status: MERGED | +2 | +1 |
205871,12 | MB-60497: Time external auth request into a histogram | master | kv_engine | Status: MERGED | +2 | +1 |