Details
Description
Upfront: this may be hard to track down exactly, we'll see.
The gist is that during SDK development and SDKQE (especially), we observed very high auth latencies (sometimes even in the seconds), especially under load and when a node gets re-added (but not exclusively).
There is a ticket open for this on the SDKQE side (http://www.couchbase.com/issues/browse/SDKQE-485); ignore for now that it is marked as resolved. We implemented some better logic in the SDK to mitigate the effect, but it is still there (and it could be on the server side).
You can find some logging in the zip files there, but I won't bother you with the client's finest-level logging. Because we were seeing the issue, I added timing logging for:
- the sasl mechs list
- each auth step
- the total time for auth
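For reference, the instrumentation is essentially a wrapper like the following (a simplified sketch, not the actual SDK code; the threshold value, logger name, and method names are assumptions):

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;
import java.util.logging.Logger;

// Sketch of the timing instrumentation described above: wrap each SASL
// phase and log a WARNING once its duration crosses a threshold.
public class SaslTimings {
    private static final Logger LOG = Logger.getLogger(SaslTimings.class.getName());
    // Threshold is an assumption; the real SDK value may differ.
    static final long WARN_THRESHOLD_MS = 100;

    static <T> T timed(String phase, Supplier<T> op) {
        long start = System.nanoTime();
        try {
            return op.get();
        } finally {
            long tookMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            if (tookMs >= WARN_THRESHOLD_MS) {
                LOG.warning(phase + " took " + tookMs + "ms");
            }
        }
    }
}
```

Usage would be along the lines of `timed("SASL List Mechanisms", () -> listMechanisms(conn))` for each phase (`listMechanisms` is a hypothetical placeholder for the actual operation).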
To me it looks like it got worse with CRAM-MD5, and not only because we now have 3 roundtrips instead of one with PLAIN. Trond mentioned something potentially wrong wrt entropy?
Check these timings, for example:
981598:[737.68 INFO] (SDKD log:137) WARNING: SASL List Mechanisms took 2791ms on
{QA sa=10.3.4.15/10.3.4.15:11210, #Rops=1, #Wops=0, #iq=0, topRop=SASL mechs operation, topWop=null, toWrite=0, interested=1}
985199:[740.73 INFO] (SDKD log:137) WARNING: SASL Step took 3042ms on
985455:[740.83 INFO] (SDKD log:137) WARNING: SASL Step took 103ms on
985457:[740.83 INFO] (SDKD log:137) WARNING: SASL Auth took 5938ms on
You can see the individual step timings and also the total auth time.
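The structural reason CRAM-MD5 needs the extra step can be illustrated with the JDK's own SASL clients (a standalone sketch, not SDK code; protocol/server names and credentials are dummies): PLAIN can ship the credentials in its very first packet, while CRAM-MD5 has no initial response and must wait for the server's challenge before it can answer.

```java
import java.util.Map;
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.auth.callback.UnsupportedCallbackException;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslClient;
import javax.security.sasl.SaslException;

public class SaslMechDemo {
    // Build a SASL client for the given mechanism with dummy credentials.
    static SaslClient client(String mech) throws SaslException {
        return Sasl.createSaslClient(new String[]{mech}, null, "couchbase",
                "localhost", Map.of(), callbacks -> {
            for (Callback cb : callbacks) {
                if (cb instanceof NameCallback) ((NameCallback) cb).setName("user");
                else if (cb instanceof PasswordCallback) ((PasswordCallback) cb).setPassword("pass".toCharArray());
                else throw new UnsupportedCallbackException(cb);
            }
        });
    }

    public static void main(String[] args) throws Exception {
        // PLAIN carries the credentials in the initial response ...
        System.out.println("PLAIN has initial response: " + client("PLAIN").hasInitialResponse());
        // ... while CRAM-MD5 must first receive the server's challenge,
        // costing an extra round trip before the HMAC response can be sent.
        SaslClient cram = client("CRAM-MD5");
        System.out.println("CRAM-MD5 has initial response: " + cram.hasInitialResponse());
        byte[] response = cram.evaluateChallenge("<demo-challenge>".getBytes());
        System.out.println("CRAM-MD5 response (user + hex HMAC): " + response.length + " bytes");
    }
}
```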
Deepti will provide tcpdump information for a run like this so we can dig into it further.
Let me/us know what you need from the SDK side to dig into that. Note that even locally, I observed timings in the hundreds of milliseconds, which according to Trond should never take that long.
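If the entropy theory is worth checking, one quick probe from the Java side is to time a read from the blocking system seed source (a sketch under the assumption that the slow path is entropy starvation; on a starved node this can stall, while a healthy urandom-backed read returns almost instantly):

```java
import java.security.SecureRandom;

public class EntropyProbe {
    // Time how long it takes to pull seed bytes from the (potentially
    // blocking) system entropy source.
    static long seedMillis(int numBytes) {
        SecureRandom sr = new SecureRandom();
        long start = System.nanoTime();
        sr.generateSeed(numBytes); // on Linux this reads the blocking seed source
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("generateSeed(32) took " + seedMillis(32) + "ms");
    }
}
```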