Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-10864

High auth latency observed from the SDKs (magnifying under load)

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 3.0
    • 2.2.0, 2.5.1, 2.5.0
    • clients
    • Security Level: Public
    • None
    • Happens on SDKQE environments (Linux, virtual), but also seeing locally (mac os x + virtualbox linux) slightly higher times than expected.
    • Untriaged
    • Ubuntu 64-bit
    • Unknown

    Description

      Upfront: this may or may not be harder to track down exactly, we'll see.

      The gist is that during SDK development and SDKQE (especially), we observed very high auth latencies (sometimes even in the seconds), especially under load and when the node gets readded (but not exclusively).

      There is a ticket opened for this on the SDKQE Side (http://www.couchbase.com/issues/browse/SDKQE-485), but don't care right now that it is marked as resolved. We implemented some better logistics int he SDK to mitigate the effect, but it is still there (could be on the server side).

      You can find some logging in the zip files there, but I won't bother you with the client finest logging. Because we were seeing the issue, I added timings logging to

      • the sasl mechs list
      • each auth step
      • the total time for auth

      To me it looks like that it got worse with CRAM-MD5, not only because we now have 3 roundtrips instead of one with PLAIN. Trond mentioned something potentially wrong wrt to entropy?

      Check this timings for example:

      981598:[737.68 INFO] (SDKD log:137) WARNING: SASL List Mechanisms took 2791ms on

      {QA sa=10.3.4.15/10.3.4.15:11210, #Rops=1, #Wops=0, #iq=0, topRop=SASL mechs operation, topWop=null, toWrite=0, interested=1}


      985199:[740.73 INFO] (SDKD log:137) WARNING: SASL Step took 3042ms on

      {QA sa=10.3.4.15/10.3.4.15:11210, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0}


      985455:[740.83 INFO] (SDKD log:137) WARNING: SASL Step took 103ms on

      {QA sa=10.3.4.15/10.3.4.15:11210, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0}


      985457:[740.83 INFO] (SDKD log:137) WARNING: SASL Auth took 5938ms on

      {QA sa=10.3.4.15/10.3.4.15:11210, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0}

      You can see the individual step timings and also the total auth time.

      Deepti will provide tcpdump information for a run like this so we can dig into it further.
      Let me/us know what you need from the SDK side to dig into that. Note that even locally, I observed timings in the 100dreds of milliseconds, which according to trond should never take that long.

      Attachments

        1. cbcollectinfo_mb10684_11.zip
          3.03 MB
        2. cbcollectinfo_mb10684_15.zip
          3.06 MB
        3. cbcollectinfo_mb10684_245.zip
          5.03 MB
        4. cbcollectinfo_mb10684.zip
          4.57 MB
        5. mb10864_server11.zip
          27.92 MB
        6. mb10864_server15.zip
          28.08 MB
        7. mb10864_server245.zip
          28.10 MB
        8. mb10864_tcpdump.zip
          45.91 MB
        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          People

            daschl Michael Nitschinger
            daschl Michael Nitschinger
            Votes:
            0 Vote for this issue
            Watchers:
            12 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty