Details
-
Improvement
-
Resolution: Fixed
-
Major
-
2.0.0
-
Security Level: Public
-
None
-
CBG Sprint 64, CBG Sprint 65, CBG Sprint 66
-
5
Description
(copied from https://github.com/couchbase/sync_gateway/issues/1294)
While I was doing performance analysis for the distributed index, compareHashAndPassword was high on the list of hot spots until user ramp-up completed. This is partly working as intended: the bcrypt call is meant to be slow and expensive, so that online attacks are costly.
To offset that cost, we maintain a cache of recent password hashes, each paired with a SHA1 digest (10000 entries). As users connect and authenticate for the first time, they gradually fill up this cache.
However, when the cache fills up, we drop it completely and start again from an empty cache. At that point every active user triggers the bcrypt call on their next basic auth API call. Under high load, we'd expect a large CPU spike at that point.
If there are more than 10000 concurrent users, we'd be dropping and recreating the cache repeatedly.
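To make the failure mode concrete, here is a minimal sketch of a cache that behaves as described above. It is not the Sync Gateway implementation: the `authCache` type, its fields and the assumption that the SHA1 digest is taken over the plaintext password are all hypothetical, and plain strings stand in for real bcrypt hashes so the example needs only the standard library.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// authCache is a hypothetical stand-in for the cache in the description.
// It maps a stored bcrypt hash to the SHA1 digest of the password that
// last verified against it (assumed layout, not taken from the source).
type authCache struct {
	maxSize int
	digests map[string][sha1.Size]byte
	resets  int // how many times the whole cache was thrown away
}

func newAuthCache(maxSize int) *authCache {
	return &authCache{maxSize: maxSize, digests: make(map[string][sha1.Size]byte)}
}

// put records a password that has just passed the expensive bcrypt check.
// When the cache is full and the key is new, everything is dropped.
func (c *authCache) put(bcryptHash, password string) {
	if _, exists := c.digests[bcryptHash]; !exists && len(c.digests) >= c.maxSize {
		c.digests = make(map[string][sha1.Size]byte)
		c.resets++
	}
	c.digests[bcryptHash] = sha1.Sum([]byte(password))
}

// check reports a cache hit. On a miss the caller has to run bcrypt.
func (c *authCache) check(bcryptHash, password string) bool {
	digest, ok := c.digests[bcryptHash]
	return ok && digest == sha1.Sum([]byte(password))
}

func main() {
	c := newAuthCache(3) // tiny limit so the reset is easy to see
	for i := 1; i <= 4; i++ {
		c.put(fmt.Sprintf("hash-%d", i), fmt.Sprintf("pw-%d", i))
	}
	// The fourth user forced a reset, so the first three must pay for bcrypt again.
	fmt.Println("entries:", len(c.digests), "resets:", c.resets)
	fmt.Println("user 1 still cached:", c.check("hash-1", "pw-1"))
}
```

With the real limit of 10000, the same reset would invalidate up to 10000 active users at once, which is where the expected CPU spike comes from.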
At minimum, users anticipating high load should be using session-based auth, not basic auth.
I'd like to review whether there's a more efficient way to manage this cache (or whether I'm missing a subtlety in the implementation). We might not want the overhead of a full LRU cache, but we should find a way to avoid CPU spikes when it fills up.
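One possible direction, sketched here only as an illustration and not as the fix that was adopted, is to evict a single arbitrary entry when the cache is full instead of dropping all of them. Go does not specify map iteration order, so removing the first key a `range` yields gives cheap, roughly arbitrary eviction without LRU bookkeeping. The `evictingCache` type and its layout are hypothetical, and plain strings again stand in for bcrypt hashes.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// evictingCache is a hypothetical variant of the password cache that
// evicts one entry at a time rather than resetting when it is full.
type evictingCache struct {
	maxSize int
	digests map[string][sha1.Size]byte
}

func newEvictingCache(maxSize int) *evictingCache {
	return &evictingCache{maxSize: maxSize, digests: make(map[string][sha1.Size]byte)}
}

// put stores a verified password. If the cache is full and the key is new,
// it removes one arbitrary entry (the first key the map iteration yields).
func (c *evictingCache) put(bcryptHash, password string) {
	if _, exists := c.digests[bcryptHash]; !exists && len(c.digests) >= c.maxSize {
		for victim := range c.digests {
			delete(c.digests, victim)
			break
		}
	}
	c.digests[bcryptHash] = sha1.Sum([]byte(password))
}

// check reports a cache hit. On a miss the caller has to run bcrypt.
func (c *evictingCache) check(bcryptHash, password string) bool {
	digest, ok := c.digests[bcryptHash]
	return ok && digest == sha1.Sum([]byte(password))
}

func main() {
	c := newEvictingCache(3)
	for i := 1; i <= 10; i++ {
		c.put(fmt.Sprintf("hash-%d", i), fmt.Sprintf("pw-%d", i))
	}
	// The cache stays at its limit; each insert displaced at most one user.
	fmt.Println("entries:", len(c.digests))
}
```

Each new user then costs at most one other user a bcrypt call, so the work is spread out instead of arriving as one spike. With more than 10000 concurrent users there is still churn, but it grows with the overflow rather than invalidating the whole cache each time.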