Details
-
Bug
-
Resolution: Won't Do
-
Major
-
2.6.2, 2.5.9
-
None
-
hebe / YCSB
-
1
Description
This is an old issue from our backlog.
One of our weekly tests is a YCSB-based N1QL JOIN query (https://blog.couchbase.com/ycsb-json-benchmarking-json-databases-by-extending-ycsb/, Report query).
The query is quite slow (average latency of 200+ ms) and returns a large dataset.
A drop in throughput happened when switching from SDK 2.5.5 to 2.6.2. It also reproduces on 2.7.0, so my guess is that the root cause is the tracing code path.
See SL1 on showfast: http://showfast.sc.couchbase.com/#/timeline/Linux/n1ql/soe/all
I took the 5.5.2-3733 build and ran a few experiments:
SDK 2.5.2 ==> 2K q/sec
SDK 2.6.2 ==> 1.4K q/sec
SDK 2.7.0 ==> 1.4K q/sec
I also tried to "disable" query tracing by setting a high threshold and a low sampling rate, and got the same 1.4K q/sec.
Client CPU and memory utilization seem the same for the 2.5.2 and 2.7.0 tests. The client machines aren't overloaded in either test, but YCSB reports about 30% higher latency with 2.7.0.
At the same time, the server is less loaded during the 2.7.0 run (lower CPU utilization, lower query time).
I also tried to push the clients harder by adding more client threads. That brings the max throughput to the expected 2K, with higher CPU utilization on the client.
So it's fair to say that the tracing overhead for this N1QL query is about 30%.
I don't know why this particular query is slow while all the rest are fine. My guess is that it is related to the result set size, but I haven't had time to validate that yet.
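For reference, a rough sketch of the arithmetic behind the numbers above. It assumes a closed-loop client model (each YCSB thread issues its next query only after the previous one completes, with no target throttling), so throughput is approximately threads / latency. That model and the function names are my assumptions, not something measured in these runs.

```python
# Back-of-the-envelope check of the reported numbers.
# Assumption: closed-loop clients, so throughput ~= threads / latency.

def throughput_drop(before_qps, after_qps):
    """Relative throughput loss when going from before_qps to after_qps."""
    return (before_qps - after_qps) / before_qps

def implied_latency_increase(before_qps, after_qps):
    """Relative per-query latency increase at a fixed thread count."""
    return before_qps / after_qps - 1.0

def thread_scale_to_recover(before_qps, after_qps):
    """Factor by which the thread count must grow to regain before_qps,
    assuming latency does not change as threads are added."""
    return before_qps / after_qps

drop = throughput_drop(2000, 1400)               # SDK 2.5.2 vs 2.6.2 / 2.7.0
latency = implied_latency_increase(2000, 1400)
scale = thread_scale_to_recover(2000, 1400)

print(f"throughput drop:          {drop:.0%}")
print(f"implied latency increase: {latency:.0%}")
print(f"thread scale to recover:  {scale:.2f}x")
```

Under this model, a 30% throughput drop at a fixed thread count corresponds to a somewhat larger relative latency increase, and recovering 2K q/sec needs proportionally more client threads, which is in line with the higher client CPU utilization seen when threads were added.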
Server-side comparison:
YCSB artifacts:
2.7.0: http://perf.jenkins.couchbase.com/view/Weekly/job/hebe/2523
2.5.2: http://perf.jenkins.couchbase.com/view/Weekly/job/hebe/2524