Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version: 3.0.1
- Sprints: SDK48: FTS Score/Incl, Docs., SDK51: FLEpt1, Txns Test Compl, SDK2: FLEpt2, TxnsTest, SDK4: 3.1GA, Ruby3.1, TxnTst, FLE
Description
Comparing SDK3 to SDK2 kv get/set latency, we see that across all latency percentiles, SDK3 is ~20-30% higher than SDK2. Here are the two tests for comparison:
SDK 2.5.0 - 24 workers:
http://perf.jenkins.couchbase.com/job/ares/17051/
SDK 3 - 24 workers:
http://perf.jenkins.couchbase.com/job/ares/17049/
In both tests, kv ops are against default scope/collection.
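For reference, a rough sketch of the kind of KV operation these tests issue against the default scope/collection, using the 3.x Python API. The hostname, credentials, bucket name, and key are placeholders, and the authenticator import path varies slightly across 3.0.x releases:

```python
import time

from couchbase.cluster import Cluster, ClusterOptions
from couchbase_core.cluster import PasswordAuthenticator  # moved to couchbase.auth in later 3.x releases

# Placeholder connection details; both tests run against the default scope/collection.
cluster = Cluster("couchbase://172.23.133.13",
                  ClusterOptions(PasswordAuthenticator("Administrator", "password")))
collection = cluster.bucket("bucket-1").default_collection()

# One timed set/get pair of the kind whose latency percentiles are compared above.
start = time.perf_counter()
collection.upsert("key-000001", {"field": "value"})
set_ms = (time.perf_counter() - start) * 1000

start = time.perf_counter()
result = collection.get("key-000001")
get_ms = (time.perf_counter() - start) * 1000

print(f"set: {set_ms:.2f} ms, get: {get_ms:.2f} ms, value: {result.content_as[dict]}")
```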
SDK3 is this patch: git+http://review.couchbase.org/couchbase-python-client@refs/changes/91/135991/3
Here is a comparison graph:
The only differences in these tests, other than the SDK version, are curr_connections to the bucket and ESTABLISHED connections to the master node 172.23.133.13. Perhaps these "extra" connections are causing the slowdown: the SDK3 run has roughly 30% more connections than the SDK2 run, which corresponds to the increase in latency.
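One way to sanity-check that hypothesis from the client side is to count sockets in the ESTABLISHED state pointing at the KV port during a run. A minimal sketch, assuming the standard KV port 11210 and the third-party psutil package (neither is specified in the tests above):

```python
import psutil  # third-party: pip install psutil

KV_PORT = 11210  # default Couchbase KV (memcached) port; adjust if the cluster differs

# Count client-side sockets in the ESTABLISHED state that point at the KV port.
# On some platforms this needs elevated privileges to see all sockets.
established = [
    conn for conn in psutil.net_connections(kind="tcp")
    if conn.status == psutil.CONN_ESTABLISHED
    and conn.raddr and conn.raddr.port == KV_PORT
]
print(f"{len(established)} ESTABLISHED connections to port {KV_PORT}")
```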
Workaround
Remove any externally installed libcouchbase releases from the system. When run with the built-in libcouchbase, the higher latency is not observed.
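To confirm that an externally installed libcouchbase is really gone (and is not still being picked up ahead of the bundled copy), something like the following can be run on the client machine. lcb_get_version is part of the public libcouchbase C API; the rest is an illustrative sketch:

```python
import ctypes
import ctypes.util

# Look for a libcouchbase on the default linker search path, i.e. an
# externally installed copy rather than the one bundled with the wheel.
libname = ctypes.util.find_library("couchbase")
if libname is None:
    print("No external libcouchbase found on the default library path.")
else:
    lcb = ctypes.CDLL(libname)
    lcb.lcb_get_version.restype = ctypes.c_char_p
    lcb.lcb_get_version.argtypes = [ctypes.POINTER(ctypes.c_uint32)]
    version = ctypes.c_uint32(0)
    print("External libcouchbase found:", libname,
          "version", lcb.lcb_get_version(ctypes.byref(version)).decode())
```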
Attachments
For Gerrit Dashboard: PYCBC-1037

| # | Subject | Branch | Project | Status | CR | V |
|---|---|---|---|---|---|---|
| 166865,6 | Stop installing SDKs if no version specified | master | perfrunner | MERGED | +2 | +1 |
Raju Suravarjjala David Kelly I ran KV and N1QL tests with SDK 3.0.8 and no libcouchbase preinstalled. Here are the results, with green being the old runs and orange the new runs:
90th percentile query latency (ms), Q1, Key-Value Lookup, 10K queries/sec, default collection
Old: 1.2
New: 1.2
http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=iris_700-4226_access_cd44&snapshot=iris_700-4226_access_0cab
90th percentile query latency (ms), Q1, Key-Value Lookup, 10K queries/sec, s=1 c=1
Old: 1.2
New: 1.2
http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=iris_700-4226_access_534e&snapshot=iris_700-4226_access_38dd
90th percentile query latency (ms), Q1, Key-Value Lookup, 10K queries/sec, s=1 c=1000
Old: 1.2
New: 1.2
http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=iris_700-4226_access_cf7e&snapshot=iris_700-4226_access_90e9
Avg. Query Throughput (queries/sec), Q1, Key-Value Lookup, s=1 c=1
Old: 127,774
New: 129,814
http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=iris_700-4226_access_a7b4&snapshot=iris_700-4226_access_a29f
Avg. Query Throughput (queries/sec), Q1, Key-Value Lookup, default collection
Old: 127,943
New: 130,978
http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=iris_700-4226_access_8313&snapshot=iris_700-4226_access_16f4
99.9th percentile GET/SET latency (ms), 4 nodes, 1 bucket x 20M x 1KB, 10K ops/sec, default scope/collection
Old: Get - 0.85, Set - 0.98
New: Get - 1.0, Set - 1.12
http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=ares_700-4226_access_f0db&snapshot=ares_700-4226_access_e4da
99.9th percentile GET/SET latency (ms), 4 nodes, 1 bucket x 20M x 1KB, 10K ops/sec, s=1 c=1
Old: Get - 0.96, Set - 1.18
New: Get - 0.91, Set - 1.12
http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=ares_700-4226_access_2b50&snapshot=ares_700-4226_access_d04e
There are some things to note:
1 - N1QL latency remains about the same as before: default collection, 1 collection, and 1000 collections all show the same latency. The main difference in the N1QL tests is the number of connections in the TIME_WAIT state, which is significantly lower in the new runs. N1QL also sees a mild bump in throughput, and the ops and CPU utilization are smoother and more consistent throughout the tests.
2 - KV default-collection latency seems to have increased, for both get and set, while KV single-collection latency seems to have decreased, for both get and set (the two access patterns are sketched below). We do not see any difference in the number of connections in the TIME_WAIT state in the KV tests.
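For context, "default collection" versus "s=1 c=1" in the results above corresponds to the two access patterns sketched below using the 3.x Python API; the bucket, scope, and collection names are placeholders rather than the ones perfrunner creates, and the authenticator import path varies slightly across 3.0.x releases:

```python
from couchbase.cluster import Cluster, ClusterOptions
from couchbase_core.cluster import PasswordAuthenticator  # couchbase.auth in later 3.x releases

cluster = Cluster("couchbase://172.23.133.13",
                  ClusterOptions(PasswordAuthenticator("Administrator", "password")))
bucket = cluster.bucket("bucket-1")

# Default scope/collection, as in the "default collection" runs.
default_coll = bucket.default_collection()
default_coll.upsert("key-000001", {"field": "value"})

# One named scope with one named collection, as in the "s=1 c=1" runs.
named_coll = bucket.scope("scope-1").collection("collection-1")
named_coll.upsert("key-000001", {"field": "value"})
```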
Comparing SDK2 to SDK3 default collection:
There is still a ~30% difference between SDK2 and SDK3 default-collection latency in the KV tests. However, the N1QL throughput numbers for SDK2 and SDK3 are now roughly the same. This could possibly be an effect of the new libevent IO in the server.