Details
Bug
Resolution: Fixed
Critical
7.1.0
7.1.0-2434
Untriaged
1
Yes
KV March-22
Description
With 7.1.0-2434 we are seeing a throughput decrease of ~20% in a number of YCSB tests:
In some other tests we are seeing even greater throughput drops of ~40-50%:
This appears to be a product regression rather than an environment issue: reruns of test 1) from above reproduce the drop (other reruns do as well; only one is shown for brevity):
7.1.0-2383: http://showfast.sc.couchbase.com/#/runs/ycsb_workloadca_3nodes_cpu_uni_1s_1c_hercules_kv/7.1.0-2383
7.1.0-2434: http://showfast.sc.couchbase.com/#/runs/ycsb_workloadca_3nodes_cpu_uni_1s_1c_hercules_kv/7.1.0-2434
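For reference, the percentage drops quoted above are relative to the baseline build. A minimal sketch of that arithmetic follows; the function name and the sample numbers in the usage note are hypothetical and are not taken from the showfast results.

```python
def throughput_drop_pct(baseline_ops: float, candidate_ops: float) -> float:
    """Relative throughput decrease of the candidate build vs. the baseline, in percent.

    A positive result means the candidate is slower than the baseline.
    """
    if baseline_ops <= 0:
        raise ValueError("baseline throughput must be positive")
    # Multiply before dividing to keep the arithmetic as exact as possible.
    return (baseline_ops - candidate_ops) * 100.0 / baseline_ops
```

With illustrative (made-up) values, a baseline of 100000 ops/sec and a candidate of 80000 ops/sec corresponds to the ~20% class of regression described in this ticket.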
[ Edit: potential red herring
In the cbmonitor graphs for the regressed tests, we see many temp_oom errors, e.g.:
]
Here are pre- and post-regression runs for test 1) with associated graphs and logs:
Run 1 (7.1.0-2383)
Jenkins build: http://perf.jenkins.couchbase.com/job/hercules/15923/
cbmonitor: http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=hercules_710-2383_access_db91
cbcollects:
https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-hercules-15923/172.23.100.121.zip
https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-hercules-15923/172.23.100.122.zip
https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-hercules-15923/172.23.100.123.zip
Run 2 (7.1.0-2434)
Jenkins build: http://perf.jenkins.couchbase.com/job/hercules/15919/
cbmonitor: http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=hercules_710-2434_access_56b2
cbcollects:
https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-hercules-15919/172.23.100.121.zip
https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-hercules-15919/172.23.100.122.zip
https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-hercules-15919/172.23.100.123.zip
Also, pre- and post-regression runs for test 7):
Run 1 (7.1.0-2383)
Jenkins link: http://perf.jenkins.couchbase.com/job/hercules/15966/
cbmonitor: http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=hercules_710-2383_access_34f8
cbcollects:
https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-hercules-15966/172.23.100.121.zip
https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-hercules-15966/172.23.100.122.zip
https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-hercules-15966/172.23.100.123.zip
https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-hercules-15966/172.23.100.124.zip
Run 2 (7.1.0-2434)
Jenkins link: http://perf.jenkins.couchbase.com/job/hercules/15962/
cbmonitor: http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=hercules_710-2434_access_ad8a
cbcollects:
https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-hercules-15962/172.23.100.121.zip
https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-hercules-15962/172.23.100.122.zip
https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-hercules-15962/172.23.100.123.zip
https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-hercules-15962/172.23.100.124.zip
Binary search shows that build 2396 is the culprit, with the following commit likely being responsible:
Commit: 6bd1c26df9b5f952bf16b4726dbf6def17b262bf in build: couchbase-server-7.1.0-2396
MB-49469: Introduce max_checkpoints_hard_limit_multiplier
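The binary search over builds mentioned above can be sketched roughly as follows. This is not the perf team's actual tooling; it assumes the builds are sorted ascending, the first build in the range is good, the last is regressed, and the regression persists once introduced. The `is_regressed` callback is a hypothetical stand-in for running the YCSB test against a build and comparing throughput to the baseline.

```python
from typing import Callable, Sequence


def first_bad_build(builds: Sequence[int],
                    is_regressed: Callable[[int], bool]) -> int:
    """Return the earliest build for which is_regressed(build) is True.

    Assumptions (not verified here): builds is sorted ascending,
    builds[0] is known good, builds[-1] is known bad, and every build
    after the first bad one is also bad.
    """
    if len(builds) < 2:
        raise ValueError("need at least one good and one bad build")
    lo, hi = 0, len(builds) - 1  # lo: known-good index, hi: known-bad index
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_regressed(builds[mid]):
            hi = mid  # regression already present at mid
        else:
            lo = mid  # mid is still good
    return builds[hi]


if __name__ == "__main__":
    # Build range from this ticket: 2383 (good) through 2434 (regressed).
    candidate_builds = list(range(2383, 2435))
    # Hypothetical oracle standing in for a real perf run per build.
    culprit = first_bad_build(candidate_builds, lambda b: b >= 2396)
    print(f"first regressed build: 7.1.0-{culprit}")
```

Over the 52 builds between 2383 and 2434 this needs only about six test runs, which is why bisecting is practical even when each YCSB run is slow.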