Couchbase Server / MB-51329

~20-50% throughput drop and OOM in YCSB uniform distribution tests


Details

    • Untriaged
    • 1
    • Yes
    • KV March-22

    Description

      With build 7.1.0-2434, we are seeing a throughput decrease of ~20% in several YCSB tests:

      1. http://showfast.sc.couchbase.com/#/timeline/Linux/kv/ycsb/ycsb3#ycsb_workloadca_3nodes_cpu_uni_1s_1c_hercules_kv

      2. http://showfast.sc.couchbase.com/#/timeline/Linux/kv/ycsb/ycsb3#ycsb_workloadca_3nodes_cpu_uni_ooo_1s_1000c_hercules_kv

      3. http://showfast.sc.couchbase.com/#/timeline/Linux/kv/ycsb/ycsb3#ycsb_workloada_3nodes_cpu_uni_ooo_hercules_kv

      4. http://showfast.sc.couchbase.com/#/timeline/Linux/kv/ycsb/ycsb3#ycsb_workloadca_3nodes_cpu_uni_1s_1000c_hercules_kv

       

      In some other tests, we are seeing even larger throughput drops of ~40-50%:

      5. http://showfast.sc.couchbase.com/#/timeline/Linux/kv/durability/Throughput#ycsb_workloadck_4node_thr_durability3a_new_uni_ooo_1s_1000c_hercules_kv

      6. http://showfast.sc.couchbase.com/#/timeline/Linux/kv/durability/Throughput#ycsb_workloadk_4node_thr_durability3a_new_uni_hercules_kv

      7. http://showfast.sc.couchbase.com/#/timeline/Linux/kv/durability/Throughput#ycsb_workloadk_4node_thr_durability3a_new_uni_ooo_hercules_kv 
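The percentages above are relative drops against the pre-regression baseline. A minimal Python sketch of that arithmetic follows; the function names and the 5% default regression threshold are hypothetical illustrations, not part of showfast or the perf tooling.

```python
def throughput_drop_pct(baseline_ops: float, candidate_ops: float) -> float:
    """Return the drop of candidate throughput relative to baseline, in percent.

    A positive value means the candidate build is slower than the baseline.
    """
    if baseline_ops <= 0:
        raise ValueError("baseline throughput must be positive")
    # Multiply before dividing so round figures give exact results (e.g. 20.0).
    return (baseline_ops - candidate_ops) * 100.0 / baseline_ops


def is_regression(baseline_ops: float, candidate_ops: float,
                  threshold_pct: float = 5.0) -> bool:
    """Flag a regression when the drop meets a hypothetical threshold."""
    return throughput_drop_pct(baseline_ops, candidate_ops) >= threshold_pct


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the runs linked here.
    print(throughput_drop_pct(100_000, 80_000))  # 20.0
    print(throughput_drop_pct(100_000, 50_000))  # 50.0
```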

       

      This appears to be a product issue rather than an environment issue, as shown by reruns of test 1) above (other tests were also rerun; only one is shown here for brevity):

      7.1.0-2383: http://showfast.sc.couchbase.com/#/runs/ycsb_workloadca_3nodes_cpu_uni_1s_1c_hercules_kv/7.1.0-2383

      7.1.0-2434: http://showfast.sc.couchbase.com/#/runs/ycsb_workloadca_3nodes_cpu_uni_1s_1c_hercules_kv/7.1.0-2434
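One way to make the rerun argument concrete is to compare the gap between build means with the run-to-run spread within each build: if reruns of the same build agree closely while the two builds differ by much more, the environment is an unlikely explanation. The sketch below assumes throughput values (ops/sec) collected from several reruns per build; the helper name and the 2x noise factor are hypothetical choices, and this is a rough heuristic rather than a formal statistical test.

```python
from statistics import mean
from typing import Sequence


def gap_exceeds_noise(runs_a: Sequence[float], runs_b: Sequence[float],
                      noise_factor: float = 2.0) -> bool:
    """Return True if the difference between build means dwarfs rerun spread.

    Spread is the largest min-to-max range seen within either build. With a
    single run per build the spread is zero, so any gap counts; several
    reruns per build are needed for this check to mean anything.
    """
    if not runs_a or not runs_b:
        raise ValueError("need at least one run per build")
    gap = abs(mean(runs_a) - mean(runs_b))
    spread = max(max(runs_a) - min(runs_a), max(runs_b) - min(runs_b))
    return gap > noise_factor * spread
```

For example, hypothetical reruns of `[100.0, 101.0, 99.0]` versus `[80.0, 81.0, 79.0]` give a gap of 20 against a spread of 2, so the check returns True.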

       

      [ Edit: potential red herring

      In the cbmonitor graphs for the regressed tests, we see many temp_oom errors, e.g.:

      http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=hercules_710-2434_access_56b2#629579b6bf3a05996bd8689da648aa19

      ]

      Here are pre- and post-regression runs for test 1) with associated graphs and logs:

      Run 1 (7.1.0-2383)
      Jenkins build: http://perf.jenkins.couchbase.com/job/hercules/15923/
      cbmonitor: http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=hercules_710-2383_access_db91
      cbcollects:
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-hercules-15923/172.23.100.121.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-hercules-15923/172.23.100.122.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-hercules-15923/172.23.100.123.zip

      Run 2 (7.1.0-2434)
      Jenkins build: http://perf.jenkins.couchbase.com/job/hercules/15919/
      cbmonitor: http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=hercules_710-2434_access_56b2
      cbcollects:
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-hercules-15919/172.23.100.121.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-hercules-15919/172.23.100.122.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-hercules-15919/172.23.100.123.zip

       
      Also, pre- and post-regression runs for test 7):

      Run 1 (7.1.0-2383)
      Jenkins build: http://perf.jenkins.couchbase.com/job/hercules/15966/
      cbmonitor: http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=hercules_710-2383_access_34f8
      cbcollects:
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-hercules-15966/172.23.100.121.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-hercules-15966/172.23.100.122.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-hercules-15966/172.23.100.123.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-hercules-15966/172.23.100.124.zip

      Run 2 (7.1.0-2434)
      Jenkins build: http://perf.jenkins.couchbase.com/job/hercules/15962/
      cbmonitor: http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=hercules_710-2434_access_ad8a
      cbcollects:
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-hercules-15962/172.23.100.121.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-hercules-15962/172.23.100.122.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-hercules-15962/172.23.100.123.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-hercules-15962/172.23.100.124.zip

       

      A binary search over builds identifies 7.1.0-2396 as the first regressed build; the following commit in that build is the likely cause:

      Commit: 6bd1c26df9b5f952bf16b4726dbf6def17b262bf in build: couchbase-server-7.1.0-2396
      MB-49469: Introduce max_checkpoints_hard_limit_multiplier
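The binary search over builds can be sketched as follows. It assumes the build numbers are sorted, the first build (here 7.1.0-2383) is known good, the last (7.1.0-2434) is known bad, and the regression persists in every build after the one that introduced it. `is_regressed` is a hypothetical callback standing in for running the perf test on a build and comparing the result against the baseline.

```python
from typing import Callable, Sequence


def first_bad_build(builds: Sequence[int],
                    is_regressed: Callable[[int], bool]) -> int:
    """Return the first build for which is_regressed() is True.

    Assumes builds is sorted ascending, builds[0] is good, builds[-1] is bad,
    and the regression is monotonic (once present, it stays present).
    """
    if len(builds) < 2:
        raise ValueError("need at least one good and one bad build")
    # Invariant: builds[lo] is good and builds[hi] is bad.
    lo, hi = 0, len(builds) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_regressed(builds[mid]):
            hi = mid  # regression already present at mid
        else:
            lo = mid  # mid is still good
    return builds[hi]
```

With a simulated predicate such as `lambda b: b >= 2396` over `range(2383, 2435)`, the search returns 2396 after about log2(N) probes instead of testing every build.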


            People

              Daniel Nagy
              Votes: 0
              Watchers: 5
