Couchbase Server / MB-34300

cbbackupmgr merge throughput degradation (-60%)


Details

    • Triaged: Yes

    Description

      We are observing a decrease of roughly 60% in Enterprise Edition (EE) cbbackupmgr merge throughput (MB/sec).

      Test

      EE merge throughput (Avg. MB/sec). 4 nodes. 100M x 100M docs (overlapping keys)

      Results

      6.5.0-3197: 122 MB/sec

      6.5.0-3198: 54 MB/sec
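      As a quick check on the headline figure, the relative drop can be computed from the two averages above. This is a minimal Python sketch that uses only the two published numbers; `percent_drop` is a helper introduced here for illustration. It gives a drop of about 55.7%, which the title rounds to 60%.

```python
def percent_drop(before: float, after: float) -> float:
    """Return the relative decrease from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("baseline throughput must be positive")
    return (before - after) / before * 100.0


# Average EE merge throughput (MB/sec) taken from the results above.
BUILD_3197 = 122.0
BUILD_3198 = 54.0

drop = percent_drop(BUILD_3197, BUILD_3198)
slowdown = BUILD_3197 / BUILD_3198  # how many times slower 3198 is

print(f"drop: {drop:.1f}%")          # drop: 55.7%
print(f"slowdown: {slowdown:.2f}x")  # slowdown: 2.26x
```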

      Report

      http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=leto_650-3197_merge_bc25&snapshot=leto_650-3198_merge_d3f0

      Logs for 6.5.0-3197

      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-leto-9901/leto-srv-01.perf.couchbase.com.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-leto-9901/leto-srv-02.perf.couchbase.com.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-leto-9901/leto-srv-03.perf.couchbase.com.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-leto-9901/leto-srv-04.perf.couchbase.com.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-leto-9901/tools.zip

      Logs for 6.5.0-3198

      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-leto-9900/leto-srv-01.perf.couchbase.com.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-leto-9900/leto-srv-02.perf.couchbase.com.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-leto-9900/leto-srv-03.perf.couchbase.com.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-leto-9900/leto-srv-04.perf.couchbase.com.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-leto-9900/tools.zip

      Changelog

      http://172.23.123.43:8000/getchangelog?product=couchbase-server&fromb=6.5.0-3197&tob=6.5.0-3198

      Comment

      It looks like the behavior of backup and merge has changed significantly in this commit:

      https://github.com/couchbase/backup/commit/3d59d52be05a411f437c375bd7235638666bd2d4

      Looking at the report graphs, we see that all 4 nodes in the cluster are utilizing their disks during merge in 3198, whereas the previous merge (pre-3198) only utilized disk resources on a single node. It seems the merge operation now routes data directly to the appropriate vBucket instead of sending it all to a single node. With such a significant change, perhaps we need to review the test to align it with the new merge behavior.

      Could you let us know what the new intended merge behavior is?



        Activity


          Patrick Varley added a comment: This is expected because the test has changed; the previous test was flawed. I will let Toby Wilds explain the details.
          Toby Wilds added a comment:

          Merge now accepts a flag that sets the number of threads, meaning we no longer automatically determine the number of threads based on the number of shards.

          I believe that, because of the number of threads we've historically run backup with, these backups had 16 shards, so merge ran with 16 threads. With this change the default is now 1 thread, which is why the test throughput dropped so dramatically.

          Revised test is up at: http://review.couchbase.org/#/c/109205/

          There are just a few other changes that need to happen before we merge it and re-run the tests.
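          To make the described change concrete, the sketch below models the two thread-selection rules from this comment: one merge thread per backup shard before the change, and an explicit thread count defaulting to 1 after it. It is a hypothetical model, not cbbackupmgr source code; `merge_threads_before`, `merge_threads_after`, and `DEFAULT_MERGE_THREADS` are invented names, and the validation rule is an assumption.

```python
from typing import Optional

# Hypothetical model, NOT cbbackupmgr's actual code. Names and the
# validation rule are illustrative assumptions.
DEFAULT_MERGE_THREADS = 1


def merge_threads_before(shard_count: int) -> int:
    """Old behaviour as described: one merge thread per backup shard."""
    return shard_count


def merge_threads_after(flag_value: Optional[int] = None) -> int:
    """New behaviour as described: an explicit thread count, defaulting to 1."""
    if flag_value is None:
        return DEFAULT_MERGE_THREADS
    if flag_value < 1:
        raise ValueError("thread count must be at least 1")
    return flag_value


# The perf-test backups had 16 shards, as described in the comment.
shards = 16
print(merge_threads_before(shards))  # 16
print(merge_threads_after())         # 1, the new default
print(merge_threads_after(shards))   # 16, if the test sets the count explicitly
```

          Under this model a test that relied on the old shard-derived behaviour drops from 16 threads to 1 unless it sets the count explicitly. That plausibly explains the lower throughput, although throughput need not scale linearly with thread count.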

          Toby Wilds added a comment:

          Closing this as the test in question was flawed and improvements to the tests are underway. I'll re-raise for any degradations these new tests uncover as needed.


          People

            Toby Wilds
            Korrigan Clark (Inactive)
            Votes: 0
            Watchers: 4
