Couchbase Server
MB-34300

cbbackupmgr merge throughput degradation -60%


Details

    • Triaged: Yes

    Description

      We are observing a 60% decrease in Enterprise Edition cbbackupmgr merge throughput (MB/sec).

      Test

      EE merge throughput (Avg. MB/sec). 4 nodes. 100M x 100M docs (overlapping keys)
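
      For context, a run of this shape can be timed with a small harness. Below is a minimal sketch in Go, assuming throughput is reported as bytes merged divided by wall-clock seconds; the archive path, repo name, backup range, and byte count are placeholders, not the actual perf-test configuration:

```go
// Hypothetical harness (not the perf-test code): times a cbbackupmgr merge
// and reports average throughput as bytes merged / elapsed seconds.
package main

import (
	"fmt"
	"log"
	"os/exec"
	"time"
)

func main() {
	// Archive path, repo name, and backup range are placeholders.
	args := []string{
		"merge",
		"--archive", "/data/backups",
		"--repo", "example",
		"--start", "oldest",
		"--end", "latest",
	}
	start := time.Now()
	if out, err := exec.Command("cbbackupmgr", args...).CombinedOutput(); err != nil {
		log.Fatalf("merge failed: %v\n%s", err, out)
	}
	elapsed := time.Since(start).Seconds()

	// mergedBytes would normally be measured from the archive on disk;
	// fixed here purely for illustration.
	const mergedBytes = 100 << 30 // pretend 100 GiB were merged
	fmt.Printf("avg throughput: %.1f MB/sec\n", float64(mergedBytes)/1e6/elapsed)
}
```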

      Results

      6.5.0-3197: 122 MB/sec

      6.5.0-3198: 54 MB/sec
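
      For reference, the relative drop is (122 - 54) / 122 ≈ 56%, in line with the roughly 60% figure in the summary.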

      Report

      http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=leto_650-3197_merge_bc25&snapshot=leto_650-3198_merge_d3f0

      Logs for 6.5.0-3197

      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-leto-9901/leto-srv-01.perf.couchbase.com.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-leto-9901/leto-srv-02.perf.couchbase.com.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-leto-9901/leto-srv-03.perf.couchbase.com.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-leto-9901/leto-srv-04.perf.couchbase.com.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-leto-9901/tools.zip

      Logs for 6.5.0-3198

      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-leto-9900/leto-srv-01.perf.couchbase.com.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-leto-9900/leto-srv-02.perf.couchbase.com.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-leto-9900/leto-srv-03.perf.couchbase.com.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-leto-9900/leto-srv-04.perf.couchbase.com.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-leto-9900/tools.zip

      Changelog

      http://172.23.123.43:8000/getchangelog?product=couchbase-server&fromb=6.5.0-3197&tob=6.5.0-3198

      Comment

      It looks like the behavior of backup and merge has changed significantly in this commit:

      https://github.com/couchbase/backup/commit/3d59d52be05a411f437c375bd7235638666bd2d4

      Looking at the report graphs, we see that in 3198 all 4 nodes in the cluster utilize their disks during the merge, whereas pre-3198 the merge only used disk resources on a single node. Given such a significant change, we may need to review the test to align it with the new merge behavior. It appears the merge operation now routes data directly to the node owning each vbucket rather than funnelling everything through a single node; see the sketch below.
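
      To illustrate why per-vbucket routing would spread the merge load across every node, here is a minimal sketch in Go of the standard CRC32-based key-to-vbucket mapping (1024 vbuckets); the vbucket % 4 node assignment is a simplification of the real cluster vbucket map:

```go
// Sketch of the standard Couchbase key-to-vbucket mapping. If merge writes
// each document to the node owning its vbucket, keys fan out across the
// whole cluster, which would explain disk activity on all 4 nodes.
package main

import (
	"fmt"
	"hash/crc32"
)

const numVBuckets = 1024

// vbucketID maps a document key to its vbucket using the CRC32-based
// hash Couchbase clients use.
func vbucketID(key []byte) uint32 {
	return ((crc32.ChecksumIEEE(key) >> 16) & 0x7fff) % numVBuckets
}

func main() {
	// Count how many of 1000 sample keys land on each of 4 nodes under a
	// simplified vbucket -> node assignment (real clusters use an explicit
	// vbucket map).
	nodeHits := make(map[uint32]int)
	for i := 0; i < 1000; i++ {
		key := []byte(fmt.Sprintf("doc-%d", i))
		nodeHits[vbucketID(key)%4]++
	}
	// Keys spread roughly uniformly, so a merge that routes each document
	// to its vbucket's owner touches disks on every node.
	fmt.Println("keys per node:", nodeHits)
}
```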

      Could you let us know what the new intended merge behavior is?

       


          People

            toby.wilds Toby Wilds (Inactive)
            korrigan.clark Korrigan Clark (Inactive)
            Votes: 0
            Watchers: 4

