Couchbase Server / MB-35613

cbbackupmgr: Restore performance degradation of ~40% between 6.5.0-4000 and 6.5.0-4064


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version: 6.5.0
    • Fix Version: 6.5.0
    • Component: tools
    • Triage: Triaged
    • Is a Regression: Yes

    Description

      I'm seeing a degradation of cbbackupmgr restore throughput of around 40% across both ForestDB and SQLite.

      From an initial analysis, the degradation appears to be on the cbbackupmgr side: resource utilisation and write queues on the Couchbase Server side are lower than in previous runs, suggesting that the throughput driven by cbbackupmgr is lower.

      Ops/s shows a significant drop between 6.5.0-4000 and 6.5.0-4064:

      Run: http://perf.jenkins.couchbase.com/job/leto/10552/
      cbmonitor report: http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=leto_650-4064_restore_0071

      Attached are client and server side logs for an SQLite run.

      Attachments

        1. disk-size-4000.png (670 kB)
        2. disk-size-4169.png (690 kB)
        3. image-2019-08-20-12-31-16-398.png (154 kB)
        4. image-2019-08-20-12-32-03-152.png (149 kB)
        5. leto-srv-01.perf.couchbase.com.zip (26.97 MB)
        6. leto-srv-02.perf.couchbase.com.zip (28.70 MB)
        7. leto-srv-03.perf.couchbase.com.zip (28.89 MB)
        8. leto-srv-04.perf.couchbase.com.zip (27.84 MB)
        9. Screenshot 2019-09-09 at 13.43.43.png (311 kB)
        10. Screenshot 2019-09-12 at 18.00.02.png (47 kB)
        11. Screenshot 2019-09-12 at 18.03.03.png (51 kB)
        12. Screenshot 2019-09-12 at 18.03.23.png (51 kB)
        13. Screenshot 2019-09-19 at 16.30.21.png (47 kB)
        14. SET_WITH_META-Timings-4000.png (51 kB)
        15. SET_WITH_META-Timings-4169.png (53 kB)
        16. tools.zip (25 kB)

        Activity

          carlos.gonzalez Carlos Gonzalez Betancort (Inactive) added a comment - Build 4031 contained a bug that introduced race conditions, leading to backups/restores hanging or taking a long time to start or close. This issue was fixed later on; the earliest build after 6.5.0-4000 that does not contain these issues is 6.5.0-4065.

          ben.huddleston Ben Huddleston added a comment - Had a discussion with James Lee and Carlos Gonzalez Betancort in person. More investigation will be done from the tools/restore side of things, as no KV changes are present and the fragmentation is not expected to cause any slowdown with the given workload.

          pvarley Patrick Varley added a comment - Increasing CB_MAX_DOCS_BUFFERED gets the performance back to where it was before. We need to decide what a sane value is, and whether it should be dynamic based on the available memory.

          build-team Couchbase Build Team added a comment - Build couchbase-server-6.5.0-4318 contains backup commit b12665c with commit message: MB-35613 Increase max docs buffer size

          carlos.gonzalez Carlos Gonzalez Betancort (Inactive) added a comment - edited - It took us a while, but we pinpointed the issue to a shared channel whose size was too small. We fixed this issue and restore is now faster: we triggered a manual run of the SQLite restore test and, as can be seen below, it is much faster, 304 versus 244 MB/s.
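          The root cause above, a shared channel with too small a buffer, follows a general Go pattern: when a producer and a consumer communicate over a channel, an undersized buffer forces the producer to block on every send, serialising the two stages. This is a minimal sketch of that pattern; the stage roles (backup reader feeding a cluster writer) and item counts are illustrative, not cbbackupmgr's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// pipeline pushes n items from a producer to a consumer through a
// channel with the given buffer size and returns how many items the
// consumer received. With buf == 1 the producer blocks on almost
// every send; a larger buffer lets the stages overlap.
func pipeline(n, buf int) int {
	ch := make(chan int, buf)
	var wg sync.WaitGroup
	wg.Add(1)

	consumed := 0
	go func() {
		defer wg.Done()
		for range ch { // consumer: drain until the channel is closed
			consumed++
		}
	}()

	for i := 0; i < n; i++ { // producer: blocks whenever the buffer is full
		ch <- i
	}
	close(ch)
	wg.Wait() // happens-before: safe to read consumed after this
	return consumed
}

func main() {
	fmt.Println(pipeline(1000, 1))    // tiny buffer: stages run in lock-step
	fmt.Println(pipeline(1000, 1024)) // large buffer: producer rarely blocks
}
```

          Both calls deliver all items; the difference shows up as throughput, not correctness, which matches how this bug surfaced as a performance regression rather than a failure.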

          People

            Assignee: carlos.gonzalez Carlos Gonzalez Betancort (Inactive)
            Reporter: toby.wilds Toby Wilds
            Votes: 0
            Watchers: 4


