  Couchbase Server / MB-34605

cbbackupmgr - compact speed degradation on ForestDB backups



    Description

      We're seeing a ~100% increase in compact time between 6.0.1/6.0.2 and the latest MadHatter builds. 
       
      Test

      EE compact time elapsed (seconds), 4 nodes, 1 bucket x 100M x 1KB, DGM, Idle

      Results

      6.5.0-3274: 1,294 secs

      6.0.2-2409: 610 secs

       
      Logs for 6.0.2-2409

      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-leto-10042/leto-srv-01.perf.couchbase.com.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-leto-10042/leto-srv-02.perf.couchbase.com.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-leto-10042/leto-srv-03.perf.couchbase.com.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-leto-10042/leto-srv-04.perf.couchbase.com.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-leto-10042/tools.zip

       
      Logs for 6.5.0-3274

      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-leto-tmp1-52/leto-srv-01.perf.couchbase.com.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-leto-tmp1-52/leto-srv-02.perf.couchbase.com.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-leto-tmp1-52/leto-srv-03.perf.couchbase.com.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-leto-tmp1-52/leto-srv-04.perf.couchbase.com.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-leto-tmp1-52/tools.zip

      Comments

      I would imagine this is to do with us having to open many more ForestDB files due to sharding changes. The logs indicate that throughout compact's runtime it is opening and closing many more ForestDB files than in prior versions. 

      It should also be noted that this increase in compact time is offset by a large reduction in base backup size thanks to improvements in backup.
       


        Activity

          toby.wilds Toby Wilds created issue -

          carlos.gonzalez Carlos Gonzalez Betancort (Inactive) added a comment -

          I think this is because at the moment we are compacting all shards at the same time, so it could be an issue with having too many threads. We could add a --threads option to control how many files are compacted at once.
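
          As a minimal sketch of the --threads idea above (not the actual cbbackupmgr code: compactShard, the shard file names, and the thread count are all assumed for illustration), concurrent shard compactions could be capped with a buffered-channel semaphore in Go:

              package main

              import (
                  "fmt"
                  "sync"
              )

              // compactShard stands in for the real per-shard compaction work.
              func compactShard(path string) error {
                  fmt.Println("compacting", path)
                  return nil
              }

              // compactAll compacts every shard file, but runs at most `threads`
              // compactions at any one time.
              func compactAll(shards []string, threads int) {
                  sem := make(chan struct{}, threads) // counting semaphore
                  var wg sync.WaitGroup
                  for _, shard := range shards {
                      wg.Add(1)
                      sem <- struct{}{} // blocks once `threads` compactions are in flight
                      go func(path string) {
                          defer wg.Done()
                          defer func() { <-sem }()
                          if err := compactShard(path); err != nil {
                              fmt.Println("compact failed:", path, err)
                          }
                      }(shard)
                  }
                  wg.Wait()
              }

              func main() {
                  shards := []string{"shard_0.fdb", "shard_1.fdb", "shard_2.fdb", "shard_3.fdb"}
                  compactAll(shards, 2) // e.g. the value passed via --threads
              }

          On the command line this would surface as something like cbbackupmgr compact --archive <archive_dir> --repo <repo_name> --threads 2; the --threads name comes from the comment above, while the exact spelling of the other options is assumed here.
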
          carlos.gonzalez Carlos Gonzalez Betancort (Inactive) made changes -
          Assignee: Patrick Varley [ pvarley ] → Carlos Gonzalez Betancort [ carlos.gonzalez ]
          carlos.gonzalez Carlos Gonzalez Betancort (Inactive) made changes -
          Status: Open [ 1 ] → In Progress [ 3 ]
          carlos.gonzalez Carlos Gonzalez Betancort (Inactive) made changes -
          Actual Start 2019-06-14 07:59 (issue has been started)
          owend Daniel Owen added a comment -

          Hi Carlos Gonzalez Betancort, given that we are not going to use ForestDB, can we set this as "Won't Do"?

          carlos.gonzalez Carlos Gonzalez Betancort (Inactive) added a comment -

          Daniel Owen, the issue has to do with the overuse of threads. In 6.0, compaction used a goroutine per shard, which was fine when we only had a couple of shards; now that we have 1024, it causes cbbackupmgr to use up all the resources on the machine. This is more noticeable with ForestDB than with SQLite, but I have made a fix to manually control parallelism that should help SQLite as well as ForestDB; hopefully it will be merged soon.
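
          For contrast, a minimal sketch of the 6.0-style behaviour described above, where a goroutine is launched for every shard with no cap (the names are again illustrative, not taken from the real code):

              package main

              import (
                  "fmt"
                  "sync"
              )

              // compactShard again stands in for the real per-shard compaction work.
              func compactShard(path string) { fmt.Println("compacting", path) }

              func main() {
                  var wg sync.WaitGroup
                  // One goroutine per shard with no limit: with 1024 shards this launches
                  // 1024 concurrent compactions, which can exhaust CPU, memory and file
                  // handles on the machine running cbbackupmgr.
                  for i := 0; i < 1024; i++ {
                      shard := fmt.Sprintf("shard_%d.fdb", i)
                      wg.Add(1)
                      go func(path string) {
                          defer wg.Done()
                          compactShard(path)
                      }(shard)
                  }
                  wg.Wait()
              }

          Bounding that loop with a semaphore, as in the earlier sketch, is one way the "manually control parallelism" fix could look.
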
          carlos.gonzalez Carlos Gonzalez Betancort (Inactive) made changes -
          Resolution: Fixed [ 1 ]
          Status: In Progress [ 3 ] → Resolved [ 5 ]
          carlos.gonzalez Carlos Gonzalez Betancort (Inactive) made changes -
          Actual End 2019-06-17 11:09 (issue has been resolved)

          build-team Couchbase Build Team added a comment -

          Build couchbase-server-6.5.0-3524 contains backup commit a7ee099 with commit message:
          MB-34605 Add threads flag to compact

          wayne Wayne Siu added a comment -

          A regression in compact time on SQLite has also been observed.

          http://showfast.sc.couchbase.com/#/timeline/Linux/tools/compact/SQLite

          6.5.0-3334: 1.5 secs

          6.5.0-3511: 53 secs
          toby.wilds Toby Wilds added a comment -

          Hi Wayne Siu, sorry, I should have mentioned that compact with SQLite never worked correctly prior to build 6.5.0-3426, which is why these tests record such small times. I'll delete the old data because it is misleading.
          wayne Wayne Siu added a comment -

          In recent builds (6.5.0-3573, 3633, and 3687) the compact time is about 180 secs, compared to roughly 600 secs in 6.0.2 and 1,200 secs in earlier MadHatter builds.

          Marking it CLOSED.
          wayne Wayne Siu made changes -
          Status: Resolved [ 5 ] → Closed [ 6 ]

          People

            Assignee: carlos.gonzalez Carlos Gonzalez Betancort (Inactive)
            Reporter: toby.wilds Toby Wilds
