
MB-38161: cbbackupmgr backup throughput degradation (sqlite) in build 7.0.0-1466 or earlier


Details

    Description

      Observing a ~40% drop in cbbackupmgr backup throughput in 7.0.0-1466.

      I am also seeing an increase in run time in the test case below.

      Still triaging to identify the build which introduced the degradation.

      Attachments

        Issue Links


          Activity

            james.lee James Lee added a comment -

            Hi Sharath Sulochana, Daniel Owen,

            This is something that I've already looked into and resolved on Monday. This was an issue in gocbcore, which was silently (and mistakenly) ignoring the 'DisableDecompression' agent config. This meant that upon receiving snappy-compressed mutations from the server, gocbcore inflated them, and we then immediately compressed them again before writing them to disk (obviously this is very expensive when it happens for almost every mutation).
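            To illustrate the cost, here is a minimal Go sketch of the two behaviours using the github.com/golang/snappy package; the function names are hypothetical and this is not gocbcore's actual code path:

                package main

                import (
                    "bytes"
                    "fmt"

                    "github.com/golang/snappy"
                )

                // buggyPath models the pre-fix behaviour: the SDK inflates the
                // snappy body it received (ignoring the config asking it not to),
                // so the backup client must compress it all over again before
                // writing it to disk -- one Decode plus one Encode per mutation.
                func buggyPath(compressedBody []byte) ([]byte, error) {
                    inflated, err := snappy.Decode(nil, compressedBody)
                    if err != nil {
                        return nil, err
                    }
                    return snappy.Encode(nil, inflated), nil
                }

                // fixedPath models the post-fix behaviour: the still-compressed
                // body is passed straight through and written to disk as-is.
                func fixedPath(compressedBody []byte) []byte {
                    return compressedBody
                }

                func main() {
                    doc := bytes.Repeat([]byte("couchbase "), 100) // ~1 KiB value
                    compressed := snappy.Encode(nil, doc)

                    slow, _ := buggyPath(compressed)
                    fast := fixedPath(compressed)
                    fmt.Println(bytes.Equal(slow, fast)) // same bytes, very different cost
                }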

            I submitted a patch for GOCBC-815 on Monday which was promptly merged; I've also updated the manifest so that our version of gocbcore includes the aforementioned patch. The performance testing could be triggered again to confirm that the issue is gone but I'm confident that this has been resolved.

            Thanks,
            James


            sharath.sulochana Sharath Sulochana (Inactive) added a comment -

            James Lee, throughput is back to ~362 MB/s, but this is still a ~10% drop compared to the 6.5.0-4960 (Mad Hatter) GA build.
            james.lee James Lee added a comment -

            Hi Sharath Sulochana,

            Looking at the separate runs, this is actually the slowest of the three; the highest was ~381 MB/s. I don't think there is anything to worry about here, for a couple of reasons:
            1) We are now writing a uint32 collection id along with every document. Even allowing for the variable-length integer encoding SQLite uses, this results in roughly 0.5 GB of additional data for the 100M x 1024B dataset which has to be written to disk, excluding the additional work created by the combined (collection, key) index; see the rough calculation after this list.
            2) Since collections support was added, the SQLite file format hasn't seen much work (it's no longer the default file format). We have replaced it with Rift (SQLite V2), which splits user data out into a binary blob while maintaining an SQLite index into that data. We have been heavily modifying the storage/DCP layers (for cloud support) and optimizing them for Rift, as can be seen in the numerous performance tests that have been run on Leto. I've been working with Korrigan Clark over the past few days to get performance testing for Rift running and displayed on showfast; I believe all the groundwork is in place, so a section for Rift performance testing should appear sometime soon.
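            As a rough sanity check of point 1, the collection-id overhead is easy to estimate. A minimal Go sketch, assuming 100M documents and treating the full 4 bytes of a uint32 id as an upper bound (SQLite's variable-length integer encoding can shrink small ids):

                package main

                import "fmt"

                func main() {
                    const docs = 100_000_000 // documents in the test dataset
                    const bytesPerID = 4     // upper bound for a uint32 collection id

                    extraGiB := float64(docs*bytesPerID) / (1 << 30)
                    fmt.Printf("extra data written: ~%.2f GiB\n", extraGiB) // ~0.37 GiB
                }

            That puts the raw id overhead at roughly 0.4 GB, in the same ballpark as the ~0.5 GB quoted above.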


            asad.zaidi Asad Zaidi (Inactive) added a comment -

            Performance has been restored to its pre-defect level, so I am closing this issue.

            People

              james.lee James Lee
              sharath.sulochana Sharath Sulochana (Inactive)
              Votes: 0
              Watchers: 4


                Gerrit Reviews

                  There are no open Gerrit changes
