Couchbase Server / MB-32389

10% drop in KV throughput over time

Details

    • Untriaged
    • Unknown

    Description

      This is an umbrella ticket for:

      https://issues.couchbase.com/browse/MB-32107
      https://issues.couchbase.com/browse/MB-32387
      https://issues.couchbase.com/browse/MB-32388

      We have had at least 3 drops in max throughput over time. As a result, current numbers are about 10% lower than on early MH (Mad-Hatter) builds.

       

      Daily history shows the pattern pretty well:

       

      But it is also visible in the weekly history:

       

       


          Activity

            drigby Dave Rigby added a comment -

            After the various linked improvements, pillowfight performance is within 5% of the throughput of 6.0.0 (in all variants):

            Marking this as Fixed.


            korrigan.clark Korrigan Clark added a comment -

            Dave Rigby, Ben Huddleston: I was going through and closing tickets, and when double-checking this one it looks like there are still similar levels of regression between 6.0.0 and 6.5.0. Do we want to revisit this issue?

            drigby Dave Rigby added a comment - edited

            Korrigan Clark Looking at the most recent 6.5.0 vs 6.0.0 numbers I see:

            The only tests with more than a 5% regression are:

            • KV : DCP, 250M x 1KB items, DGM - this is reporting as -9.4%, which is certainly very close to the 10% "fail" threshold. However, the noise on that test is quite high, so it is not clear whether there is a significant regression or not:

              We can certainly investigate that (I would suggest a new, separate MB so as not to confuse things), but I think it's probably not Critical priority, and hence we may not have time to resolve it before GA.
            • Rebalance 3 -> 4, 100M x 1KB items, DGM, 10K mixed ops/sec - this is reporting as +21.7% (lower is better), so if accurate this is a significant regression. However, this test is very noisy:

              In fact, if we compare the previous build's run (6.5.0-4702) against 6.0.0, we are actually 2.2% faster than 6.0.0.

            For this test let's monitor the next few builds and see if the high rebalance time on the current 6.5.0 build is spurious.
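
            The comparison described in this comment can be sketched roughly as follows. This is only an illustration, not how showfast computes its numbers: the function names, the "warn" label, and the noise rule (gap between means versus k times the run-to-run standard deviation) are all assumptions, and the sample values are made up so that they reproduce the -9.4% and +21.7% figures quoted above.

```python
from statistics import mean, stdev

WARN_PCT = 5.0   # assumed "worth a look" level, taken from the 5% figure above
FAIL_PCT = 10.0  # the 10% "fail" threshold mentioned above


def regression_pct(baseline, candidate, lower_is_better=False):
    """Return how much worse candidate is than baseline, in percent.

    Positive means a regression, negative means an improvement.
    """
    change = (candidate - baseline) / baseline * 100.0
    # For throughput a drop is bad; for rebalance time a rise is bad.
    return change if lower_is_better else -change


def classify(regression):
    """Map a regression percentage to a hypothetical status label."""
    if regression >= FAIL_PCT:
        return "fail"
    if regression > WARN_PCT:
        return "warn"
    return "ok"


def exceeds_noise(baseline_runs, candidate_runs, k=2.0):
    """Crude noise check (an assumption, not a formal significance test).

    True if the gap between the two means is larger than k times the
    larger of the two run-to-run standard deviations.
    """
    gap = abs(mean(candidate_runs) - mean(baseline_runs))
    noise = max(stdev(baseline_runs), stdev(candidate_runs))
    return gap > k * noise


if __name__ == "__main__":
    # Made-up throughput values giving a -9.4% change (higher is better).
    print(classify(regression_pct(1000.0, 906.0)))  # warn
    # Made-up rebalance times giving a +21.7% change (lower is better).
    print(classify(regression_pct(100.0, 121.7, lower_is_better=True)))  # fail
```

            With a check like exceeds_noise, a single noisy run would not be treated as a regression until several runs agree, which is the reason for monitoring the next few builds.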


            korrigan.clark Korrigan Clark added a comment -

            Dave Rigby, it looks like 3 of the tests have regressed by more than 10%:

            http://showfast.sc.couchbase.com/daily/#/history/KV%7CDCP,%20250M%20x%201KB%20items,%20DGM%7CAvg%20Throughput%20(items/sec)

            http://showfast.sc.couchbase.com/daily/#/history/KV%7CPillowfight,%2020/80%20R/W,%20256B%20binary%20items%7CMax%20Throughput%20(ops/sec)

            http://showfast.sc.couchbase.com/daily/#/history/KV%7CPillowfight,%2050/50%20R/W,%20256B%20binary%20items%7CMax%20Throughput%20(ops/sec)

            For Pillowfight, 20/80 R/W, 256B binary items, the 7.0.0 builds have higher throughput but the latest 6.5.0 builds do not. DCP, 250M x 1KB items, DGM has gone down slowly over time, with the latest runs being particularly bad. Pillowfight, 50/50 R/W, 256B binary items had a big drop recently and only a slight recovery in the most recent runs. I am reopening this ticket.

            drigby Dave Rigby added a comment -

            Hi Korrigan Clark, note there are already MBs tracking those issues individually:

            • DCP, 250M x 1KB items, DGM - MB-36826 - which is actually an issue with the test (currently assigned to yourself).
            • Pillowfight Max Throughput - MB-36827.

            I'm therefore resolving this issue, given it was originally opened for specific performance issues early on in Mad-Hatter (around build 1633) which have subsequently been resolved. If you compare, for example, performance for a 6.5.0 build before the num_shards change (e.g. 6.5.0-4755), then everything is green:

            If there are any new performance issues then I suggest we raise a new MB for them (otherwise we might as well just have a single MB for all KV-Engine perf issues ever!)


            People

              korrigan.clark Korrigan Clark
              oleksandr.gyryk Alex Gyryk (Inactive)