  1. Couchbase Server
  2. MB-49851

Plasma DGM throughput regressed

        Activity

          jliang John Liang added a comment - - edited

          I did a run on build 1745 with some of the latest compression settings turned off. The throughput is back to 152.8K for 20% RR.

          http://perf.jenkins.couchbase.com/job/secondary-plasma-dgm/5070/

          secondary."indexer.plasma.mainIndex.enableCompressDuringBurst".false
          secondary."indexer.plasma.backIndex.enableCompressDuringBurst".false
          secondary."indexer.plasma.mainIndex.compressBeforeEvictPercent".0
          secondary."indexer.plasma.backIndex.compressBeforeEvictPercent".0
          secondary."indexer.plasma.mainIndex.enableDecompressDuringSwapin".true
          secondary."indexer.plasma.backIndex.enableDecompressDuringSwapin".true
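
The overrides above all follow a `section."dotted.key".value` shape. As a minimal sketch, assuming that shape holds for every entry, they can be parsed into a plain settings map for inspection or diffing between runs. The helper name `parse_overrides` and the regular expression are illustrative assumptions, not part of the test harness.

```python
import json
import re

# Overrides copied from the comment above, separated by whitespace.
OVERRIDES = (
    'secondary."indexer.plasma.mainIndex.enableCompressDuringBurst".false '
    'secondary."indexer.plasma.backIndex.enableCompressDuringBurst".false '
    'secondary."indexer.plasma.mainIndex.compressBeforeEvictPercent".0 '
    'secondary."indexer.plasma.backIndex.compressBeforeEvictPercent".0 '
    'secondary."indexer.plasma.mainIndex.enableDecompressDuringSwapin".true '
    'secondary."indexer.plasma.backIndex.enableDecompressDuringSwapin".true'
)

# Assumed shape of one override: <section>."<dotted.key>".<value>
_OVERRIDE = re.compile(r'(\w+)\."([^"]+)"\.(\S+)')


def parse_overrides(text):
    """Hypothetical helper: map each quoted key to a bool or int value."""
    settings = {}
    for _section, key, raw in _OVERRIDE.findall(text):
        if raw in ("true", "false"):
            settings[key] = raw == "true"
        else:
            settings[key] = int(raw)
    return settings


if __name__ == "__main__":
    print(json.dumps(parse_overrides(OVERRIDES), indent=2))
```

Keeping the overrides as a map makes it easy to see that the same three settings are applied to both the main index and the back index.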
          


          build-team Couchbase Build Team added a comment -

          Build couchbase-server-7.1.0-1836 contains indexing commit badb98d with commit message:
          MB-49851: Revert "MB-48897: Enable plasma in-mem compression configs"

          jliang John Liang added a comment -

          The test continues to regress on build 1887. I have run the following additional tests:
          1) A toy build without any swapper changes since 1885. The throughput continues to regress. (http://perf.jenkins.couchbase.com/view/GSI/job/secondary-plasma-dev/102/)
          2) Use 1887 but turn off compression. The throughput is back to where it was. (http://perf.jenkins.couchbase.com/view/GSI/job/secondary-plasma-dev/105/)

          This shows that the regression between 1885 and 1887 is due to the latency induced by decompression during scan. In the latency graph, the 50th percentile latency for 1885 is 350us. In 1887, it is 400us, an increase of about 14%. That corresponds to the roughly 14% drop in throughput.

          Also, comparing the latency graphs for 1695 and 1885, there is a slight increase in 50th percentile latency in 1885, but it is hard to quantify the exact amount from the graph. Having said that, the throughput difference is less than 10%, so it is still within the margin of error.
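
The link between the latency and throughput figures above can be checked with a quick calculation. This sketch assumes a closed-loop test in which throughput is inversely proportional to per-request latency; that model is an assumption made here for illustration, not something the test configuration states, and the function names are hypothetical.

```python
def pct_increase(old, new):
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100.0


def implied_throughput_drop(old_latency, new_latency):
    """Throughput drop in percent, assuming throughput scales as 1 / latency."""
    return (1.0 - old_latency / new_latency) * 100.0


# 50th percentile latencies read from the graphs, in microseconds.
P50_1885_US = 350.0
P50_1887_US = 400.0

if __name__ == "__main__":
    print(f"latency increase: {pct_increase(P50_1885_US, P50_1887_US):.1f}%")
    print(f"implied throughput drop: "
          f"{implied_throughput_drop(P50_1885_US, P50_1887_US):.1f}%")
    # latency increase: 14.3%
    # implied throughput drop: 12.5%
```

Under this simple model, the 50us increase in median latency accounts for a throughput drop of the same order as the one observed between the two builds.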

          jliang John Liang added a comment - - edited

          I ran it with build 1941 (using Zstd). The perf number is slightly better than 1887 but not by much. It is about 126K.

          http://perf.jenkins.couchbase.com/view/GSI/job/secondary-plasma-dev/107/

          jliang John Liang added a comment -

          This test scans only 1 page, so it effectively tests scan performance when the working set is all in memory. Unlike a non-DGM test, it takes into account the effect of throttling during scan when memory usage is above quota (DGM). In 7.0, the throughput is 143K. With Zstd, the throughput is 126K. That is about a 12% degradation, mainly due to the increase in average latency from decompression cost. Having said that, this number is still 23% better than 6.6.4 (102K).

          What is more important is that in the new test (100/20 test with 10% RR), the throughput is 146K (see http://perf.jenkins.couchbase.com/job/hemera/3461/). This new test also tests a 100% RR scan without oversizing the quota; it allocates just enough quota for 10% RR. This performance is on par with 7.0 and within 5-6% of the 7.1 run (155K). This test is more representative of real-world workloads.

          When the working set is not fully in memory, in-memory compression is highly effective. It can be 150% better than uncompressed data (see the new 80/20 scan test). This is important in real-world use cases, where it is difficult to size the working set perfectly:
          1) Memory fragmentation can decrease the effective memory quota during execution, which leads to the hot working set not being memory resident.
          2) The hot working set fluctuates during the day or throughout the week.
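
The percentages quoted in this comment can be reproduced from the raw throughput figures. This is a minimal sketch using only the numbers given above (in thousands of operations per second); the dictionary labels and the helper `pct_change` are illustrative names.

```python
def pct_change(baseline, value):
    """Relative change of value against baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# Single-page scan test, throughput in K as quoted in the comment.
SINGLE_PAGE_K = {"6.6.4": 102.0, "7.0": 143.0, "Zstd build": 126.0}

# New 100/20 test with 10% RR, throughput in K as quoted in the comment.
NEW_TEST_K = {"Zstd build": 146.0, "7.1 run": 155.0}

if __name__ == "__main__":
    zstd = SINGLE_PAGE_K["Zstd build"]
    print(f"vs 7.0:   {pct_change(SINGLE_PAGE_K['7.0'], zstd):+.1f}%")
    print(f"vs 6.6.4: {pct_change(SINGLE_PAGE_K['6.6.4'], zstd):+.1f}%")
    print(f"new test vs 7.1 run: "
          f"{pct_change(NEW_TEST_K['7.1 run'], NEW_TEST_K['Zstd build']):+.1f}%")
    # vs 7.0:   -11.9%
    # vs 6.6.4: +23.5%
    # new test vs 7.1 run: -5.8%
```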

          jliang John Liang added a comment -

          For now, we are not going to fix the regression, since throughput is only 12% lower than in 7.0. But compression brings significant improvements:
          1) memory saving can be 3x
          2) DGM with hot/cold mix scan can be 1.5x


          People

            akhil.mundroy Akhil Mundroy
            vikas.chaudhary Vikas Chaudhary