  1. Couchbase Server
  2. MB-46493

lss_read_bs and bytes_written do not always match perf html report


Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Not a Bug
    • Cheshire-Cat
    • 7.1.0
    • storage-engine
    • Untriaged
    • 1
    • Unknown

    Description

      In the 7-index bloom filter test for build 5177, the two stats approximately match the rbps and wbps values from the HTML report.

      rbps 300-400 MB/s
      wbps 100-150 MB/s
       
      2021-05-19T15:43:59.874-07:00 [Info] Periodic Aggregated StorageStats:
      "lss_read_bs":                   2976503344353,
      "bytes_written":                 1906122153984,
       
      2021-05-19T16:13:59.870-07:00 [Info] Periodic Aggregated StorageStats:
      "lss_read_bs":                   3528014358936,  +551511014583  306395008/s
      "bytes_written":                 2066645012480,  +160522858496  89179365/s 
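The deltas and per-second rates annotated above can be reproduced with a minimal sketch like the following. It assumes both counters are cumulative byte counts and that the elapsed time is the difference between the two log timestamps; the helper name `counter_rate` is hypothetical and not part of any Couchbase tooling.

```python
from datetime import datetime


def counter_rate(t0, v0, t1, v1):
    """Return (delta, bytes_per_second) between two cumulative counter samples.

    t0/t1 are ISO-8601 timestamps as printed in the indexer log,
    v0/v1 are the counter values logged at those times.
    """
    elapsed = (datetime.fromisoformat(t1) - datetime.fromisoformat(t0)).total_seconds()
    delta = v1 - v0
    return delta, delta / elapsed


# Timestamps and counter values copied from the two log excerpts above.
t0 = "2021-05-19T15:43:59.874-07:00"
t1 = "2021-05-19T16:13:59.870-07:00"

read_delta, read_rate = counter_rate(t0, 2976503344353, t1, 3528014358936)
write_delta, write_rate = counter_rate(t0, 1906122153984, t1, 2066645012480)

print(f"lss_read_bs   +{read_delta}  {read_rate / 1e6:.1f} MB/s")
print(f"bytes_written +{write_delta}  {write_rate / 1e6:.1f} MB/s")
```

The annotated rates in the description divide by a flat 1800 s, so they differ very slightly from rates computed with the exact timestamp difference.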
       
      For the 10K index incremental build test, the numbers seem to be off, especially for lss_read_bs.
      

      rbps read 150 MB/s
      wbps write 125 MB/s

      2021-05-23T23:02:54.037-07:00 [Info] Periodic Aggregated StorageStats:
      "lss_read_bs": 1382967959770,
      "bytes_written": 574250827776,

      2021-05-23T23:32:53.544-07:00 [Info] Periodic Aggregated StorageStats:
      "lss_read_bs": 2050042223997,  +667074264227  370596813/s
      "bytes_written": 695640932352,  +121390104576  67438946/s
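To put a number on how far off the 10K index figures are, the log-derived rates can be divided by the values quoted from the report. This is only a sketch: it assumes the report's "MB/s" means 10^6 bytes per second (the ticket does not say), and `ratio_to_report` is a hypothetical helper.

```python
MB = 1_000_000  # assumption: decimal megabytes; use 1 << 20 if the report is in MiB/s


def ratio_to_report(log_bytes_per_sec, report_mb_per_sec, unit=MB):
    """Ratio of the log-derived rate to the rate quoted from the perf report."""
    return log_bytes_per_sec / (report_mb_per_sec * unit)


# Log-derived rates and report values as quoted in the description above.
read_ratio = ratio_to_report(370596813, 150)   # lss_read_bs vs rbps
write_ratio = ratio_to_report(67438946, 125)   # bytes_written vs wbps

print(f"read:  {read_ratio:.2f}x of report")
print(f"write: {write_ratio:.2f}x of report")
```

Under that unit assumption the read rate derived from the log is well over twice the reported rbps, while the write rate is roughly half the reported wbps.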

       


          Activity

            DirectIO Test:

             

            Ran a 10K index perf test on the hemera cluster with the following changes:

                  a) change the aggregate stats logging interval to 1 min

                  b) at each logging interval, get uncached LSS stats

                  c) secondary."indexer.plasma.useDirectIO".true

                  d) one minor fix in cleaner stats

             

            Test Spec:

            gsi/plasma/secondary_100M_10k_indexes_70_20_10_iud_20res_plasma_1s_1000c.test

            Toy build:

            http://latestbuilds.service.couchbase.com/builds/latestbuilds/couchbase-server/toybuilds/13383/couchbase-server-enterprise-7.0.0-13383-centos7.x86_64.rpm

             

            Observation:

               The overall aggregate read/write bandwidth matches the actual disk metrics.

               However, I noticed two spikes in the computed bandwidth during the incremental build portion. The log interval around the spike region seemed rather narrow (~14s).
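One way to check such spikes against the logging interval is to compute the bandwidth per consecutive pair of samples and flag pairs whose elapsed time is much shorter than the configured interval. The sketch below assumes a 60 s interval (per change a) and uses made-up sample values, not data from the test logs; `interval_rates` and the `tolerance` parameter are hypothetical.

```python
from datetime import datetime


def interval_rates(samples, expected_interval_s=60.0, tolerance=0.5):
    """Compute per-interval bandwidth from cumulative counter samples.

    samples: list of (iso_timestamp, cumulative_bytes), oldest first.
    Returns a list of (elapsed_s, bytes_per_s, is_short), where is_short marks
    intervals shorter than expected_interval_s * tolerance.
    """
    rows = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        elapsed = (datetime.fromisoformat(t1) - datetime.fromisoformat(t0)).total_seconds()
        rate = (v1 - v0) / elapsed
        rows.append((elapsed, rate, elapsed < expected_interval_s * tolerance))
    return rows


# Hypothetical samples for illustration only; not taken from the hemera run.
samples = [
    ("2021-06-01T10:00:00.000-07:00", 0),
    ("2021-06-01T10:01:00.000-07:00", 6_000_000_000),
    ("2021-06-01T10:01:14.000-07:00", 7_400_000_000),
    ("2021-06-01T10:02:14.000-07:00", 13_400_000_000),
]

rows = interval_rates(samples)
for elapsed, rate, is_short in rows:
    flag = "  <-- short interval" if is_short else ""
    print(f"{elapsed:6.1f}s  {rate / 1e6:8.1f} MB/s{flag}")
```

Flagging short intervals only narrows down where to look; it does not by itself explain why the computed bandwidth spikes there.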

             

            http://perf.jenkins.couchbase.com/view/GSI/job/hemera/2406/consoleFull

            http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=hemera_700-13383_build_secondaryindex_1f6c 

            http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=hemera_700-13383_build_incrindex_6dc9

             

              Disk Metrics:

             

            Plasma Periodic Aggregate Stats:

             

             

            saptarshi.sen Saptarshi Sen added a comment

            Build couchbase-server-7.1.0-1156 contains plasma commit 16d5e17 with commit message:
            MB-46493: Fix Aggregate LSSCleaner Stats

            build-team Couchbase Build Team added a comment

            I am resolving this bug based on the directIO findings. The high bandwidth reported in the LSS stats for the 10K index test seems to come from the page cache. Perf tests and results are documented here: https://docs.google.com/spreadsheets/d/1RuZ-fEqpALjvi8GeRlgF0puKpjGR3npMUouAQc1m-bY/edit?usp=sharing

            There is one unexplained issue (see previous comment): a few times in the directIO test, the LSS read bandwidth increases significantly (700-1000 MiB/s, up to 2x the disk bandwidth of 500 MiB/s). Looking at the code, I do not find any double counting, but I will open a new task to investigate the issue.

            saptarshi.sen Saptarshi Sen added a comment

            Closing all Duplicates, Not a Bug, Incomplete, Duplicate

            ritam.sharma Ritam Sharma added a comment

            People

              saptarshi.sen Saptarshi Sen
              jliang John Liang