  1. Couchbase Server
  2. MB-46493

lss_read_bs and bytes_written do not always match perf html report

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Not a Bug
    • Cheshire-Cat
    • 7.1.0
    • storage-engine
    • Untriaged
    • 1
    • Unknown

    Description

      In the 7-index bloom filter test for build 5177, the two stats approximately match rbps and wbps from the HTML report.

      rbps 300-400 MB/s
      wbps 100-150 MB/s
       
      2021-05-19T15:43:59.874-07:00 [Info] Periodic Aggregated StorageStats:
      "lss_read_bs":                   2976503344353,
      "bytes_written":                 1906122153984,
       
      2021-05-19T16:13:59.870-07:00 [Info] Periodic Aggregated StorageStats:
      "lss_read_bs":                   3528014358936,  +551511014583  306395008/s
      "bytes_written":                 2066645012480,  +160522858496  89179365/s 
       
      For the 10K index incremental build test, the numbers seem to be off, especially for lss_read_bs.
      

      rbps read 150 MB/s
      wbps write 125 MB/s

      2021-05-23T23:02:54.037-07:00 [Info] Periodic Aggregated StorageStats:
      "lss_read_bs": 1382967959770,
      "bytes_written": 574250827776,

      2021-05-23T23:32:53.544-07:00 [Info] Periodic Aggregated StorageStats:
      "lss_read_bs": 2050042223997,  +667074264227  370596813/s
      "bytes_written": 695640932352,  +121390104576  67438946/s
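
      The per-second figures annotated on the log lines above are the counter delta divided by the time between the two snapshots. A minimal Python sketch of that arithmetic follows, using the timestamps and counter values from the 10K index test; the helper name `counter_rate` is hypothetical, and it assumes both stats are cumulative byte counters. It divides by the exact elapsed time, so the results differ slightly from the annotations, which appear to use a flat 1800 s.

```python
from datetime import datetime

# Timestamp layout of the "Periodic Aggregated StorageStats" log lines.
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def counter_rate(t0, v0, t1, v1):
    """Average bytes/s of a cumulative counter between two log snapshots."""
    start = datetime.strptime(t0, TS_FORMAT)
    end = datetime.strptime(t1, TS_FORMAT)
    elapsed = (end - start).total_seconds()
    if elapsed <= 0:
        raise ValueError("snapshots must be in increasing time order")
    return (v1 - v0) / elapsed


# Values copied from the 10K index incremental build snapshots.
T0 = "2021-05-23T23:02:54.037-07:00"
T1 = "2021-05-23T23:32:53.544-07:00"

read_rate = counter_rate(T0, 1382967959770, T1, 2050042223997)
write_rate = counter_rate(T0, 574250827776, T1, 695640932352)

print(f"lss_read_bs:   {read_rate / 1e6:.1f} MB/s")
print(f"bytes_written: {write_rate / 1e6:.1f} MB/s")
```

      With these inputs the read rate comes out at roughly 370 MB/s and the write rate at roughly 67 MB/s (decimal megabytes), against the 150 MB/s and 125 MB/s shown in the report.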

       


          Activity

            jliang John Liang created issue -
            srinath.duvuru Srinath Duvuru made changes -
            Field Original Value New Value
            Fix Version/s 7.0.1 [ 17104 ]
            Fix Version/s CheshireCat.Next [ 16908 ]
            srinath.duvuru Srinath Duvuru made changes -
            Assignee John Liang [ jliang ] Akhil Mundroy [ akhil.mundroy ]
            srinath.duvuru Srinath Duvuru made changes -
            Assignee Akhil Mundroy [ akhil.mundroy ] Saptarshi Sen [ JIRAUSER25455 ]
            lynn.straus Lynn Straus made changes -
            Fix Version/s 7.0.2 [ 18012 ]
            lynn.straus Lynn Straus made changes -
            Fix Version/s 7.0.1 [ 17104 ]
            saptarshi.sen Saptarshi Sen made changes -
            Attachment image-2021-08-12-16-56-09-474.png [ 154838 ]
            saptarshi.sen Saptarshi Sen made changes -
            Attachment image-2021-08-12-16-57-41-981.png [ 154839 ]
            saptarshi.sen Saptarshi Sen made changes -
            Attachment image-2021-08-12-16-58-05-947.png [ 154840 ]
            saptarshi.sen Saptarshi Sen made changes -
            Attachment image-2021-08-12-16-58-27-196.png [ 154841 ]

            DirectIO Test:

             

            Ran a 10k perf test on the hemera cluster with the following changes:

                  a) change the aggregate stats logging interval to 1 min

                  b) at each logging interval, fetch uncached LSS stats

                  c) secondary."indexer.plasma.useDirectIO".true

                  d) one minor fix in cleaner stats

             

            Test Spec:

            gsi/plasma/secondary_100M_10k_indexes_70_20_10_iud_20res_plasma_1s_1000c.test

            Toy build:

            http://latestbuilds.service.couchbase.com/builds/latestbuilds/couchbase-server/toybuilds/13383/couchbase-server-enterprise-7.0.0-13383-centos7.x86_64.rpm

             

            Observation:

               The overall aggregate read/write bandwidth matches the actual disk metrics.

               However, I noticed two spikes in computed bandwidth in the incremental build portion. The log interval around the spike region seemed rather narrow (~14s).
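
               One way to check whether such spikes line up with shortened log intervals is to compute each rate from the actual elapsed time between consecutive snapshots and flag intervals much shorter than the nominal one. The sketch below is illustrative only: the function name `interval_rates`, the 0.5 threshold, and the sample values are all hypothetical, and the 60 s nominal interval simply reflects change (a) above.

```python
def interval_rates(samples, nominal_s=60.0, short_fraction=0.5):
    """Compute per-interval bandwidth from cumulative counter samples.

    samples: list of (timestamp_seconds, cumulative_bytes) in time order.
    An interval is flagged "short" when it lasts less than
    short_fraction * nominal_s seconds.
    """
    rows = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue  # skip duplicate or out-of-order snapshots
        rows.append({
            "elapsed_s": elapsed,
            "rate": (v1 - v0) / elapsed,
            "short": elapsed < nominal_s * short_fraction,
        })
    return rows


# Hypothetical samples (not taken from the test run): one 14 s interval
# sits between two regular 60 s intervals.
samples = [
    (0.0, 0),
    (60.0, 18_000_000_000),
    (74.0, 32_000_000_000),
    (134.0, 50_000_000_000),
]

rows = interval_rates(samples)
for row in rows:
    flag = " (short interval)" if row["short"] else ""
    print(f"{row['elapsed_s']:6.1f} s  {row['rate'] / 1e6:8.1f} MB/s{flag}")
```

               A burst of I/O that lands inside a short interval produces a high instantaneous rate even when the longer-term average is unchanged, so flagged rows are worth comparing against the disk metrics over a wider window.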

             

            http://perf.jenkins.couchbase.com/view/GSI/job/hemera/2406/consoleFull

            http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=hemera_700-13383_build_secondaryindex_1f6c 

            http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=hemera_700-13383_build_incrindex_6dc9

             

              Disk Metrics:

             

            Plasma Periodic Aggregate Stats:

             

             

            saptarshi.sen Saptarshi Sen added the comment above.

            Build couchbase-server-7.1.0-1156 contains plasma commit 16d5e17 with commit message:
            MB-46493: Fix Aggregate LSSCleaner Stats

            build-team Couchbase Build Team added the comment above.
            saptarshi.sen Saptarshi Sen made changes -
            Link This issue is cloned by MB-47990 [ MB-47990 ]
            srinath.duvuru Srinath Duvuru made changes -
            Fix Version/s Neo [ 17615 ]
            Fix Version/s 7.0.2 [ 18012 ]

            I am resolving this bug based on the directIO findings. The high bandwidth reported in the LSS stats for the 10k index test appears to come from the page cache. Perf tests and results are documented here: https://docs.google.com/spreadsheets/d/1RuZ-fEqpALjvi8GeRlgF0puKpjGR3npMUouAQc1m-bY/edit?usp=sharing

            There is one unexplained issue (see the previous comment): a few times in the directIO test, the LSS read bandwidth increased significantly (700-1000 MiB/s, up to 2x the disk bandwidth of 500 MiB/s). Looking at the code, I do not find any double counting, but I will open a new task to investigate the issue.

            saptarshi.sen Saptarshi Sen added the comment above.
            saptarshi.sen Saptarshi Sen made changes -
            Resolution Fixed [ 1 ]
            Status Open [ 1 ] Resolved [ 5 ]
            saptarshi.sen Saptarshi Sen made changes -
            Resolution Fixed [ 1 ]
            Status Resolved [ 5 ] Reopened [ 4 ]
            saptarshi.sen Saptarshi Sen made changes -
            Resolution Not a Bug [ 10200 ]
            Status Reopened [ 4 ] Resolved [ 5 ]
            wayne Wayne Siu made changes -
            Link This issue backports to MB-47990 [ MB-47990 ]
            wayne Wayne Siu made changes -
            Link This issue is cloned by MB-47990 [ MB-47990 ]

            Closing all Duplicates, Not a Bug, Incomplete, Duplicate

            ritam.sharma Ritam Sharma added the comment above.
            ritam.sharma Ritam Sharma made changes -
            Status Resolved [ 5 ] Closed [ 6 ]

            People

              saptarshi.sen Saptarshi Sen
              jliang John Liang