Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-50361

15% drop in AG6 throughput

    XMLWordPrintable

Details

    • Untriaged
    • Centos 64-bit
    • 1
    • Unknown

    Attachments

      For Gerrit Dashboard: MB-50361
      # Subject Branch Project Status CR V

      Activity

        korrigan.clark Korrigan Clark created issue -
        wayne Wayne Siu made changes -
        Field Original Value New Value
        Affects Version/s Cheshire-Cat [ 15915 ]
        Affects Version/s Neo [ 17615 ]
        wayne Wayne Siu made changes -
        Fix Version/s CheshireCat.Next [ 16908 ]
        Fix Version/s Neo [ 17615 ]
        wayne Wayne Siu made changes -
        Labels performace query performance query
        wayne Wayne Siu made changes -
        Priority Major [ 3 ] Critical [ 2 ]
        jeelan.poola Jeelan Poola made changes -
        Assignee Jeelan Poola [ jeelan.poola ] Varun Velamuri [ varun.velamuri ]

        Srinath Duvuru , From the change log, it appears that the regression is caused due to hole cleaning logic introduced in plasma from build 7.1.0-2007. To confirm the same, I re-ran the test with a toy build with indexing on 7.1.0-2006 + plasma support for hole cleaning - The regression showed up (http://perf.jenkins.couchbase.com/job/arke-multi-client/6286/console) confirming that the regression is due to hole cleaning logic. The toy build parameters are: http://server.jenkins.couchbase.com/view/Toys/job/toy-unix-simple/4267/parameters/

        Another point is, I ran another test with "indexer.plasma.holecleaner.enabled" set to false on top of build 7.1.0-2007 but the regression still showed up - http://perf.jenkins.couchbase.com/job/arke-multi-client/6277/console. I am not sure if the settings change is being dealt with as expected.

        Moving this MB to storage-engine (plasma) for further analysis

        varun.velamuri Varun Velamuri added a comment - Srinath Duvuru , From the change log, it appears that the regression is caused due to hole cleaning logic introduced in plasma from build 7.1.0-2007. To confirm the same, I re-ran the test with a toy build with indexing on 7.1.0-2006 + plasma support for hole cleaning - The regression showed up ( http://perf.jenkins.couchbase.com/job/arke-multi-client/6286/console) confirming that the regression is due to hole cleaning logic. The toy build parameters are: http://server.jenkins.couchbase.com/view/Toys/job/toy-unix-simple/4267/parameters/ Another point is, I ran another test with "indexer.plasma.holecleaner.enabled" set to false on top of build 7.1.0-2007 but the regression still showed up - http://perf.jenkins.couchbase.com/job/arke-multi-client/6277/console. I am not sure if the settings change is being dealt with as expected. Moving this MB to storage-engine (plasma) for further analysis
        varun.velamuri Varun Velamuri made changes -
        Assignee Varun Velamuri [ varun.velamuri ] Srinath Duvuru [ srinath.duvuru ]
        varun.velamuri Varun Velamuri made changes -
        Component/s secondary-index [ 11211 ]
        varun.velamuri Varun Velamuri made changes -
        Labels performance query performance plasma query
        srinath.duvuru Srinath Duvuru made changes -
        Assignee Srinath Duvuru [ srinath.duvuru ] Saptarshi Sen [ JIRAUSER25455 ]
        jliang John Liang added a comment -

        Saptarshi SenIs there any expiry or delete during query? Thanks.

        jliang John Liang added a comment - Saptarshi Sen Is there any expiry or delete during query? Thanks.
        saptarshi.sen Saptarshi Sen added a comment - - edited

        From the spec in the console log, ttl is 0. So the workload should not be related to expiry.

         

        15:28:05     "n1ql_op": "read",15:28:05     "n1ql_queries": [15:28:05         {15:28:05             "args": "[\"\{gender}\", \"\{marital_status}\", \"\{education_status}\"]",15:28:05             "scan_consistency": "not_bounded",15:28:05             "statement": "SELECT item.id, AVG(quantity) AS avg_quantity, AVG(list_price) AS avg_list_price, AVG(coupon_amt) AS avg_coupon_amt, AVG(sales_price) AS avg_sales_price FROM `bucket-1` WHERE customer.gender = $1 AND customer.marital_status = $2 AND customer.education_status = $3 GROUP BY item.id ORDER BY item.id LIMIT 100;"15:28:05         }15:28:05     ],
        15:28:05  "transactionsenabled": 0, 
        15:28:05  "transactionupdateproportion": 0.75, 
        ...
         
        *15:28:05*  "ttl": 0, 
        15:28:05  "updates": 0, 
        {*}1{*}5:28:05  "use_backup": false, 
        15:28:05  "user_mod_throughput": Infinity, 
        15:28:05  "user_mod_workers": 0,
        

        I took a look at the logs for indexer node 172.23.97.19. From Plasma Stats, from count & items_count stat, it looks like there are only inserts

        "MainStore":
        {
        "memory_quota":         283115520000,
         "count":                100000000,
        ....
        "items_count":          100000000,
        "total_records":        100000000,
        ...
        "resident_ratio":       1.00000,
        "compressed_percent":   71,
        "compressed_compressible_percent":99,
        "mvcc_purge_ratio":     1.00000,
        "reader_purge_cnt":     0,
        "reader_emptypage_cnt": 0,
        ...
        "hole_cleaner_status":  "",
        }
        

        Based on "reader_emptypage_count", the count for empty page is zero, and hole cleaner does not seem to run for the index.

        2022-01-11T15:41:03.837-08:00 [Info] Periodic Aggregated StorageStats:
        {
        ...
        "reader_purge_count":            0,
        "reader_emptypage_count":        0
        }
        

        saptarshi.sen Saptarshi Sen added a comment - - edited From the spec in the console log, ttl is 0. So the workload should not be related to expiry.   15:28:05 "n1ql_op": "read",15:28:05 "n1ql_queries": [15:28:05 {15:28:05 "args": "[\"\{gender}\", \"\{marital_status}\", \"\{education_status}\"]",15:28:05 "scan_consistency": "not_bounded",15:28:05 "statement": "SELECT item.id, AVG(quantity) AS avg_quantity, AVG(list_price) AS avg_list_price, AVG(coupon_amt) AS avg_coupon_amt, AVG(sales_price) AS avg_sales_price FROM `bucket-1` WHERE customer.gender = $1 AND customer.marital_status = $2 AND customer.education_status = $3 GROUP BY item.id ORDER BY item.id LIMIT 100;"15:28:05 }15:28:05 ], 15:28:05 "transactionsenabled": 0, 15:28:05 "transactionupdateproportion": 0.75, ...   *15:28:05* "ttl": 0, 15:28:05 "updates": 0, {*}1{*}5:28:05 "use_backup": false, 15:28:05 "user_mod_throughput": Infinity, 15:28:05 "user_mod_workers": 0, I took a look at the logs for indexer node 172.23.97.19. From Plasma Stats, from count & items_count stat, it looks like there are only inserts "MainStore": { "memory_quota": 283115520000, "count": 100000000, .... "items_count": 100000000, "total_records": 100000000, ... "resident_ratio": 1.00000, "compressed_percent": 71, "compressed_compressible_percent":99, "mvcc_purge_ratio": 1.00000, "reader_purge_cnt": 0, "reader_emptypage_cnt": 0, ... "hole_cleaner_status": "", } Based on "reader_emptypage_count", the count for empty page is zero, and hole cleaner does not seem to run for the index. 2022-01-11T15:41:03.837-08:00 [Info] Periodic Aggregated StorageStats: { ... "reader_purge_count": 0, "reader_emptypage_count": 0 }

        Korrigan Clark Looks the performance is back up. There is some fluctuation in this test. Saptarshi Sen It does not look like a storage issue, can you confirm? Thanks.

        srinath.duvuru Srinath Duvuru added a comment - Korrigan Clark Looks the performance is back up. There is some fluctuation in this test. Saptarshi Sen It does not look like a storage issue, can you confirm? Thanks.
        saptarshi.sen Saptarshi Sen added a comment - - edited

        I reverted all the commits related to holeCleaner and ran the test with a toy build (7.1.0-14332)

        a) https://review.couchbase.org/c/plasma/+/169280 [plasma]
        b) https://review.couchbase.org/c/plasma/+/169281 [plasma]
        c https://review.couchbase.org/c/plasma/+/169293 [plasma]
        d) https://review.couchbase.org/c/indexing/+/169269 [indexer config]

        Toy Build
        Parameters: http://server.jenkins.couchbase.com/view/Toys/job/toy-unix-simple/4332/parameters/

        Perf Test : http://perf.jenkins.couchbase.com/job/arke/9862/console
        I still observed a similar query/sec number (1637) in the test as to what was reported for the build 2007 (1647).

        So it does not seem to be related to the holeCleaner changes.

        Reference Notes:

        build 2007 (indexer stats)

        "bucket-1:ag6:timings/n1ql_expr_eval":"988095 0 0","bucket-1:ag6:num_requests_range":0,"bucket-1:ag6:num_completed_requests_range":0,"bucket-1:ag6:num_rows_returned_range":0,"bucket-1:ag6:num_rows_scanned_range":0,"bucket-1:ag6:scan_cache_hit_range":0,"bucket-1:ag6:num_requests_aggr":988095,"bucket-1:ag6:num_completed_requests_aggr":988095,"bucket-1:ag6:num_rows_returned_aggr":98809500,"bucket-1:ag6:num_rows_scanned_aggr":1388228644,"bucket-1:ag6:scan_cache_hit_aggr":0,"10857409829870475256:completion_progress":4636737291354636288,
         

        build 2006 (indexer stats)

        "bucket-1:ag6:timings/n1ql_expr_eval":"1236532 0 0","bucket-1:ag6:num_requests_range":0,"bucket-1:ag6:num_completed_requests_range":0,"bucket-1:ag6:num_rows_returned_range":0,"bucket-1:ag6:num_rows_scanned_range":0,"bucket-1:ag6:scan_cache_hit_range":0,"bucket-1:ag6:num_requests_aggr":1236532,"bucket-1:ag6:num_completed_requests_aggr":1236532,"bucket-1:ag6:num_rows_returned_aggr":123653200,"bucket-1:ag6:num_rows_scanned_aggr":1737315043,"bucket-1:ag6:scan_cache_hit_aggr":0,"86388607918421446:completion_progress":4636737291354636288,
        

        saptarshi.sen Saptarshi Sen added a comment - - edited I reverted all the commits related to holeCleaner and ran the test with a toy build (7.1.0-14332) a) https://review.couchbase.org/c/plasma/+/169280 [plasma] b) https://review.couchbase.org/c/plasma/+/169281 [plasma] c https://review.couchbase.org/c/plasma/+/169293 [plasma] d) https://review.couchbase.org/c/indexing/+/169269 [indexer config] Toy Build Parameters: http://server.jenkins.couchbase.com/view/Toys/job/toy-unix-simple/4332/parameters/ Perf Test : http://perf.jenkins.couchbase.com/job/arke/9862/console I still observed a similar query/sec number (1637) in the test as to what was reported for the build 2007 (1647). So it does not seem to be related to the holeCleaner changes. Reference Notes: build 2007 (indexer stats) "bucket-1:ag6:timings/n1ql_expr_eval":"988095 0 0","bucket-1:ag6:num_requests_range":0,"bucket-1:ag6:num_completed_requests_range":0,"bucket-1:ag6:num_rows_returned_range":0,"bucket-1:ag6:num_rows_scanned_range":0,"bucket-1:ag6:scan_cache_hit_range":0,"bucket-1:ag6:num_requests_aggr":988095,"bucket-1:ag6:num_completed_requests_aggr":988095,"bucket-1:ag6:num_rows_returned_aggr":98809500,"bucket-1:ag6:num_rows_scanned_aggr":1388228644,"bucket-1:ag6:scan_cache_hit_aggr":0,"10857409829870475256:completion_progress":4636737291354636288, build 2006 (indexer stats) "bucket-1:ag6:timings/n1ql_expr_eval":"1236532 0 0","bucket-1:ag6:num_requests_range":0,"bucket-1:ag6:num_completed_requests_range":0,"bucket-1:ag6:num_rows_returned_range":0,"bucket-1:ag6:num_rows_scanned_range":0,"bucket-1:ag6:scan_cache_hit_range":0,"bucket-1:ag6:num_requests_aggr":1236532,"bucket-1:ag6:num_completed_requests_aggr":1236532,"bucket-1:ag6:num_rows_returned_aggr":123653200,"bucket-1:ag6:num_rows_scanned_aggr":1737315043,"bucket-1:ag6:scan_cache_hit_aggr":0,"86388607918421446:completion_progress":4636737291354636288,
        saptarshi.sen Saptarshi Sen made changes -
        Assignee Saptarshi Sen [ JIRAUSER25455 ] Korrigan Clark [ korrigan.clark ]
        jliang John Liang made changes -
        Component/s storage-engine [ 10175 ]
        Component/s secondary-index [ 11211 ]
        jliang John Liang made changes -
        Assignee Korrigan Clark [ korrigan.clark ] Jeelan Poola [ jeelan.poola ]
        jliang John Liang added a comment -

        Jeelan PoolaAssign back to you for now since the same regression happens even after we comment out the hole cleaning code.

        jliang John Liang added a comment - Jeelan Poola Assign back to you for now since the same regression happens even after we comment out the hole cleaning code.
        jliang John Liang added a comment - - edited

        Btw, the latest AG6 throughput is back to 2562 in build 7.1.0-2123.

        Korrigan Clark It may be a good idea to do a couple of runs with build 2123 to confirm the result.

        jliang John Liang added a comment - - edited Btw, the latest AG6 throughput is back to 2562 in build 7.1.0-2123. Korrigan Clark It may be a good idea to do a couple of runs with build 2123 to confirm the result.
        jeelan.poola Jeelan Poola made changes -
        Assignee Jeelan Poola [ jeelan.poola ] Varun Velamuri [ varun.velamuri ]

        Ive kicked off two confirmation runs

        korrigan.clark Korrigan Clark added a comment - Ive kicked off two confirmation runs

        runs look good... closing

        korrigan.clark Korrigan Clark added a comment - runs look good... closing
        korrigan.clark Korrigan Clark made changes -
        Resolution Fixed [ 1 ]
        Status Open [ 1 ] Resolved [ 5 ]
        korrigan.clark Korrigan Clark made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

        People

          varun.velamuri Varun Velamuri
          korrigan.clark Korrigan Clark
          Votes:
          0 Vote for this issue
          Watchers:
          8 Start watching this issue

          Dates

            Created:
            Updated:
            Resolved:

            Gerrit Reviews

              There are 2 open Gerrit changes

              PagerDuty