Couchbase Server / MB-50856

Analytics ingestion rate dropped on build 7.1.0-2226


Details

    • Untriaged
    • 1
    • Yes
    • CX Sprint 282

    Description

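      Avg. ingestion rate (items/sec), 4 nodes, BigFUN 20M users (320M docs), 3 indexes, HDD

      ||Build||Ingestion Rate||Job||
      |7.1.0-2179|411,228|[http://perf.jenkins.couchbase.com/job/oceanus/7974/]|
      |7.1.0-2226|229,232|[http://perf.jenkins.couchbase.com/job/oceanus/8009/]|

      Avg. ingestion rate (items/sec), 4 nodes, BigFUN 20M users (320M docs), 3 indexes, SSD

      ||Build||Ingestion Rate||Job||
      |7.1.0-2225|287,168|[http://perf.jenkins.couchbase.com/job/triton_analytics/1994/]|
      |7.1.0-2226|126,550|[http://perf.jenkins.couchbase.com/job/triton_analytics/1993/]|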


          Activity

            bo-chun.wang Bo-Chun Wang created issue -
            bo-chun.wang Bo-Chun Wang made changes -
            Field Original Value New Value
            Description *Avg. ingestion rate (items/sec), 4 nodes, BigFUN 20M users (320M docs), 3 indexes, HDD*

             
            ||Build||Ingestion Rate||Job||
            |7.1.0-2179|411,228|[http://perf.jenkins.couchbase.com/job/oceanus/7974/]|
            |7.1.0-2226|229,232|[http://perf.jenkins.couchbase.com/job/oceanus/8009/]|

             

             

            *Avg Ingestion Rate (items/sec), 4 nodes, BigFUN 20M users (320M docs), 3 indexes, SSD*

             
            ||Build||Ingestion Rate||Job||
            |7.1.0-2225|287,168|[http://perf.jenkins.couchbase.com/job/triton_analytics/1994/]|
            |7.1.0-2226|126,550|[http://perf.jenkins.couchbase.com/job/triton_analytics/1993/]|

             

             

             
            till Till Westmann made changes -
            Link This issue relates to MB-47169 [ MB-47169 ]
            till Till Westmann made changes -
            Assignee Till Westmann [ till ] Bo-Chun Wang [ bo-chun.wang ]
            till Till Westmann made changes -
            Rank Ranked higher
            bo-chun.wang Bo-Chun Wang added a comment - edited

            Avg. ingestion rate (items/sec), 4 nodes, BigFUN 20M users (320M docs), 3 indexes, HDD

            ||Build||Ingestion Rate||Job||Setting||
            |7.1.0-2179|411,228|http://perf.jenkins.couchbase.com/job/oceanus/7974/| |
            |7.1.0-2226|229,232|http://perf.jenkins.couchbase.com/job/oceanus/8009/| |
            |7.1.0-2226|323,405|http://perf.jenkins.couchbase.com/job/oceanus/8055/|storageDiskForceBytes=2MB|
            |7.1.0-2226|368,420|http://perf.jenkins.couchbase.com/job/oceanus/8056/|storageDiskForceBytes=4MB|
            |7.1.0-2226|395,392|http://perf.jenkins.couchbase.com/job/oceanus/8057/|storageDiskForceBytes=8MB|

            Avg. ingestion rate (items/sec), 4 nodes, BigFUN 20M users (320M docs), 3 indexes, SSD

            ||Build||Ingestion Rate||Job||Setting||
            |7.1.0-2225|287,168|http://perf.jenkins.couchbase.com/job/triton_analytics/1994/| |
            |7.1.0-2226|126,550|http://perf.jenkins.couchbase.com/job/triton_analytics/1993/| |
            |7.1.0-2226|186,696|http://perf.jenkins.couchbase.com/job/triton_analytics/1997/|storageDiskForceBytes=2MB|
            |7.1.0-2226|253,591|http://perf.jenkins.couchbase.com/job/triton_analytics/1998/|storageDiskForceBytes=4MB|
            |7.1.0-2226|253,226|http://perf.jenkins.couchbase.com/job/triton_analytics/1999/|storageDiskForceBytes=8MB|
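To put the ingestion figures in the comment above on a common scale, the following is a minimal sketch that expresses each run as a percentage change against its baseline build. The rates are copied from the comment; the function names and the run labels (including "no override listed") are illustrative only and not part of any Couchbase tooling.

```python
def percent_change(baseline: float, current: float) -> float:
    """Relative change of `current` versus `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline * 100.0


# Average ingestion rates (items/sec) copied from the comment above.
HDD_BASELINE = 411_228  # build 7.1.0-2179
HDD_RUNS = {  # all on build 7.1.0-2226
    "no override listed": 229_232,
    "storageDiskForceBytes=2MB": 323_405,
    "storageDiskForceBytes=4MB": 368_420,
    "storageDiskForceBytes=8MB": 395_392,
}

SSD_BASELINE = 287_168  # build 7.1.0-2225
SSD_RUNS = {  # all on build 7.1.0-2226
    "no override listed": 126_550,
    "storageDiskForceBytes=2MB": 186_696,
    "storageDiskForceBytes=4MB": 253_591,
    "storageDiskForceBytes=8MB": 253_226,
}


def summarize(baseline: float, runs: dict) -> dict:
    """Map each run label to its percent change, rounded to one decimal."""
    return {
        label: round(percent_change(baseline, rate), 1)
        for label, rate in runs.items()
    }


if __name__ == "__main__":
    for name, baseline, runs in (
        ("HDD", HDD_BASELINE, HDD_RUNS),
        ("SSD", SSD_BASELINE, SSD_RUNS),
    ):
        for label, change in summarize(baseline, runs).items():
            print(f"{name} {label}: {change:+.1f}%")
```

Read this way, the runs without an override show a drop of roughly 44% on HDD and 56% on SSD, and the gap narrows as storageDiskForceBytes increases.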
            till Till Westmann made changes -
            Labels analytics analytics triaged
            bo-chun.wang Bo-Chun Wang made changes -
            Assignee Bo-Chun Wang [ bo-chun.wang ] Till Westmann [ till ]
            bo-chun.wang Bo-Chun Wang added a comment -

            Till Westmann 

            I have tried different values. I assigned this ticket back to you. 

            murtadha.hubail Murtadha Hubail made changes -
            Assignee Till Westmann [ till ] Murtadha Hubail [ murtadha.hubail ]

            murtadha.hubail Murtadha Hubail added a comment -

            Thanks Bo-Chun Wang. We are still waiting to hear on the impact of the change on MB-47169. However, based on your investigation, we will probably revert back to 16MB if we don't see any improvements in MB-47169.
            wayne Wayne Siu made changes -
            Labels analytics triaged analytics performance triaged
            murtadha.hubail Murtadha Hubail made changes -
            Sprint CX Sprint 282 [ 2012 ]
            murtadha.hubail Murtadha Hubail made changes -
            Rank Ranked lower
            murtadha.hubail Murtadha Hubail made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            bo-chun.wang Bo-Chun Wang made changes -
            Attachment Screen Shot 2022-02-14 at 10.28.38 AM.png [ 177581 ]
            bo-chun.wang Bo-Chun Wang made changes -
            Attachment Screen Shot 2022-02-14 at 10.32.06 AM.png [ 177582 ]
            bo-chun.wang Bo-Chun Wang made changes -
            bo-chun.wang Bo-Chun Wang made changes -
            Attachment Screen Shot 2022-02-14 at 1.31.36 PM.png [ 177584 ]
            bo-chun.wang Bo-Chun Wang made changes -
            bo-chun.wang Bo-Chun Wang added a comment -

            We are seeing higher rebalance times in HDD runs. I am verifying that the regression is caused by build 2226.

            bo-chun.wang Bo-Chun Wang made changes -
            Attachment Screen Shot 2022-02-14 at 1.31.36 PM.png [ 177584 ]
            bo-chun.wang Bo-Chun Wang made changes -
            Attachment Screen Shot 2022-02-14 at 10.32.06 AM.png [ 177582 ]
            bo-chun.wang Bo-Chun Wang made changes -
            Attachment Screen Shot 2022-02-14 at 10.28.38 AM.png [ 177581 ]
            bo-chun.wang Bo-Chun Wang added a comment -

            The rebalance time regression is caused by build 2226.
            Rebalance-in (min), 3 -> 4 nodes, BigFUN 20M users (320M docs), HDD

            ||Build||Rebalance Time||Job||
            |7.1.0-2223|9.4|http://perf.jenkins.couchbase.com/job/oceanus/8048/|
            |7.1.0-2226|33.2|http://perf.jenkins.couchbase.com/job/oceanus/8104/|

            murtadha.hubail Murtadha Hubail added a comment -

            Bo-Chun Wang,

            In build 7.1.0-2316, we added a new configurable parameter, 'storageWriteRateLimit', which sets the maximum bytes/second written by each Analytics partition (iodevice) on each node. We are thinking about reverting the 'storageDiskForceBytes' change and recommending the new parameter for the test in MB-47169. First, however, we would like to confirm that the new parameter is effective.

            Could you please run an ingestion experiment on build 7.1.0-2316 or later as follows:

            • Set storageDiskForceBytes back to 16MB (specified in bytes).
            • Set storageWriteRateLimit (in bytes) to a value equal to:
                (80% of the disk write bandwidth on each node / number of partitions on each node)
              For example, if the disk write bandwidth is 100MB/s and each node has 8 partitions, then the value would be 80MB / 8 = 10MB.
              This should make any write operation that exceeds 10MB on any partition wait until it is allowed to write another 10MB.
              The goal of the experiment is to confirm that storageWriteRateLimit slows down the Analytics write rate, so the exact value of storageWriteRateLimit, including the % of bandwidth, isn't important right now.

            Please let me know if you have any questions.
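The sizing rule in the comment above can be sketched as a small calculation. This is only an illustration: the function name and its parameters are hypothetical, and it assumes "MB" means 2**20 bytes. Integer arithmetic is used so the result is an exact byte count.

```python
MIB = 1024 * 1024  # assumption: "MB" in the comment means 2**20 bytes


def write_rate_limit_bytes(
    node_write_bandwidth_mb: int,
    partitions_per_node: int,
    percent_of_bandwidth: int = 80,
) -> int:
    """Per-partition write budget in bytes/second.

    Implements (percent of node write bandwidth) / (partitions per node),
    as described in the comment above. Hypothetical helper, not a
    Couchbase API.
    """
    if node_write_bandwidth_mb <= 0 or partitions_per_node <= 0:
        raise ValueError("bandwidth and partition count must be positive")
    if not 0 < percent_of_bandwidth <= 100:
        raise ValueError("percent_of_bandwidth must be in (0, 100]")
    budget_bytes = node_write_bandwidth_mb * MIB * percent_of_bandwidth // 100
    return budget_bytes // partitions_per_node


if __name__ == "__main__":
    # Example from the comment: 100MB/s of disk bandwidth, 8 partitions.
    print(write_rate_limit_bytes(100, 8))
```

With the example inputs from the comment (100MB/s and 8 partitions), this yields 10MB expressed in bytes.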
            murtadha.hubail Murtadha Hubail made changes -
            Assignee Murtadha Hubail [ murtadha.hubail ] Bo-Chun Wang [ bo-chun.wang ]
            bo-chun.wang Bo-Chun Wang added a comment -

            I finished a run on build 7.1.0-2323. In this test, each node has 5 HDD disks and 5 partitions.

            Configuring analytics path on 172.23.96.8: ['/data3/dev0', '/data4/dev0', '/data5/dev0', '/data6/dev0', '/data7/dev0']

            The write bandwidth of an HDD is about 115 MB/s, so I set storageWriteRateLimit to 92 MB (80% of 115 MB/s). I also set storageDiskForceBytes to 16MB.

            storageWriteRateLimit=96468992

            storageDiskForceBytes=16777216

            Avg. ingestion rate (items/sec), 4 nodes, BigFUN 20M users (320M docs), 3 indexes, HDD

            http://perf.jenkins.couchbase.com/job/oceanus-dev/176/

            Avg. ingestion rate (items/sec): 179,101

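As a quick cross-check of the two byte values quoted in the comment above, the following sketch converts the MB figures to bytes, assuming "MB" means 2**20 bytes. The helper names are illustrative only. Because this setup has one disk per partition, 80% of a single disk's bandwidth equals the per-partition share of 80% of the node's total bandwidth.

```python
MIB = 1024 * 1024  # assumption: "MB" in the comment means 2**20 bytes


def mb_to_bytes(megabytes: int) -> int:
    """Convert a size in MB (2**20 bytes) to bytes."""
    return megabytes * MIB


# Figures from the comment: about 115 MB/s per HDD, 80% of which is used.
PER_DISK_BANDWIDTH_MB = 115
LIMIT_MB = PER_DISK_BANDWIDTH_MB * 80 // 100  # 92

settings = {
    "storageWriteRateLimit": mb_to_bytes(LIMIT_MB),
    "storageDiskForceBytes": mb_to_bytes(16),
}


def render(values: dict) -> str:
    """Format settings as key=value lines, as written in the comment."""
    return "\n".join(f"{key}={value}" for key, value in values.items())


if __name__ == "__main__":
    print(render(settings))
```

Both computed values match the ones quoted in the comment (96468992 and 16777216).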
            bo-chun.wang Bo-Chun Wang made changes -
            Assignee Bo-Chun Wang [ bo-chun.wang ] Murtadha Hubail [ murtadha.hubail ]
            murtadha.hubail Murtadha Hubail made changes -
            Remote Link This issue links to "AsterixDB Commit (Web Link)" [ 23708 ]
            murtadha.hubail Murtadha Hubail added a comment - - edited

            Hi Bo-Chun Wang,

            I reverted the default storageDiskForceBytes back to 16MB in build 7.1.0-2333. This regression should be gone now.

             

            murtadha.hubail Murtadha Hubail made changes -
            Assignee Murtadha Hubail [ murtadha.hubail ] Bo-Chun Wang [ bo-chun.wang ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Resolved [ 5 ]

            bo-chun.wang Bo-Chun Wang added a comment -

            Performance is back to normal in build 7.1.0-2333. I am closing this ticket.

            Avg. ingestion rate (items/sec), 4 nodes, BigFUN 20M users (320M docs), 3 indexes, HDD

            ||Build||Ingestion Rate||Job||
            |7.1.0-2179|411,228|http://perf.jenkins.couchbase.com/job/oceanus/7974/|
            |7.1.0-2226|229,232|http://perf.jenkins.couchbase.com/job/oceanus/8009/|
            |7.1.0-2333|403,049|http://perf.jenkins.couchbase.com/job/oceanus/8107/|

            Rebalance-in (min), 3 -> 4 nodes, BigFUN 20M users (320M docs), HDD

            ||Build||Rebalance Time||Job||
            |7.1.0-2223|9.4|http://perf.jenkins.couchbase.com/job/oceanus/8048/|
            |7.1.0-2226|33.2|http://perf.jenkins.couchbase.com/job/oceanus/8104/|
            |7.1.0-2333|9.7|http://perf.jenkins.couchbase.com/job/oceanus/8108/|
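One way to make "performance is back" concrete is to compare each run with its baseline under a fixed tolerance. The sketch below does that for the HDD figures in the closing comment. The 5% tolerance and the helper name are assumptions chosen for illustration; the ticket does not state a threshold.

```python
TOLERANCE_PCT = 5.0  # hypothetical threshold, not taken from the ticket


def within_tolerance(
    baseline: float,
    current: float,
    higher_is_better: bool,
    tolerance_pct: float = TOLERANCE_PCT,
) -> bool:
    """True if `current` is no more than `tolerance_pct` worse than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    change_pct = (current - baseline) / baseline * 100.0
    # For throughput a drop is a regression; for durations a rise is.
    regression_pct = -change_pct if higher_is_better else change_pct
    return regression_pct <= tolerance_pct


if __name__ == "__main__":
    # Ingestion rate (items/sec), HDD: baseline is build 7.1.0-2179.
    print("ingestion 2226:", within_tolerance(411_228, 229_232, True))
    print("ingestion 2333:", within_tolerance(411_228, 403_049, True))
    # Rebalance-in time (min), HDD: baseline is build 7.1.0-2223.
    print("rebalance 2226:", within_tolerance(9.4, 33.2, False))
    print("rebalance 2333:", within_tolerance(9.4, 9.7, False))
```

Under this assumed tolerance, build 7.1.0-2226 fails both checks and build 7.1.0-2333 passes both.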
            bo-chun.wang Bo-Chun Wang made changes -
            Status Resolved [ 5 ] Closed [ 6 ]
            murtadha.hubail Murtadha Hubail made changes -
            Rank Ranked higher

            People

              Assignee: Bo-Chun Wang
              Reporter: Bo-Chun Wang