Couchbase Server / MB-50856

Analytics ingestion rate dropped on build 7.1.0-2226


Details

    • Triage: Untriaged
    • Is this a Regression?: Yes
    • Sprint: CX Sprint 282

    Description

      [chart] Avg. ingestion rate (items/sec), 4 nodes, BigFUN 20M users (320M docs), 3 indexes, HDD

      [chart] Avg. ingestion rate (items/sec), 4 nodes, BigFUN 20M users (320M docs), 3 indexes, SSD


          Activity

            bo-chun.wang Bo-Chun Wang added a comment -

            The rebalance time regression is caused by build 2226.

            Rebalance-in (min), 3 -> 4 nodes, BigFUN 20M users (320M docs), HDD

            Build        Rebalance Time (min)  Job
            7.1.0-2223   9.4                   http://perf.jenkins.couchbase.com/job/oceanus/8048/
            7.1.0-2226   33.2                  http://perf.jenkins.couchbase.com/job/oceanus/8104/

            Bo-Chun Wang,

            In build 7.1.0-2316, we added a new configurable parameter called 'storageWriteRateLimit', which specifies the maximum bytes/second written by each Analytics partition (iodevice) on each node. We are considering reverting the 'storageDiskForceBytes' change and instead recommending the new parameter in the test in MB-47169. However, we would first like to confirm that the new parameter is effective.
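            (For reference: a minimal sketch of how a service-level parameter like this can be set, assuming the Analytics Settings REST API on port 8095 and placeholder host/credentials; Analytics is restarted afterwards so the change takes effect. This is an illustration, not a command from this ticket.)

                # Sketch: set an Analytics service-level config parameter, then
                # restart Analytics so the new value takes effect. Host and
                # credentials are placeholders, not values from this ticket.
                import requests

                NODE = "http://cbas-node:8095"        # any Analytics node (hypothetical)
                AUTH = ("Administrator", "password")  # cluster credentials (placeholder)

                # storageWriteRateLimit is given in bytes/second per partition.
                params = {"storageWriteRateLimit": 10 * 1024 * 1024}  # e.g. 10 MB/s

                requests.put(f"{NODE}/analytics/config/service",
                             data=params, auth=AUTH).raise_for_status()
                requests.post(f"{NODE}/analytics/cluster/restart",
                              auth=AUTH).raise_for_status()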

            Could you please run an ingestion experiment on build 7.1.0-2316 or later as follows:

            • Set storageDiskForceBytes back to 16MB, specified in bytes.
            • Set storageWriteRateLimit, in bytes, to a value equal to:
                (80% of the disk write bandwidth on each node / number of partitions on each node)
              For example, if the disk write bandwidth is 100 MB/s and each node has 8 partitions, the value would be 80 MB/s / 8 = 10 MB/s (see the sketch after these steps).
              This should make any write operation that would exceed 10 MB within a second on any partition wait until it is allowed to write another 10 MB.
              The goal of the experiment is to confirm that storageWriteRateLimit slows down the Analytics write rate, so the exact numbers used when setting the storageWriteRateLimit value, including the % of bandwidth, aren't critical right now.
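            (A minimal sketch of the calculation above, using this comment's example numbers; pure arithmetic, no Couchbase API involved.)

                # storageWriteRateLimit calculation from the steps above:
                # 80% of a node's disk write bandwidth, split across its partitions.
                MB = 1024 * 1024

                disk_write_bandwidth = 100 * MB   # bytes/s per node (example value)
                partitions_per_node = 8           # example value

                limit = int(0.8 * disk_write_bandwidth / partitions_per_node)
                print(limit)                      # 10485760 bytes/s
                print(limit / MB, "MB/s")         # 10.0 MB/s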

            Please let me know if you have any questions.

            bo-chun.wang Bo-Chun Wang added a comment -

            I finished a run on build 7.1.0-2323. In this test, each node has 5 HDDs and 5 partitions (one partition per disk).

            Configuring analytics path on 172.23.96.8: ['/data3/dev0', '/data4/dev0', '/data5/dev0', '/data6/dev0', '/data7/dev0']

            The write bandwidth of an HDD is about 115 MB/s, so I set storageWriteRateLimit to 92 MB/s (80% of 115 MB/s). I also set storageDiskForceBytes to 16 MB.

            storageWriteRateLimit=96468992

            storageDiskForceBytes=16777216
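            (A quick check that these byte values match the MB figures above; pure arithmetic.)

                MB = 1024 * 1024
                print(92 * MB)   # 96468992  -> storageWriteRateLimit (80% of 115 MB/s)
                print(16 * MB)   # 16777216  -> storageDiskForceBytes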

            Avg. ingestion rate (items/sec), 4 nodes, BigFUN 20M users (320M docs), 3 indexes, HDD

            http://perf.jenkins.couchbase.com/job/oceanus-dev/176/

            Avg. ingestion rate (items/sec): 179,101

            murtadha.hubail Murtadha Hubail added a comment - edited

            Hi Bo-Chun Wang,

            I reverted the default storageDiskForceBytes back to 16MB in build 7.1.0-2333. This regression should be gone now.

             


            bo-chun.wang Bo-Chun Wang added a comment -

            The performance is back in build 7.1.0-2333. I'm closing this ticket.

            Avg. ingestion rate (items/sec), 4 nodes, BigFUN 20M users (320M docs), 3 indexes, HDD

            Build        Ingestion Rate  Job
            7.1.0-2179   411,228         http://perf.jenkins.couchbase.com/job/oceanus/7974/
            7.1.0-2226   229,232         http://perf.jenkins.couchbase.com/job/oceanus/8009/
            7.1.0-2333   403,049         http://perf.jenkins.couchbase.com/job/oceanus/8107/

            Rebalance-in (min), 3 -> 4 nodes, BigFUN 20M users (320M docs), HDD

            Build        Rebalance Time (min)  Job
            7.1.0-2223   9.4                   http://perf.jenkins.couchbase.com/job/oceanus/8048/
            7.1.0-2226   33.2                  http://perf.jenkins.couchbase.com/job/oceanus/8104/
            7.1.0-2333   9.7                   http://perf.jenkins.couchbase.com/job/oceanus/8108/

            People

              Assignee: Bo-Chun Wang
              Reporter: Bo-Chun Wang
              Votes: 0
              Watchers: 3
