Couchbase Server / MB-32388

3% throughput regression in KV after MB-32061 change

Details

    • Triage: Untriaged
    • Is this a Regression?: Yes

    Description

      The MB-32061 fix is the following change:

      Build couchbase-server-6.5.0-1691 contains kv_engine commit 54d4e73 with commit message: MB-32061: Lower spdlog buffer size

      which resulted in about a 3% drop in max throughput (pillowfight test).

      cbc-pillowfight, 2 nodes, 50/50 R/W:

      2.55M ops/sec average on build 6.5.0-1690 vs 2.50M ops/sec average on build 6.5.0-1691

      logs and reports:

      6.5.0-1691: http://perf.jenkins.couchbase.com/job/ares/9090/console

      6.5.0-1690: http://perf.jenkins.couchbase.com/job/ares/9091/console

      cpu profiles attached


          Activity

            drigby Dave Rigby added a comment -

            Assigning to Ben as he made that change. Can you take an initial look this morning, then we can discuss it this afternoon?

            ben.huddleston Ben Huddleston added a comment -

            Quick first look - pulled the log files for each node and none of them are being filled quickly enough (~30 or so items per second) that I'd expect us to fill up the buffer (8192 items) and block (the queue is implemented as a blocking queue). Not sure what's going on here; if the buffer isn't getting full then a perf regression doesn't make much sense. Will try to recreate on mancouch if possible, and add some small changes to spdlog to print if we ever block due to the buffer being full.
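
            For reference, a minimal sketch of how an spdlog 1.x async logger with a bounded, blocking queue is typically wired up (the logger name, sink and sizes below are illustrative, not the actual kv_engine setup):

            #include <spdlog/async.h>
            #include <spdlog/sinks/basic_file_sink.h>

            int main() {
                // Bounded queue of async log messages shared by all async loggers;
                // 8192 slots matches the queue size discussed above. One background
                // thread drains the queue and writes ("sinks") the messages.
                spdlog::init_thread_pool(8192, 1);

                // create_async uses async_overflow_policy::block, i.e. the calling
                // (front-end) thread blocks when the queue is full - this is the
                // behaviour that could show up as a throughput regression.
                auto logger = spdlog::create_async<spdlog::sinks::basic_file_sink_mt>(
                        "example_logger", "example.log");

                logger->info("hello from the async logger");
                spdlog::shutdown(); // flush and join the background logging thread
            }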

            Found spdlog's tweakme.h while digging into this; we could make a few minor changes to hopefully improve performance if required.
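
            For example, tweakme.h exposes compile-time switches along these lines (the exact set of options depends on the spdlog version, so treat these as illustrative rather than a definitive list):

            // In spdlog's tweakme.h - uncomment to enable:
            // #define SPDLOG_NO_DATETIME       // don't gather a timestamp for each message
            // #define SPDLOG_NO_THREAD_ID      // don't gather the calling thread's id
            // #define SPDLOG_NO_NAME           // don't store the logger name in each message
            // #define SPDLOG_NO_ATOMIC_LEVELS  // skip atomic operations when checking/setting log levels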

            ben.huddleston Ben Huddleston added a comment - - edited

            I re-ran the two tests 8 times each; the supposed regression comes out at roughly 3k ops/s, possibly just due to variation between any given run (see the quick mean/stddev sketch after the table below).

            1690 ops/s      1691 ops/s
            2558629         2504171
            2555644         2543146
            2582902         2538969
            2475998         2548453
            2513313         2517831
            2555190         2556105
            2544343         2550765
            2537839         2541184
            Averages:
            2,540,482.25    2,537,578
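
            A quick way to sanity-check the "just variation" theory is to compare the ~3k ops/s difference in means against the spread of the individual runs. A rough sketch (plain C++, numbers pasted from the table above):

            #include <cmath>
            #include <cstdio>
            #include <vector>

            // Mean and sample standard deviation for each column of the table above.
            static void summarise(const char* label, const std::vector<double>& xs) {
                double mean = 0.0;
                for (double x : xs) mean += x;
                mean /= xs.size();

                double var = 0.0;
                for (double x : xs) var += (x - mean) * (x - mean);
                var /= (xs.size() - 1);

                std::printf("%s: mean=%.0f stddev=%.0f\n", label, mean, std::sqrt(var));
            }

            int main() {
                summarise("1690", {2558629, 2555644, 2582902, 2475998,
                                   2513313, 2555190, 2544343, 2537839});
                summarise("1691", {2504171, 2543146, 2538969, 2548453,
                                   2517831, 2556105, 2550765, 2541184});
            }

            The spread within each column (tens of thousands of ops/s) is much larger than the ~3k ops/s difference between the means, which supports reading the difference as noise.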

             Investigation

            I pulled the log files for the test runs given in the description to do some investigating; I wanted to know how much we were logging.

            From build 1690.

            EMEAUKMAC01:1690 benhuddleston$ find . -name 'memcached*' | xargs wc -l
                 193 ./cbcollect_info_ns_1@172.23.133.11_20181217-195405/memcached.log
                 197 ./cbcollect_info_ns_1@172.23.133.12_20181217-195405/memcached.log
               39836 ./cbcollect_info_ns_1@172.23.133.14_20181217-195404/memcached.log
               39516 ./cbcollect_info_ns_1@172.23.133.13_20181217-195404/memcached.log
               79742 total
            EMEAUKMAC01:1690 benhuddleston$ 

            From build 1691.

            EMEAUKMAC01:1691 benhuddleston$ find . -name 'memcached*' | xargs wc -l
               37881 ./cbcollect_info_ns_1@172.23.133.14_20181217-192549/memcached.log
               37830 ./cbcollect_info_ns_1@172.23.133.13_20181217-192549/memcached.log
                 209 ./cbcollect_info_ns_1@172.23.133.12_20181217-192549/memcached.log
                 203 ./cbcollect_info_ns_1@172.23.133.11_20181217-192549/memcached.log
               76123 total

            So we actually logged less in build 1691. That seems a bit odd given that the test took longer, but we log a lot during compaction, and from the report graphs it looks like we started an extra compaction on run 1690. Given that the changes made between builds have nothing to do with compaction, I won't worry about that.

            From looking at the changes above, the only way I can see performance being slower is if the buffer fills up at some point and we block. Given the number of messages on nodes 3 and 4, I took a look at the distribution of log messages over the course of the test. Using the command

            awk '{print substr($0,0,19)}' memcached.log | uniq -c | sort -k 2 

            I can see how many messages we try to log every second. This timestamp is written into the log message when we sink it, i.e. when the async logging thread writes it to one of its outputs. The number of log messages sunk per second spiked early when we set up DCP connections (~4000 in 1 second) before starting the test, then dropped to an average of ~20-30 per second on both nodes.

            The regression still didn't make much sense to me, so I thought about why the buffer may be full now when it wasn't previously (prior to the upgrade from spdlog 0.14.0 to 1.1.0). The async logger in spdlog 0.14.0 uses a fixed-length buffer of 2MB, and async log messages containing a std::string end up being written into a queue that uses this 2MB buffer. The async logger in spdlog 1.1.0 instead creates a buffer for every message of 176 chars (176 bytes plus a little extra for management); anything longer gets allocated somewhere else by fmtlib, but that's not a concern here. These messages are then written into a queue that is basically a vector of (as of build 1691) 8192 async log messages.

            So if our messages are smaller than 176 bytes, it looks like we'll be wasting a bit of space. But how long are our messages? We can print a rolling average of the length of each line in the log using this command

            awk '{total += length; num++; print total/num}' memcached.log

            The average for nodes 3 and 4 in both builds was 161-162, whilst the average for nodes 1 and 2 was 91-93. When I first checked these averages I only checked nodes 1 and 2, and with an average of about 90 characters per log message roughly half of each 176-byte slot goes unused, effectively halving the size of our buffer. I created a build that doubles the size of the buffer and tested this.

            1690 ops/s      1691 ops/s      Doubled Buffer ops/s
            2558629         2504171         2522314
            2555644         2543146         2553333
            2582902         2538969         2554278
            2475998         2548453         2524549
            2513313         2517831         2658118
            2555190         2556105         2542584
            2544343         2550765         2561333
            2537839         2541184         2520652
            Averages:
            2,540,482.25    2,537,578       2,554,645.13

            One of the results, 2,658,118, looks like a potential outlier; when I remove it the Doubled Buffer average drops to 2,539,863.29. So the change is probably not worth making, although I could do some more runs to verify.

            One thing to note is that these builds include the "CouchKVStore::readCollectionsManifest: doc not found" message, which makes up roughly 20-25% of the logs on nodes 3 and 4. If we remove it, the average length of our log messages from these nodes is 177-179 characters. This may be a concern, as we do extra allocations for log messages over 176 characters in length. Out of interest, I grabbed some details about the lengths of log messages using the command

            awk '{print length}' memcached.log | sort | uniq -c | sort -k 2 -n >> logLengthDistribution.txt 

            Column 1 is the number of occurrences, column 2 is the length of the line. See the attached logLengthDistribution.txt.
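
            To illustrate the 176-character point: spdlog's async messages are built on fmtlib's basic_memory_buffer, which keeps a fixed amount of storage inline and only heap-allocates once the contents grow past it. A rough sketch (the 176 figure is taken from the discussion above; the rest is illustrative):

            #include <fmt/format.h>
            #include <string>

            int main() {
                // Inline capacity of 176 chars, mirroring the per-message buffer size
                // discussed above; shorter messages live entirely in the inline storage.
                fmt::basic_memory_buffer<char, 176> buf;

                std::string short_msg(90, 'x'); // ~90 chars: fits in the inline storage
                buf.append(short_msg.data(), short_msg.data() + short_msg.size());

                std::string long_msg(200, 'y'); // pushes the total past 176: the buffer grows onto the heap
                buf.append(long_msg.data(), long_msg.data() + long_msg.size());

                fmt::print("buffer now holds {} chars\n", buf.size());
            }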


            ben.huddleston Ben Huddleston added a comment -

            Alex Gyryk Given the averages in the table in the comment above, I don't believe this commit is actually causing a performance regression.

            drigby Dave Rigby added a comment -

            Based on Ben's latest comments I believe there's no significant regression as part of this patch. Assigning back to perf to confirm.

