Couchbase Server / MB-19821

KV Throughput is slower in spock compared to watson.


Details

    • Triage: Untriaged
    • Is this a Regression?: Yes

    Description

      The Pillowfight read-heavy (80/20 R/W) test is slower in spock; the 50/50 R/W test is unaffected.

      Pillowfight 80/20 R/W (ops/sec):

      Build         ops/sec
      4.1.1-5914    815K
      4.5.0-2594    832K
      4.5.0-2595    573K
      4.5.0-2600    568K
      4.5.0-2601    *832K
      4.7.0-734     569K
      4.7.0-779     833K

      Pillowfight 50/50 R/W (ops/sec):

      Build         ops/sec
      4.1.1-5914    389K
      4.5.0-2594    390K
      4.5.0-2600    389K
      4.7.0-734     390K
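
      For reference, a read-heavy mixed workload like the 80/20 test above can be driven with cbc-pillowfight. The sketch below wraps an illustrative invocation in Python; the connection string, item count, document sizes and thread count are placeholder assumptions, not the settings used for the numbers in these tables.

          # Illustrative cbc-pillowfight run approximating an 80/20 read/write mix.
          # Connection string, sizes and counts are placeholders, not this test's settings.
          import subprocess

          cmd = [
              "cbc-pillowfight",
              "-U", "couchbase://127.0.0.1/default",   # cluster/bucket spec (placeholder)
              "--num-threads", "20",
              "--num-items", "10000000",
              "--set-pct", "20",                       # 20% writes => 80/20 R/W
              "--min-size", "256",
              "--max-size", "256",
          ]
          subprocess.run(cmd, check=True)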


          Activity

            Wayne Siu added a comment -

            Verified with 4.5.0-2601.

            {
                'snapshots': ['leto_ssd_450-2601-enterprise_28b_access'],
                'metric': 'kv_max_ops_10M_reads_avg_ops_leto_ssd',
                'build_url': 'http://perf.jenkins.couchbase.com/job/leto/2793/',
                'build': u'4.5.0-2601-enterprise',
                'value': 832924.6
            }
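
            For context, checking such a perfrunner snapshot against a baseline is essentially a threshold comparison. The sketch below is a minimal illustration; the 815K baseline is taken from the 4.1.1-5914 column above, and the 5% tolerance is an assumption rather than the project's actual pass/fail criterion.

                # Minimal sketch: flag a regression if the reported ops/sec falls more
                # than 5% below a baseline. Baseline and tolerance are assumptions.
                BASELINE_OPS = 815_000          # 4.1.1-5914 result from the table above
                TOLERANCE = 0.05

                reported = 832924.6             # 'value' from the snapshot above
                regressed = reported < BASELINE_OPS * (1 - TOLERANCE)
                print("regression" if regressed else "ok", reported)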
            

            Wayne Siu added a comment -

            Will close the ticket when the same is verified in the spock branch.


            Eric Cooper (Inactive) added a comment -

            I am trying to repro in the daily sanity. With a standalone pillowfight command I see a 5-10% regression. When running with perfrunner I have tried:

            • threads from 10 to 100
            • 1 to 10M items
            • write percentage reduced to 5%
            • compaction threshold from 30 to 100%
            • 1-node versus 2-node clusters
            • the bucket is 1G and is about 10% of capacity
            • generally I see CPU in the 90% range
            • ops/sec in the range of 200K to 800K

            After all this experimentation I think (and I need to verify this) that a 1-node cluster is more likely to uncover the regression than a 2-node cluster. Based on knowledge of the fix, does this make sense? Of the above, are there settings I should (or should not) change to help expose this?
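
            A parameter sweep along these lines could be scripted roughly as below, building on the pillowfight invocation sketched under the Description. The host, bucket and specific ranges are illustrative assumptions, not the perfrunner settings used here.

                # Rough sketch of sweeping thread/item counts with cbc-pillowfight and
                # timing each run. Connection string and ranges are placeholders.
                import itertools
                import subprocess
                import time

                SPEC = "couchbase://127.0.0.1/default"   # placeholder cluster/bucket spec

                for threads, items in itertools.product([10, 50, 100], [1_000_000, 10_000_000]):
                    cmd = [
                        "cbc-pillowfight",
                        "-U", SPEC,
                        "--num-threads", str(threads),
                        "--num-items", str(items),
                        "--set-pct", "20",       # 80/20 read/write mix
                        "--num-cycles", "100",   # bounded run instead of running forever
                    ]
                    start = time.monotonic()
                    subprocess.run(cmd, check=True)
                    print(f"threads={threads} items={items} elapsed={time.monotonic() - start:.1f}s")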

            Dave Rigby added a comment -

            I am trying to repro in the daily sanity. With a standalone pillowfight command I see a 5-10% regression.

            ... <cut> ...

            After all this experimentation I think (and I need to verify this) that a 1-node cluster is more likely to uncover the regression than a 2-node cluster. Based on knowledge of the fix, does this make sense? Of the above, are there settings I should (or should not) change to help expose this?

            Your results approximately match mine - the figures I quoted in my local test (~23.5s -> ~24.7s, or ~5%) were on a single Ubuntu 12.04 machine (24 logical CPU Sandy Bridge Xeon), running two nodes (via cluster_run). Interestingly, I saw a much more significant difference running the same pillowfight test on my OS X laptop (Haswell, 8 logical CPUs), where performance dropped by over 50%.

            The underlying cause of the perf regression was lock contention on a per-bucket mutex, so having lots of different connections trying to concurrently access the same bucket would be expected to show the issue. I'd probably expect you'd see it sooner with smaller documents, lots of connections at the same time and mostly performing reads.
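
            To make the contention point concrete, here is a conceptual sketch (not the memcached/ep-engine code): many client threads serializing on one coarse "per-bucket" lock behave very differently from the same threads spread across sharded locks. The per-operation work is simulated with a short sleep so the effect is visible even under CPython's GIL; all names and constants below are illustrative.

                # Conceptual sketch of a coarse shared lock vs. sharded locks.
                # The sleep stands in for real per-operation work; it releases the GIL,
                # so overlapping is possible when threads hold different locks.
                import threading
                import time

                NUM_THREADS = 16
                OPS_PER_THREAD = 200
                OP_COST = 0.0005  # seconds of simulated work per operation

                def run(locks):
                    def worker(tid):
                        for i in range(OPS_PER_THREAD):
                            with locks[(tid + i) % len(locks)]:
                                time.sleep(OP_COST)
                    threads = [threading.Thread(target=worker, args=(t,))
                               for t in range(NUM_THREADS)]
                    start = time.monotonic()
                    for t in threads:
                        t.start()
                    for t in threads:
                        t.join()
                    return time.monotonic() - start

                print("single shared lock:", run([threading.Lock()]))
                print("16 sharded locks  :", run([threading.Lock() for _ in range(16)]))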

            Wayne Siu added a comment -

            Also verified in 4.7.0-779.
            The regression is fixed.


            People

              Assignee: Wayne Siu
              Reporter: Wayne Siu
              Votes: 0
              Watchers: 13
