Couchbase Server / MB-20136

Perf daily: rebalance in 10 buckets regression from 4.7.0-835 to 4.7.0-857


Details

    • Untriaged
    • Centos 64-bit
    • Unknown

    Description

      As part of the daily sanity run, the time for the rebalance-in with 10 empty buckets increased from 4.5 minutes to 5.3 minutes between builds 835 and 857. This is an increase of 48 seconds over 4.5 minutes, or about 18%. This is readily reproducible.

      The node 10.5.3.44 is the one being rebalanced in.
      Logs from both runs are attached; please let me know if there is more information I can provide.
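
      For reference, the arithmetic above works out as follows (a minimal sketch; only the 4.5 and 5.3 minute figures come from the measurements quoted above):

          # Rebalance-in times quoted above, in minutes.
          baseline_min = 4.5    # 4.7.0-835
          regressed_min = 5.3   # 4.7.0-857

          delta_sec = (regressed_min - baseline_min) * 60
          pct = delta_sec / (baseline_min * 60) * 100
          print(f"increase: {delta_sec:.0f} s ({pct:.0f}%)")   # increase: 48 s (18%)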

      Attachments

        1. changelog.txt
          34 kB
          Raju Suravarjjala

        Activity

          ericcooper Eric Cooper (Inactive) added a comment -

          Jim, the procedure you use is similar to what you did for MB-20482, though the command is:

          python -u perfSanity/scripts/perf_regression_runner_alpha.py -e -v 4.7.0-837  -r 2016-07-14:13:18 -q "testName='reb_in_10_buckets'" -n -e

          Please let me know if you need more information on this.
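
          As a rough sketch of how the command above could be driven for a before/after comparison, something like the following might work (the script path, flags, -r timestamp and -q query are copied verbatim from the command above; the build list and the loop itself are assumptions, not part of the documented procedure):

              import subprocess

              # Builds to compare; chosen here for illustration only.
              builds = ["4.7.0-835", "4.7.0-857"]

              for build in builds:
                  cmd = [
                      "python", "-u", "perfSanity/scripts/perf_regression_runner_alpha.py",
                      "-e", "-v", build,
                      "-r", "2016-07-14:13:18",
                      "-q", "testName='reb_in_10_buckets'",
                      "-n", "-e",
                  ]
                  print("running:", " ".join(cmd))
                  subprocess.run(cmd, check=True)   # assumes the runner exits non-zero on failure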
          jwalker Jim Walker added a comment -

          Eric Cooper what spec system is used for this test, is it still the 4 core system?

          ericcooper Eric Cooper (Inactive) added a comment - edited

          Yes - the same 4 core system per previous Jiras.

          jwalker Jim Walker added a comment -

          Eric Cooper, so I can reproduce the test (or something similar).

          But is a smaller 'value' better? The unit reported by the test is, I presume, the wall-clock time for the rebalance to complete?

          So I'm just trying to get a feel for what the test really does and the load/operations it places on the cluster. I'm limited to running VMs on my MacBook and found that, for some reason, the test hung if I tried the default (it just seemed to be doing nothing). However, the test works with 5 buckets, so I've stuck with that for now.

          With 4.5.1 I don't see a pronounced regression; comparing 4.5.1-2801 vs 2802, with 2 runs of each, I got the following values:

          • 4.5.1-2801 : 3.96, 3.93
          • 4.5.1-2802 : 3.93, 4.00

          Not a strong regression; maybe that 4.00 is a trend towards 2802 being slower.

          On 4.7 I see an improvement, and, as you observed, only moxi went away?

          • 4.7-837 : 3.97, 3.93
          • 4.7-838 : 3.01, 3.01

          That is, 4.7-838 is faster? However, you've seen that it is slower? If this is faster, my hypothesis is that the removal of moxi may have freed some resources on these "small" systems, which are overloaded by the many-bucket config.

          Overall though, what is this defect tracking? The value change triggered by moxi (smaller is better???) or the one in 4.5.1? The comments are really leading to two different issues and should perhaps become two different MBs.

          So to summarise my questions for now:

          1. What is the value reported by this test?
          2. Is a smaller value better? (larger_is_better = false is set in the test spec)
          3. What are all the pairs of builds where a change is seen? I.e. 4.7-837 to 4.7-838, 4.5.1-x to 4.5.1-y, is there another pair of 4.7 builds where a regression appears?
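
          For what it's worth, the comparison above works out as follows (a minimal sketch; the run values are the ones quoted in this comment, and the means and percentages are simply derived from them):

              from statistics import mean

              # Two runs per build, values as quoted above (smaller assumed to be better).
              runs = {
                  "4.5.1-2801": [3.96, 3.93],
                  "4.5.1-2802": [3.93, 4.00],
                  "4.7-837":    [3.97, 3.93],
                  "4.7-838":    [3.01, 3.01],
              }

              def pct_change(before, after):
                  """Percentage change in the mean from one build to the next; negative = faster."""
                  return (mean(runs[after]) - mean(runs[before])) / mean(runs[before]) * 100

              print(f"4.5.1-2801 -> 4.5.1-2802: {pct_change('4.5.1-2801', '4.5.1-2802'):+.1f}%")
              print(f"4.7-837    -> 4.7-838:    {pct_change('4.7-837', '4.7-838'):+.1f}%")

          With the values above this gives roughly +0.5% for 4.5.1 and about -24% for 4.7-838 versus 4.7-837, i.e. the improvement described in this comment rather than the regression in the original report.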

          raju Raju Suravarjjala added a comment -

          Bulk closing all invalid, duplicate, user error and won't fix issues
