  Couchbase Server / MB-7523

Rebalance performance regression in 2.0.1 vs. 2.0.0 (apparently only under load)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.1
    • Fix Version/s: None
    • Component/s: ns_server
    • Security Level: Public
    • Labels:
      None
    • Environment:
      linux

      Description

      Recent benchmarks: http://dashboard.hq.couchbase.com/litmus/dashboard/

      Summary:

      Rebalance-in (2 to 4 nodes), 7M items, mixed workload:

      ec2: 2.0.0-1976 (RTM) took 1687 sec while 2.0.1-116 took 3581 sec
      thor (data center physical machines): 2.0.0-1976 (RTM) took 901 sec while 2.0.1-123 took 1553 sec

      Rebalance-out (4 to 2 nodes), 7M items, mixed workload:

      ec2: 2.0.0-1976 (RTM) took 2854 sec while 2.0.1-116 took 5075 sec
      thor (data center physical machines): 2.0.0-1976 (RTM) took 1142 sec while 2.0.1-123 took 2773 sec
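      As a quick check on the size of the regression, the four timings above can be turned into slowdown factors (2.0.1 time divided by 2.0.0 time). This is a minimal Python sketch: the RESULTS layout and the slowdown() helper are illustrative names, and the numbers are copied from the summary. Note that the ec2 runs used 2.0.1-116 while the thor runs used 2.0.1-123.

```python
# Rebalance durations in seconds, copied from the summary above:
# (test, environment) -> (2.0.0-1976 RTM, 2.0.1 build)
RESULTS = {
    ("rebalance-in", "ec2"): (1687, 3581),
    ("rebalance-in", "thor"): (901, 1553),
    ("rebalance-out", "ec2"): (2854, 5075),
    ("rebalance-out", "thor"): (1142, 2773),
}


def slowdown(before_sec: float, after_sec: float) -> float:
    """How many times longer the 2.0.1 build took than 2.0.0 RTM."""
    return after_sec / before_sec


if __name__ == "__main__":
    for (test, env), (t_200, t_201) in RESULTS.items():
        print(f"{test:14} {env:5} {slowdown(t_200, t_201):.2f}x")
```

      Run as a script, it prints about 2.12x and 1.72x for rebalance-in (ec2, thor) and 1.78x and 2.43x for rebalance-out (ec2, thor), so the 2.0.1 builds took roughly 1.7 to 2.4 times as long.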


        Activity

        Ronnie Sun (Inactive) added a comment:

        diags:

        thor (data center machines), build 2.0.1-123-rel:

        reb-1: http://172.23.96.10:8080/job/thor-parent/112/

        reb-1-out: http://172.23.96.10:8080/job/thor-parent/110/

        Pavel Paulau added a comment:

        + summary of tests with views: https://docs.google.com/spreadsheet/ccc?key=0AgLUessE73UXdDV1SXhUZjJ0b0RhU3gtdlUzZGloUFE#gid=0
        Aleksey Kondratenko (Inactive) added a comment:

        Folks, I still see no results from 2.0.0 with +A. May I insist on having some?

        Aleksey Kondratenko (Inactive) added a comment:

        Alternatively, we can try 2.0.1 without +A.

        Aleksey Kondratenko (Inactive) added a comment:

        I did some runs comparing 2.0.0 and the latest branch-2.0.1, and found 2.0.1 to be faster.

        Caveats: all my data fit in the page cache, I didn't send any mutations during rebalance, and I allowed all nodes to use just one (the same) CPU core. I'll rerun with each node bound to a different core.
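        For a rerun like the one mentioned above, giving each node its own core could be scripted roughly as follows. This is a minimal sketch assuming Linux, Python 3, and that every node runs as a separate local process whose PID is already known; assign_cores() and pin() are hypothetical helpers, not part of any Couchbase tooling. It relies on os.sched_getaffinity and os.sched_setaffinity, which are only available on some Unix platforms such as Linux.

```python
import os


def assign_cores(pids, cores=None):
    """Map each PID to its own distinct CPU core (hypothetical helper)."""
    if cores is None:
        # Cores the calling process is currently allowed to run on (Linux only).
        cores = sorted(os.sched_getaffinity(0))
    if len(pids) > len(cores):
        raise ValueError("need at least one core per node process")
    # One-to-one assignment: the i-th PID gets the i-th core.
    return {pid: cores[i] for i, pid in enumerate(pids)}


def pin(assignment):
    """Restrict each process to the single core it was assigned (Linux only)."""
    for pid, core in assignment.items():
        os.sched_setaffinity(pid, {core})
```

        For example, pin(assign_cores([pid_a, pid_b, pid_c, pid_d])), where pid_a and friends are placeholder PIDs of four node processes, would give each node one core. The one-core-per-node check keeps the script from silently recreating the shared-core setup of the first run.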

        Farshid Ghods (Inactive) added a comment:

        Assigning this back to Ronnie, as Alk is expecting results for 2.0.0 with +A.

        Pavel Paulau added a comment:

        Hi Aleksey,

        I added the first results for reb-out with +A; they show a >1.5x regression (see link above). I'm gathering more results, but it takes time.

        Anyway, I believe KV results with +A will be even more helpful.

        Aleksey Kondratenko (Inactive) added a comment:

        No. Let's have a separate bug for rebalance with views.

        Aleksey Kondratenko (Inactive) added a comment:

        Given that I've tried rebalance myself and saw no regression (in fact, a speedup) in 2.0.1, I think we can start assuming that the problem only occurs when rebalance is performed under load.

        BTW, I've also personally confirmed that Erlang core performance did not regress in 2.0.1 (i.e., after we added -fno-strict-aliasing to CFLAGS).

        Aleksey Kondratenko (Inactive) added a comment:

        The fix is merged as part of the change chain ending at: http://review.couchbase.org/24067

        I've found that my original plan, ending the serial phase of a vbucket move at the end of backfill, helps a lot.
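        To see why ending the serial phase earlier can matter, here is a deliberately simplified toy model in Python. It is an assumption-laden illustration, not ns_server's actual move scheduler: each vbucket move on a node is split into a backfill phase and a hypothetical post-backfill phase, the next move may only start once the previous move's serial phase ends, and post-backfill work of different moves is assumed to overlap freely. The function name and parameters are made up for this sketch.

```python
def rebalance_time(n_moves: int, backfill: float, post_backfill: float,
                   serial_ends_at_backfill: bool) -> float:
    """Toy model: total time for n_moves back-to-back vbucket moves on one node.

    Each move = backfill + post_backfill (both hypothetical durations).
    The next move starts only when the current move's serial phase ends.
    """
    if n_moves == 0:
        return 0
    if serial_ends_at_backfill:
        # Serial phase = backfill only. Move k's post-backfill tail overlaps
        # the following backfills, so only the last move's tail is exposed.
        return n_moves * backfill + post_backfill
    # Serial phase = the whole move, so nothing overlaps.
    return n_moves * (backfill + post_backfill)
```

        With 4 moves, backfill=10 and post_backfill=5, the fully serial variant takes 60 time units and the early-release variant 45. With post_backfill=0 the two coincide; in this model the gap grows as n_moves * post_backfill - post_backfill, so anything that lengthens the post-backfill phase hurts only while it is kept inside the serial section.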


          People

          • Assignee:
            Aleksey Kondratenko (Inactive)
            Reporter:
            Ronnie Sun (Inactive)
          • Votes:
            0
            Watchers:
            0

            Dates

            • Created:
              Updated:
              Resolved:

              Gerrit Reviews

              There are no open Gerrit changes