Couchbase Server / MB-6796

perf - 10x slower rebalance performance with consistent view turned on

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.0
    • Fix Version/s: 2.0-beta-2
    • Component/s: ns_server
    • Security Level: Public
    • Labels:
    • Environment:
      centos

      Description

      With build 1782, rebalance took ~6800 seconds (8M workload, 2 -> 4 nodes, with consistent views turned on by default).

      By contrast, with build 1723 rebalance took ~500 seconds for the same workload without consistent views.

      Please refer to the attached screenshot.

      1. diag0.bz2
        1.98 MB
        Aleksey Kondratenko
      2. diag3.bz2
        1.72 MB
        Aleksey Kondratenko
      3. reb-1-2.loop_2.0.0-1782-rel-enterprise_2.0.0-1782-rel-enterprise_terra_Oct-01-2012_18-57-11.pdf
        2.26 MB
        Ronnie Sun
      1. reb-1782-cons-views.png
        176 kB
      2. Screen Shot 2012-10-02 at 2.33.34 PM.png
        133 kB

        Activity

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        I was able to speed up commit times by 2x by reserving space with Linux's fallocate (with FALLOC_FL_KEEP_SIZE; otherwise our constant re-opening makes finding the last usable header extremely slow). We could remove more of the metadata-syncing overhead by pre-extending the file with actual zero bytes. That way fdatasync would not have to update file metadata at all, but, as I pointed out above, today this would cause problems with re-opening .couch files and with finding the last valid header.

        Anyway, we clearly need an order-of-magnitude improvement rather than 2x, so we need something else.

        The dirty patch can be found here: http://paste.debian.net/198227/
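        A minimal Linux-only sketch of the two preallocation strategies described in this comment, assuming a file descriptor opened for writing. The helpers pwrite_all, reserve_keep_size, preextend_with_zeros and commit_at are hypothetical; they are not taken from the dirty patch or from couchstore.

```c
#define _GNU_SOURCE /* fallocate() and FALLOC_FL_KEEP_SIZE are Linux/GNU extensions exposed via <fcntl.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all `size` bytes at `offset`, retrying short or interrupted writes. */
static int pwrite_all(int fd, const void *data, size_t size, off_t offset) {
    const char *p = data;
    while (size > 0) {
        ssize_t n = pwrite(fd, p, size, offset);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -errno;
        }
        if (n == 0)
            return -EIO;
        p += n;
        size -= (size_t)n;
        offset += n;
    }
    return 0;
}

/* Strategy 1: allocate blocks for [offset, offset+len) but keep st_size
 * unchanged, so a reader scanning back from EOF for the last valid header
 * never walks through the reserved region. Appends into it still grow
 * st_size, so fdatasync must still persist the new size. */
static int reserve_keep_size(int fd, off_t offset, off_t len) {
    return fallocate(fd, FALLOC_FL_KEEP_SIZE, offset, len) == 0 ? 0 : -errno;
}

/* Strategy 2: physically write zeros so the file is already long enough.
 * Later in-place commits do not change st_size, so fdatasync has no size
 * metadata to flush; the catch is that EOF now lies past the real data,
 * so header discovery on re-open would have to skip the zero tail. */
static int preextend_with_zeros(int fd, off_t offset, off_t len) {
    static const char zeros[4096];
    while (len > 0) {
        size_t chunk = len < (off_t)sizeof zeros ? (size_t)len : sizeof zeros;
        int rc = pwrite_all(fd, zeros, chunk, offset);
        if (rc != 0)
            return rc;
        offset += (off_t)chunk;
        len -= (off_t)chunk;
    }
    return fdatasync(fd) == 0 ? 0 : -errno;
}

/* One "commit": write a record at `offset` and make its data durable. */
static int commit_at(int fd, const void *data, size_t size, off_t offset) {
    int rc = pwrite_all(fd, data, size, offset);
    if (rc != 0)
        return rc;
    return fdatasync(fd) == 0 ? 0 : -errno;
}
```

        reserve_keep_size mirrors the approach the comment reports as giving about 2x; preextend_with_zeros is the further idea, and its comment spells out the header-scanning drawback mentioned above. Not every filesystem supports fallocate, so callers should be prepared for an error such as -EOPNOTSUPP.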

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        I think we have enough data to prove that fsync is indeed the problem here. We should get Damien and Peter involved to find a solution.
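        One rough way to gather this kind of evidence (not necessarily how it was gathered for this issue) is to time small appends with and without an fdatasync after each one. The helper avg_commit_usec is hypothetical, and the record size and commit count in any run are arbitrary choices.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Append `commits` records of `record_size` bytes to `fd`, optionally
 * calling fdatasync after each one, and return the average wall-clock
 * microseconds per append (or -1.0 on error / bad arguments). */
static double avg_commit_usec(int fd, size_t record_size, int commits, int sync) {
    char buf[4096];
    if (record_size > sizeof buf || commits <= 0)
        return -1.0;
    memset(buf, 'x', record_size);

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < commits; i++) {
        if (write(fd, buf, record_size) != (ssize_t)record_size)
            return -1.0;
        if (sync && fdatasync(fd) != 0)
            return -1.0;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    double usec = (double)(end.tv_sec - start.tv_sec) * 1e6 +
                  (double)(end.tv_nsec - start.tv_nsec) / 1e3;
    return usec / commits;
}
```

        On a rotating disk the synced run is typically far slower per append than the unsynced one; on tmpfs or a battery-backed write cache the gap can largely disappear, so the result depends heavily on the storage under test.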

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        There's another thing I don't understand. I'm doing around 130 sets (all of them updates) per second. Stats show >200 disk updates per second (which is OK, considering replica writes too).

        Stranger still, iotop shows 4.5 MB of writes per second. Even at 300 disk updates per second, that's ~15 KB of disk writes per item, while my items are small, under 100 bytes. That seems large to me: roughly 4 disk blocks per item.

        We do know, however, that this overhead will stay more or less stable at higher write rates.
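        The per-item estimate above works out as follows, assuming "MB" means 10^6 bytes and a 4 KiB filesystem block (neither is stated in the comment):

```latex
\frac{4.5\times 10^{6}\ \text{B/s}}{300\ \text{updates/s}} = 15\,000\ \text{B per update},
\qquad
\frac{15\,000\ \text{B}}{4096\ \text{B/block}} \approx 3.7 \approx 4\ \text{blocks}
```

        Since 300 updates per second is an upper bound and items are under 100 bytes, this corresponds to more than 15000 / 100 = 150x write amplification per item.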

        thuan Thuan Nguyen added a comment -

        Integrated in github-ep-engine-2-0 #441 (See http://qa.hq.northscale.net/job/github-ep-engine-2-0/441/)
        MB-6796 Prioritize flushing pending vbuckets over regular vbuckets (Revision e9ce877041d101efee775e9c65ea6b9eef914926)

        Result = SUCCESS
        Chiyoung Seo:
        Files :

        • src/stats.hh
        • src/ep.cc
        • src/flusher.hh
        • src/vbucketmap.cc
        • src/flusher.cc
        • src/vbucket.hh
        • tests/ep_test_apis.cc
        • tests/ep_testsuite.cc
        • tests/ep_test_apis.h
        • docs/stats.org
        • src/vbucketmap.hh
        • include/ep-engine/command_ids.h
        • src/ep_engine.cc
        • src/ep.hh
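        The commit title describes a scheduling change in the ep-engine flusher. The actual ep-engine change is C++ and is not reproduced here; what follows is a hypothetical C sketch of the ordering idea only, with made-up names (enum vb_state, struct vbucket, flush_order). Presumably the benefit is that vbuckets in the pending state during rebalance get persisted before regular ones, so rebalance does not wait behind unrelated dirty data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vbucket states, loosely mirroring ep-engine terminology. */
enum vb_state { VB_ACTIVE, VB_REPLICA, VB_PENDING };

struct vbucket {
    int id;
    enum vb_state state;
    size_t dirty_items; /* items waiting to be persisted */
};

/* Fill `order` (capacity >= n) with indices of vbuckets that have dirty
 * items: all pending vbuckets first, then every other vbucket. Relative
 * order inside each group is preserved. Returns the number of entries. */
static size_t flush_order(const struct vbucket *vbs, size_t n, size_t *order) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (vbs[i].state == VB_PENDING && vbs[i].dirty_items > 0)
            order[count++] = i;
    for (size_t i = 0; i < n; i++)
        if (vbs[i].state != VB_PENDING && vbs[i].dirty_items > 0)
            order[count++] = i;
    return count;
}
```

        The two-pass scan keeps the sketch simple; a real flusher would more likely keep separate high- and low-priority queues rather than rescanning every vbucket on each pass.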
        kzeller kzeller added a comment -

        Added to the release notes as: "Prioritize flushing pending vbuckets over regular vbuckets. This is a performance improvement used when rebalancing buckets that have no views or design docs while consistent view mode is enabled."


          People

          • Assignee:
            chiyoung Chiyoung Seo
            Reporter:
            ronnie Ronnie Sun (Inactive)
          • Votes:
            0
            Watchers:
            4

            Dates

            • Created:
              Updated:
              Resolved:

              Gerrit Reviews

              There are no open Gerrit changes