Couchbase Server / MB-6232

ep-engine needs 1.5 minutes to create 1k vbuckets. Seems too slow (faster with barrier=0)

    Details

    • Triage:
      Triaged
    • Is this a Regression?:
      Yes

      Description

      See subject (the issue summary).

      To reproduce: start a single-node dev cluster and try to load the beer-sample bucket (it will fail, though perhaps for a different reason).

      You can observe the following in the logs:

      We're sending set_vbucket_state requests from the janitor (the ns_memcached messages), and these are the first messages from mccouch about the just-created vbuckets (the capi_set_view_manager and mc_connection messages). Note: I added extra log messages at the beginning and end of sync_notify to see whether it's couchdb's or ns_server's fault. From the timestamps it's clear that it's neither.

      [ns_server:debug,2012-08-15T9:47:40.309,n_0@127.0.0.1:<0.868.0>:mc_connection:do_notify_vbucket_update:106]sending out sync_notify
      [ns_server:info,2012-08-15T9:47:40.309,n_0@127.0.0.1:<0.870.0>:ns_memcached:do_handle_call:485]Changed vbucket 740 state to active
      [views:debug,2012-08-15T9:47:40.309,n_0@127.0.0.1:mc_couch_events:capi_set_view_manager:handle_mc_couch_event:418]Got set_vbucket event for beer-sample/1023. Updated state: active
      [ns_server:info,2012-08-15T9:47:40.309,n_0@127.0.0.1:<0.870.0>:ns_memcached:do_handle_call:485]Changed vbucket 739 state to active
      [ns_server:debug,2012-08-15T9:47:40.309,n_0@127.0.0.1:<0.868.0>:mc_connection:do_notify_vbucket_update:113]done
      [ns_server:info,2012-08-15T9:47:40.310,n_0@127.0.0.1:<0.870.0>:ns_memcached:do_handle_call:485]Changed vbucket 738 state to active
      [ns_server:info,2012-08-15T9:47:40.310,n_0@127.0.0.1:<0.870.0>:ns_memcached:do_handle_call:485]Changed vbucket 737 state to active
      

      And the last such message arrives about 94 seconds later, i.e. roughly 90 ms per vbucket for 1024 vbuckets:

      [ns_server:debug,2012-08-15T9:49:14.410,n_0@127.0.0.1:<0.868.0>:mc_connection:do_notify_vbucket_update:106]sending out sync_notify
      [views:debug,2012-08-15T9:49:14.410,n_0@127.0.0.1:mc_couch_events:capi_set_view_manager:handle_mc_couch_event:418]Got set_vbucket event for beer-sample/1. Updated state: active
      [ns_server:debug,2012-08-15T9:49:14.410,n_0@127.0.0.1:<0.868.0>:mc_connection:do_notify_vbucket_update:113]done
      [ns_server:debug,2012-08-15T9:49:14.501,n_0@127.0.0.1:<0.868.0>:mc_connection:do_notify_vbucket_update:106]sending out sync_notify
      [views:debug,2012-08-15T9:49:14.502,n_0@127.0.0.1:mc_couch_events:capi_set_view_manager:handle_mc_couch_event:418]Got set_vbucket event for beer-sample/0. Updated state: active
      [ns_server:debug,2012-08-15T9:49:14.502,n_0@127.0.0.1:<0.868.0>:mc_connection:do_notify_vbucket_update:113]done
      

      I just retried after remounting with barrier=0 and it was blazing fast. What this mount option changes is that short fsyncs become nearly instant, because they merely send the data to the disk's write cache without waiting for it to actually hit the platter.
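
      As a purely illustrative aside (not from the original ticket): the effect is easy to reproduce with a minimal POSIX C micro-benchmark that mimics the per-vbucket pattern, creating N small files and fsyncing each one immediately after writing it. On a spinning disk with barriers enabled, each fsync can cost tens of milliseconds, so the loop is dominated by fsync time; with barrier=0 (or on an SSD) the same loop finishes far faster. The file names and count below are arbitrary.

      /* fsync_per_file.c - hypothetical micro-benchmark, not Couchbase code.
       * Creates N small files and fsyncs each one right after writing it,
       * mimicking "one fsync per vbucket file created". */
      #include <fcntl.h>
      #include <stdio.h>
      #include <time.h>
      #include <unistd.h>

      int main(void)
      {
          const int n = 1024;                     /* roughly one per vbucket */
          struct timespec t0, t1;
          clock_gettime(CLOCK_MONOTONIC, &t0);

          for (int i = 0; i < n; i++) {
              char path[64];
              snprintf(path, sizeof(path), "vb_%d.couch", i);
              int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
              if (fd < 0) { perror("open"); return 1; }
              const char hdr[] = "header";
              if (write(fd, hdr, sizeof(hdr)) < 0) { perror("write"); return 1; }
              if (fsync(fd) < 0) { perror("fsync"); return 1; }  /* the expensive part */
              close(fd);
          }

          clock_gettime(CLOCK_MONOTONIC, &t1);
          double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
          printf("%d create+fsync cycles took %.1f s (%.1f ms each)\n",
                 n, secs, 1000.0 * secs / n);
          return 0;
      }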

      So that's evidence of excessive use of fsync somewhere during vbucket db file creation, and that's clearly on the ep-engine/couchstore side.

      Setting this to blocker, as it will affect anyone trying to create buckets on real hardware with spinning disks and any modern Linux distro (where barriers are on by default).

      Attachments: debug.1.gz (269 kB, Tommie McAfee)


          Activity

          chiyoung Chiyoung Seo added a comment -

          I don't think this can be addressed by the Sherlock release. When any long-standing issue needs to be moved to the next major release, please communicate and discuss it with the corresponding component engineers.

          drigby Dave Rigby added a comment -

          Sundar Sridharan Could you add a comment describing what the planned solution to this issue was? You can then assign the MB to me - thanks.

          sundar Sundar Sridharan added a comment -

          I believe this issue was seen when a large number of vbucket files were being instantiated during bucket creation. Each vbucket file instantiation required several fsync calls: 1. to set the vbucket state, 2. to set the local couchstore doc that persists the doc count. The repeated fsyncs ended up serializing all the calls, and on some slow spinning disks this ended up taking longer than 1.5 minutes. As noted earlier on the ticket, it was hardly seen on SSDs or disks with RAID 10.
          The reason it was assigned to me was that, after the forestdb integration, the idea was to reduce the number of vbucket files in the system by grouping several vbuckets into single shard files, so that this creation stall could be avoided.
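
          (Editor's illustration, not from the ticket and not the actual forestdb/couchstore design: the grouping idea can be as simple as mapping a vbucket id onto a fixed number of shard files, so that e.g. 1024 vbuckets share 8 files and bucket creation only needs a handful of fsyncs instead of one per vbucket. The shard count and file naming below are assumptions.)

          /* Hypothetical vbucket-to-shard mapping, for illustration only. */
          #include <stdio.h>

          #define NUM_VBUCKETS 1024
          #define NUM_SHARDS   8          /* assumed shard count */

          /* Map a vbucket id to the shard file that would hold its data. */
          static int shard_for_vbucket(int vbid)
          {
              return vbid % NUM_SHARDS;
          }

          int main(void)
          {
              /* With this layout, creating all vbuckets touches only
               * NUM_SHARDS files, so only NUM_SHARDS fsyncs are needed. */
              for (int vbid = 0; vbid < NUM_VBUCKETS; vbid += 128)
                  printf("vbucket %4d -> shard_%d.data\n",
                         vbid, shard_for_vbucket(vbid));
              return 0;
          }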

          drigby Dave Rigby added a comment -

          Sundar Sridharan Perfect - thanks for the background info.

          ingenthr Matt Ingenthron added a comment -

          +1 on coalescing to fewer files. The mapping of vbuckets to files has always felt a bit wrong to me, as you lose a lot of the benefits of append-only files (AOFs) when you have lots of them.

          Regarding the fsync() however, I would still argue that there may be another fix. The use case this was raised for (flush_all from REST) and the other case (creating a new bucket) both have a situation where the bucket isn't writeable at all until completion. If the logic were refactored from:

              block frontend traffic
              for each vbucket to create:
                  create vbucket
                  fsync(vbucket's fd)
              unblock frontend traffic

          To:

              block frontend traffic
              for each vbucket to create:
                  create vbucket
              for each vbucket created:
                  fsync(vbucket's fd)
              unblock frontend traffic

          … you'd be doing what's intended with fsync(). You'd be scheduling lots of IO and then waiting for completion on a bunch of things that are already complete, rather than schedule/wait/schedule/wait/schedule/wait, etc. It's not like there's any data protection in the interim anyway.
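
          (A rough POSIX sketch of that refactoring, added for illustration; this is not ep-engine code, and the paths, header contents and file count are placeholders. The count is kept small so the sketch stays under typical open-fd limits; a real implementation would batch or reopen files instead.)

          /* deferred_fsync.c - hypothetical "create everything, then fsync
           * everything" sketch of the two-phase approach described above. */
          #include <fcntl.h>
          #include <stdio.h>
          #include <unistd.h>

          #define NUM_FILES 256   /* placeholder; small enough for default fd limits */

          int main(void)
          {
              int fds[NUM_FILES];

              /* Phase 1: create every file and write its header.  The data only
               * has to reach the page cache here, so this phase is cheap. */
              for (int i = 0; i < NUM_FILES; i++) {
                  char path[64];
                  snprintf(path, sizeof(path), "vb_%d.couch", i);
                  fds[i] = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
                  if (fds[i] < 0) { perror("open"); return 1; }
                  const char hdr[] = "header";
                  if (write(fds[i], hdr, sizeof(hdr)) < 0) { perror("write"); return 1; }
              }

              /* Phase 2: fsync everything.  By now the kernel has had a chance to
               * schedule and merge the writes, so the waits overlap instead of each
               * create paying a full schedule/wait round-trip of its own. */
              for (int i = 0; i < NUM_FILES; i++) {
                  if (fsync(fds[i]) < 0) { perror("fsync"); return 1; }
                  close(fds[i]);
              }
              return 0;
          }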

          I know that this is easy to say and probably harder to do, but it'd be a much more complete fix to the issue.

          While here, I should also mention briefly that if you're using Mac OS X, they chose to make fsync(2) a bit less of a guarantee than on other Unixes. See the man page for details, but any changes here should be verified/tested on other platforms, and/or you should use the right additional fcntl request on Mac OS to get the real behavior. The man page and the internet have more detail.
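
          (For reference, the macOS mechanism being alluded to is presumably the F_FULLFSYNC fcntl: on that platform plain fsync(2) does not force the drive's write cache to flush, so durability-sensitive code typically does something like the sketch below. This is an editor's illustration, not Couchbase code; the file name is a placeholder.)

          /* full_fsync.c - minimal sketch: use F_FULLFSYNC where available
           * (macOS), otherwise fall back to plain fsync.  Illustration only. */
          #include <fcntl.h>
          #include <stdio.h>
          #include <unistd.h>

          static int full_fsync(int fd)
          {
          #ifdef F_FULLFSYNC
              /* On macOS, F_FULLFSYNC also asks the drive to flush its cache. */
              if (fcntl(fd, F_FULLFSYNC) != -1)
                  return 0;
              /* F_FULLFSYNC can fail on some filesystems; fall back to fsync. */
          #endif
              return fsync(fd);
          }

          int main(void)
          {
              int fd = open("example.dat", O_CREAT | O_WRONLY, 0644);  /* placeholder */
              if (fd < 0) { perror("open"); return 1; }
              printf("full_fsync returned %d\n", full_fsync(fd));
              close(fd);
              return 0;
          }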

          Show
          ingenthr Matt Ingenthron added a comment - +1 on coalescing to less files. The mapping of vbuckets to files has always felt a bit wrong to me as you lose a lot of the benefits of AOFs when you have lots of AOFs. Regarding the fsync() however, I would still argue that there may be another fix. The use case this was raised for, flush_all from REST, or the other case, a new bucket both have a situation where the bucket isn't writeable at all until completion. If the logic were refactored from: block frontend traffic for each vbucket to create { create vbucket fsync(vbucket's fd) } unblock frontend traffic To: block frontend traffic for each vbucket to create { create vbucket } for each vbucket created { fsync(vbucket's fd) } unblock frontend traffic … you'd be doing what's intended with fsync(). You'll be scheduling lots of IO and then waiting for completion on a bunch of things that are already complete rather than schedule/wait/schedule/wait/schedule/wait, etc. It's not like there's any data protection in the interim anyway. I know that this is easy to say and probably harder to do, but it'd be a much more complete fix to the issue. While here, I should also mention briefly that if you're using Mac OS X, they chose to make fsync(2) a bit less guaranteed than on other unixes. See the man page for details but any changes here should be verified/tested on other platforms and/or you should use the right additional defines in Mac OS to get real behavior. The man page and the internet have more detail.

            People

            • Assignee:
              drigby Dave Rigby
            • Reporter:
              alkondratenko Aleksey Kondratenko (Inactive)
            • Votes:
              0
            • Watchers:
              13

