  1. Couchbase Server
  2. MB-49054

Improve and better handle bucket (re)creation times for Magma buckets

Details

    Description

      As seen in MB-47387, bucket creation for buckets with a magma storage backend can take many minutes. Note that creating an empty bucket is fast; what we are actually talking about here is bucket _re_creation when the number of magma files is large. (The command sent by ns_server to memcached is create_bucket, which is why folks refer to the process as "bucket creation".)

      Things are slow because magma needs to open all files associated with all vbuckets on that node, read the file headers and then apply outstanding changes in the write-ahead log before it can start responding to requests from ns_server about such things as vbucket state and vbucket high sequence numbers. We noticed this because bucket creation took more than 3 minutes (which was the old default timeout) in 7.1 testing. Currently, the magma timeout is set at 7 minutes, and we may need to increase it further.

      In any case, bucket creation is slow if there are lots of files in the following cases:

      • node crash and restart
      • delta node recovery

      The first of these cases bears on high availability (HA). It would be good to be able to bring buckets online quickly in the case of a memcached crash.

      The second case concerns the behavior of delta node recovery, in three ways. First, if bucket creation times out, delta node recovery may fail repeatedly and the user may have to resort to full recovery. Second, the user may wait many minutes for the bucket to be created only to find that the sequence numbers don't allow delta recovery and a full recovery is required anyway. Third, if a user cancels the rebalance before the bucket is created, neither ns_server nor KV can currently abort the bucket creation, which continues in the background and may interfere with another attempt to rebalance or perform delta node recovery.

      We should see what we can do to speed up bucket creation. This is likely to be a very significant piece of work (e.g. possibly factoring the vbucket state information into files on the "side") but perhaps there's something simpler out there.
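
      To make the "side file" idea concrete, here is a minimal sketch, not a description of how magma or KV actually stores vbucket state. It assumes a hypothetical JSON file holding each vbucket's state and high sequence number, written atomically so that a restart could answer ns_server's queries before every magma file has been opened. The file name, the functions write_sidecar and read_sidecar, and the record layout are all invented for illustration.

```python
import json
import os
import tempfile


def write_sidecar(path, states):
    """Atomically persist {vbid: {"state": ..., "high_seqno": ...}} (hypothetical layout)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(states, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the data is on disk before the rename
        os.replace(tmp, path)     # atomic rename: readers see the old or the new file
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def read_sidecar(path):
    """Return the saved summary, or None if no sidecar file exists."""
    try:
        with open(path) as f:
            raw = json.load(f)
    except FileNotFoundError:
        return None
    # JSON object keys are strings; convert vbucket ids back to int.
    return {int(vbid): info for vbid, info in raw.items()}
```

      A real design would also have to keep such a file consistent with the write-ahead log; a stale summary would report sequence numbers that are too low, so the sketch only illustrates the fast-read path.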

      If we can't get bucket creation times down to a smaller number, we need to add the standard things that we have for long-running tasks, e.g.

      • support for cancelation
      • progress reporting
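
      As a rough illustration of what these two features could look like, the sketch below models bucket (re)creation as a loop over vbuckets that checks a cancel flag between vbuckets and exposes a completion fraction. It is an assumption-laden sketch, not memcached or ns_server code: the WarmupTask class, its methods, and the open_vbucket callback are hypothetical names.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class WarmupTask:
    """Hypothetical long-running task that opens storage one vbucket at a time."""
    vbuckets: list
    cancel_event: threading.Event = field(default_factory=threading.Event)
    done: int = 0
    _lock: threading.Lock = field(default_factory=threading.Lock)

    def cancel(self):
        """Request cancellation; takes effect before the next vbucket is opened."""
        self.cancel_event.set()

    def progress(self):
        """Fraction of vbuckets opened so far, in the range 0.0 to 1.0."""
        with self._lock:
            return self.done / len(self.vbuckets) if self.vbuckets else 1.0

    def run(self, open_vbucket):
        """Open every vbucket; return False if cancelled part-way, True otherwise."""
        for vb in self.vbuckets:
            if self.cancel_event.is_set():
                return False
            open_vbucket(vb)  # stand-in for opening files and replaying the WAL
            with self._lock:
                self.done += 1
        return True
```

      In this shape, a caller that abandons a rebalance can call cancel() from another thread and poll progress() in the meantime; cancellation is only as responsive as the time taken to open a single vbucket.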

            People

              shivani.gupta Shivani Gupta
              dfinlay Dave Finlay