  1. Couchbase Server
  2. MB-48399

Reconfiguring a magma bucket with different shard count handled ungracefully

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 7.1.0
    • 7.1.0
    • couchbase-bucket
    • Triaged
    • 1
    • No
    • KV-Engine-Sept-21, KV 2021-Oct-21

    Description

      Currently, reconfiguring a magma bucket with a different shard count is not handled gracefully (I believe it crashes during warmup; it certainly used to in the Cheshire-Cat timeframe). This happens because the shard count determines the on-disk layout of the data: for each shard, a magma bucket creates one directory in which the WAL and associated vBuckets live. This is problematic for customers who may wish to scale a machine (by increasing or decreasing the number of CPUs a VM has access to), as they'd need to either back up and restore the data or XDCR it to a bucket that has the desired configuration.
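
      To illustrate why a shard count change invalidates the on-disk layout, here is a minimal Python sketch. It assumes vBuckets are assigned to shards by vBucket id modulo shard count, that a bucket has 1024 vBuckets, and that shard directories are named `magma.<shard>`; all three are assumptions made for illustration, not details confirmed in this ticket.

```python
from typing import List

NUM_VBUCKETS = 1024  # assumed vBucket count per bucket


def shard_for(vbid: int, shard_count: int) -> int:
    """Return the shard owning a vBucket (assumed mapping: id modulo shard count)."""
    if shard_count <= 0:
        raise ValueError("shard count must be positive")
    return vbid % shard_count


def shard_dir(vbid: int, shard_count: int) -> str:
    """Return the (hypothetically named) directory holding a vBucket's data."""
    return f"magma.{shard_for(vbid, shard_count)}"


def relocated_vbuckets(old_count: int, new_count: int,
                       num_vbuckets: int = NUM_VBUCKETS) -> List[int]:
    """List the vBuckets whose owning shard differs between the two shard counts."""
    return [
        vb for vb in range(num_vbuckets)
        if shard_for(vb, old_count) != shard_for(vb, new_count)
    ]
```

      Under these assumptions, going from 8 shards to 4 leaves `len(relocated_vbuckets(8, 4))` at 512: half of the vBuckets would be looked up in a directory that does not contain them, which is consistent with warmup failing after a reconfiguration.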

      The work on group commit is related to this. The original plan was for group commit to let us scale flushing with a relatively low shard count (say 8 shards on a 64-CPU machine; the current default is 1 shard per CPU). We would then have been able to decrease the shard count to a fixed number appropriate for all machines. Group commit has been deferred to Morpheus, though (MB-48353), so with the current 1-shard-per-CPU scaling in Neo, a future upgrade or config change to enable group commit (with the intent of decreasing the number of shards to reduce write amplification) would require a non-trivial upgrade process.

      Potential solutions:

      1) Prevent the shard count from changing after bucket instantiation. This is potentially tricky to do in kv_engine, as we'd need to persist a shard count (perhaps to the stats.json document) before we initialize any of the magma shards, which is done in the bucket constructor. Any future bucket construction would need to read this doc and change the shard count back to the original if required. This would still allow the shard count to be set to any desired value when the bucket is first created.

      2) Set the shard count for magma buckets to a fixed value (say 8). All buckets would have this shard count regardless of CPU count. This would negatively impact the performance of persistence-sensitive workloads, but is simple to implement.

      3) Allow the shard count to be reconfigured. Changing the shard count would require flushing the WAL and moving each affected vBucket (magma calls this a KVStore) to its new shard (directory). This would require work in both kv_engine and magma, and is the most complicated option. For a good user experience when enabling group commit (and decreasing the number of shards) in Morpheus, though, this is likely required.
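
      As a rough illustration of option 1, the Python sketch below pins the shard count on first construction and reuses it afterwards. It is not kv_engine code: the function name `resolve_shard_count`, the `shard_count` key, and the choice to store it as a top-level field of a JSON document are hypothetical, and stats.json is used only because it is the candidate mentioned above.

```python
import json
import os
import tempfile
from pathlib import Path

SHARD_COUNT_KEY = "shard_count"  # hypothetical key name


def resolve_shard_count(data_dir: Path, configured: int,
                        doc_name: str = "stats.json") -> int:
    """Return the shard count to use, pinning it on first construction.

    Must run before any shard directory is created. If a count was
    persisted earlier, it overrides the newly configured value.
    """
    if configured <= 0:
        raise ValueError("shard count must be positive")

    doc_path = data_dir / doc_name
    doc = json.loads(doc_path.read_text()) if doc_path.exists() else {}

    persisted = doc.get(SHARD_COUNT_KEY)
    if persisted is not None:
        # Bucket was created before: ignore the new configuration.
        return int(persisted)

    # First construction: record the count, keeping any existing fields.
    doc[SHARD_COUNT_KEY] = configured
    fd, tmp_path = tempfile.mkstemp(dir=data_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(doc, tmp)
        tmp.flush()
        os.fsync(tmp.fileno())
    # Atomic rename so a crash never leaves a half-written document.
    os.replace(tmp_path, doc_path)
    return configured
```

      With this approach, a bucket first created with 8 shards keeps using 8 even if it is later constructed with a configured value of 64. The cost is that the configured value is silently ignored, so a real implementation would probably want to log or surface the mismatch.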

            People

              ankush.sharma Ankush Sharma
              ben.huddleston Ben Huddleston