Couchbase Server
MB-62922

Work distribution is non-uniform for per-vBucket tasks


Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: Morpheus
    • Affects Version/s: 7.6.0, 7.0.0, 7.1.0, 7.2.0
    • Component/s: couchbase-bucket
    • Labels: None

    Description

      There are several places in KV-Engine where we want to distribute vBuckets into equal-sized sets.

      • KV shards - #CPUs
      • ConnStore::vbConnLocks - 32 mutexes
      • EPBucket::getFlusher - #CPU flushers
      • EPBucket::getBgFetcher - #CPU BGFetchers
      • CheckpointMemRecoveryTask::getVbucketsSortedByChkMem - checkpoint removers
      • ... (might be others)

      In all of those places we have a finite resource which we want to share between vBuckets in a uniform way. For example, if we have 256 vBuckets on a node and 32 vbConnLocks, we want 1 vbConnLock per 8 vBuckets.

      The way we've done this is by taking vbid % numTasks. This is problematic because the vBucket IDs on a node are not consecutive. ns_server computes the vBucket map and decides which vBucket IDs to assign to a given node, and in practice there are gaps in the ranges of vBuckets on a node.
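
      For illustration, here is a minimal sketch of how the current scheme picks a resource for a vBucket (the helper name and signature are hypothetical, not actual KV-Engine code):

        #include <cstddef>
        #include <cstdint>

        // Hypothetical helper mirroring the current approach: pick which of the
        // numTasks resources (shards, flushers, vbConnLocks, ...) a vBucket maps to.
        std::size_t taskForVbid(uint16_t vbid, std::size_t numTasks) {
            // Uniform only if the vbids on this node are consecutive.
            return vbid % numTasks;
        }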

      Example

      Here's an example to illustrate how that affects the task distribution.

      Assume the following 256 active vBuckets are on a node: sequences of 16 consecutive vbids, then a gap of 48 vbids which live on the other nodes in the cluster, then the next sequence, repeating at an interval of 64 vbids.

        0-15, 64-79, 128-143, 192-207, 256-271 ... 1008-1023

      Now consider the ConnStore::vbConnLocks example, where we have 32 locks to distribute across the vBuckets.

      Given the vbids above, vbid % 32 is always less than 16, because each run of 16 consecutive vbids starts at a multiple of 64. This means that we will only ever use 16 of the 32 locks: no vBucket on the node has a vbid whose remainder when divided by 32 is above 15.

      This problem would become worse with a bucket configured with 128 vBuckets.
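
      To make this concrete, here is a small self-contained sketch (not part of the ticket) that enumerates the example layout above and counts how many of the 32 locks vbid % 32 ever selects:

        #include <cstdio>
        #include <set>

        int main() {
            // Example layout: runs of 16 consecutive vbids repeating every 64,
            // i.e. 0-15, 64-79, 128-143, ..., 1008-1023 (256 active vBuckets).
            std::set<int> locksUsed;
            for (int base = 0; base < 1024; base += 64) {
                for (int vbid = base; vbid < base + 16; ++vbid) {
                    locksUsed.insert(vbid % 32); // current distribution scheme
                }
            }
            // Prints "locks used: 16 of 32" - only half of the locks are ever used.
            std::printf("locks used: %zu of 32\n", locksUsed.size());
            return 0;
        }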

      Solution

      We should make it such that we equally distribute work regardless of the exact vbids assigned to a node.

      We can do this in several ways; here are two:

      • We could track the "index" of the vBucket within the set of vBuckets on the node, such that the lowest vbid is always index 0, the second lowest is index 1, and so on, and use that index in place of the vbid. This seems to need manual bookkeeping as vBuckets are created/destroyed, and it results in a vBucket's assignment changing at runtime.
      • We could use a hash function - we only require that every output value is roughly equally likely across the inputs. This gives a static vBucket -> hash allocation.

      I think we should use the second option - a hash function. We can use crc32c(vbid), as we already have crc32c in the code base and its output is uniform across the 32-bit output space. By pre-computing the crc32c of every possible vbid, we can optimise away the computation: crc32c(vbid) becomes a table lookup (1024 entries).

      'std::hash<Vbid>' could return this value.
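
      A minimal sketch of what that could look like, assuming a software crc32c purely for illustration (the real code base would reuse its existing crc32c implementation; the Vbid stand-in and table name are hypothetical):

        #include <array>
        #include <cstddef>
        #include <cstdint>
        #include <functional>

        // Toy stand-in for KV-Engine's Vbid type.
        struct Vbid {
            uint16_t vbid;
            uint16_t get() const { return vbid; }
        };

        // Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78) over the
        // two bytes of a vbid; illustrative only.
        constexpr uint32_t crc32c_u16(uint16_t value) {
            uint32_t crc = 0xFFFFFFFFu;
            for (int byte = 0; byte < 2; ++byte) {
                crc ^= (value >> (8 * byte)) & 0xFFu;
                for (int bit = 0; bit < 8; ++bit) {
                    crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
                }
            }
            return ~crc;
        }

        // Pre-computed table of crc32c for every possible vbid (0..1023), so the
        // hash is a single table lookup at runtime.
        static const std::array<uint32_t, 1024> vbidHashes = [] {
            std::array<uint32_t, 1024> table{};
            for (uint32_t vb = 0; vb < 1024; ++vb) {
                table[vb] = crc32c_u16(static_cast<uint16_t>(vb));
            }
            return table;
        }();

        namespace std {
        template <>
        struct hash<Vbid> {
            size_t operator()(Vbid vb) const {
                return vbidHashes[vb.get()];
            }
        };
        } // namespace std

        // Usage: distribute a finite resource (e.g. 32 vbConnLocks) uniformly,
        // regardless of which vbids happen to live on this node:
        //     size_t lockIndex = std::hash<Vbid>{}(vb) % 32;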


          People

            Assignee: Vesko Karaganev
            Reporter: Vesko Karaganev
            Votes: 0
            Watchers: 1
