  Couchbase Server / MB-7630

Alternative consistent hashing mapping mechanism

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: techdebt-backlog
    • Fix Version/s: feature-backlog
    • Component/s: couchbase-bucket
    • Security Level: Public
    • Labels: None

      Description

      The current code implements Ketama, which builds a sorted list of 100 or 160 hash points per server and then performs a logarithmic search for the server with the closest point.

      Consider the following alternative: Given N servers, create an array of N*(N-1) entries and fill it with servers in such a way that each server appears (N-1) times and each ordered pair of distinct servers occurs exactly once as adjacent entries (wrapping around the end of the array). For example, for N=3 with servers A, B, C, one possible arrangement is servers := [ A, B, C, A, C, B ]
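      Such an arrangement exists for any N: it is exactly an Eulerian circuit of the complete directed graph on the N servers. The following is a minimal sketch (not part of the proposal; the server count N and the use of Hierholzer's algorithm are illustrative) that prints one valid ring:

      #include <stdio.h>

      #define N 4                               /* example server count */

      int main(void) {
          static int used[N][N];                /* used[v][w] = 1 once edge v->w is taken */
          static int next_cand[N];              /* next neighbour to try from vertex v    */
          int stack[N * (N - 1) + 1], top = 0;
          int circuit[N * (N - 1) + 1], clen = 0;

          stack[top++] = 0;                     /* start the walk at server 0 */
          while (top > 0) {
              int v = stack[top - 1], w = -1;
              while (next_cand[v] < N) {        /* find an unused outgoing edge of v */
                  int c = next_cand[v]++;
                  if (c != v && !used[v][c]) { used[v][c] = 1; w = c; break; }
              }
              if (w >= 0)
                  stack[top++] = w;             /* extend the current trail */
              else
                  circuit[clen++] = stack[--top];   /* dead end: record and backtrack */
          }

          /* circuit[] holds the Eulerian circuit in reverse, with the start vertex
           * duplicated at both ends; printing it back-to-front and dropping one
           * endpoint yields the N*(N-1)-entry servers[] ring. */
          for (int i = clen - 1; i > 0; i--)
              printf("%c ", 'A' + circuit[i]);
          printf("\n");
          return 0;
      }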

      When mapping a key, calculate its hash as usual, then use s = servers[ hash(key) % (N*(N-1)) ] to determine the mapping.
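      As a concrete illustration (the hash function below is an arbitrary stand-in, not the one the server uses), the lookup is a single modulo and an array read; note the parentheses around N*(N-1), since hash % N*(N-1) would apply the modulo before the multiplication in most languages:

      #include <stdint.h>

      #define N 3
      static const char servers[N * (N - 1)] = { 'A', 'B', 'C', 'A', 'C', 'B' };

      /* stand-in key hash (FNV-1a), purely illustrative */
      static uint32_t hash_key(const char *key) {
          uint32_t h = 2166136261u;
          while (*key) { h ^= (unsigned char)*key++; h *= 16777619u; }
          return h;
      }

      static char map_key(const char *key) {
          return servers[hash_key(key) % (uint32_t)(N * (N - 1))];
      }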

      When a server fails, remap keys to the next surviving server in the (circular) array: e.g. when B fails, the array is modified to [ A, C, C, A, C, A ]. Assuming a uniform key hash function, this guarantees an even redistribution of keys.
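      A minimal self-contained sketch of that failover step (the ring contents are just the example above):

      #include <stdio.h>

      #define N    3
      #define RING (N * (N - 1))

      /* Overwrite every slot owned by the failed server with the next surviving
       * entry in the circular ring. */
      static void fail_server(char ring[], char failed) {
          for (int i = 0; i < RING; i++) {
              if (ring[i] != failed) continue;
              for (int j = 1; j < RING; j++) {
                  char cand = ring[(i + j) % RING];
                  if (cand != failed) { ring[i] = cand; break; }
              }
          }
      }

      int main(void) {
          char ring[RING] = { 'A', 'B', 'C', 'A', 'C', 'B' };
          fail_server(ring, 'B');                    /* ring becomes A C C A C A */
          for (int i = 0; i < RING; i++) printf("%c ", ring[i]);
          printf("\n");
          return 0;
      }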

      The mapping function is simpler and more performant than the Ketama approach, while also exhibiting better cache locality.
      Using 8 bits per array entry, this approach would only require about 65 KB (256 * 255 = 65,280 one-byte entries) to support up to 256 servers.

      Weighted consistent hashing can be implemented in a similar fashion, varying the frequency with which each server appears in the array according to its weight.
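      For example (the weights and layout below are made up; the point is only the proportional fill followed by the same modulo lookup):

      #include <stdint.h>

      enum { WSLOTS = 6 };
      /* server 0 with weight 2, server 1 with weight 1, server 2 with weight 3 */
      static const int weighted_ring[WSLOTS] = { 0, 2, 0, 2, 1, 2 };

      static int map_weighted(uint32_t key_hash) {
          return weighted_ring[key_hash % WSLOTS];
      }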


        Activity

        jbemmel Jeroen van Bemmel added a comment -

        A downside of this approach is that adding a server gets more complicated. It is possible to use m adjacent copies of the N*(N-1) array, selectively replacing entries with the server being added. However, there comes a point where this becomes impractical. This approach may therefore be most suited to clusters that do not need to grow, or that can tolerate significant rebalancing (e.g. because data volumes are low and/or server additions are infrequent). There is a trade-off between efficient data access (the frequent operation) and more expensive cluster expansion (the rare one).
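        A rough sketch of that expansion idea (the function name, the fair-share rule and the choice of m are illustrative, not part of the comment): tile m copies of the ring, then hand the new server every (N+1)-th occurrence of each existing server, so each existing server donates the same fraction of its slots:

        #include <stdlib.h>

        /* Returns a newly allocated ring of old_len*m entries with new_id mixed in.
         * Existing servers are assumed to be numbered 0..n_old-1.  Picking m so
         * that m*(N-1) is divisible by N+1 (e.g. m = N+1) makes the split exact. */
        static int *expand_ring(const int *old_ring, int old_len, int m,
                                int n_old, int new_id) {
            int new_len = old_len * m;
            int *ring = malloc((size_t)new_len * sizeof *ring);
            int *seen = calloc((size_t)n_old, sizeof *seen);

            for (int i = 0; i < new_len; i++) {
                int owner = old_ring[i % old_len];            /* m tiled copies */
                ring[i] = (++seen[owner] % (n_old + 1) == 0) ? new_id : owner;
            }
            free(seen);
            return ring;
        }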

        jbemmel Jeroen van Bemmel added a comment -

        A variant of this approach: Allocate an array of 2520 (= 2*2*2*3*3*5*7) entries and fill it with active node IDs. To add a new server X, replace entries with X in an evenly distributed way (to minimize the number of vBuckets that get rebalanced). When a server fails, replace its entries with active node IDs (again in an evenly distributed way).

        By construction, 2520 divides evenly across server clusters of N = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 15, 18, 20, 21, 24, ... servers, while still being close to balanced for the other (prime-multiple) values of N. Given that Ketama is only statistically evenly distributed and key hashes vary anyway, this should result in a similarly balanced data set.
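        A minimal sketch of this fixed-size variant (the stride and round-robin policies below are one possible way to spread the replacements evenly; they are illustrative, not the comment's exact scheme):

        #define SLOTS 2520   /* = 2*2*2*3*3*5*7 */

        /* Grow from n_nodes to n_nodes+1 nodes: give the new node every
         * (n_nodes+1)-th slot, spreading its share evenly through the map;
         * when (n_nodes+1) divides 2520 the resulting split is exact. */
        static void add_node(int map[SLOTS], int n_nodes, int node_new) {
            for (int i = n_nodes; i < SLOTS; i += n_nodes + 1)
                map[i] = node_new;
        }

        /* Hand each slot of a failed node to the surviving nodes in round-robin
         * order, again spreading the reassignment evenly.  n_nodes counts all
         * nodes including the failed one. */
        static void fail_node(int map[SLOTS], int n_nodes, int node_dead) {
            int rr = 0;
            for (int i = 0; i < SLOTS; i++) {
                if (map[i] != node_dead) continue;
                if (rr == node_dead) rr = (rr + 1) % n_nodes;
                map[i] = rr;
                rr = (rr + 1) % n_nodes;
            }
        }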

        avsej Sergey Avseyev added a comment -

        Currently libvbucket (and libcouchbase) picks the distribution algorithm from the JSON config, key "nodeLocator". I think the patch to libvbucket should be implemented after the server provides a means to specify the algorithm.

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Removing the ns_server component because it's not quite our business here. It's higher-level architectural stuff.

        trond Trond Norbye added a comment -

        We're mapping to vBuckets by using CRC and mods (currently not for memcached buckets, but I'd rather use vBuckets for memcached buckets and be able to move data around than add a new hashing algorithm).
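        For illustration only (the CRC variant, any extra bit manipulation, and the vBucket count of 1024 are assumptions rather than the exact libvbucket code), the crc-and-modulo mapping looks roughly like this:

        #include <stddef.h>
        #include <stdint.h>

        #define NUM_VBUCKETS 1024          /* assumed default; clusters may differ */

        /* Plain table-less CRC-32 (IEEE, reflected) over the key bytes. */
        static uint32_t crc32_of(const void *data, size_t len) {
            const unsigned char *p = data;
            uint32_t crc = 0xFFFFFFFFu;
            for (size_t i = 0; i < len; i++) {
                crc ^= p[i];
                for (int b = 0; b < 8; b++)
                    crc = (crc >> 1) ^ (0xEDB88320u & (uint32_t)-(int32_t)(crc & 1u));
            }
            return ~crc;
        }

        static int vbucket_for_key(const void *key, size_t nkey) {
            return (int)(crc32_of(key, nkey) % NUM_VBUCKETS);
        }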


          People

          • Assignee: steve Steve Yen
          • Reporter: jbemmel Jeroen van Bemmel
          • Votes: 0
          • Watchers: 2

            Dates

              • Created:
              • Updated:
              • Resolved:

              Gerrit Reviews

              There are no open Gerrit changes