  1. Couchbase Server
  2. MB-50712

Don't reduce the number of replicas when computing vbmaps with # replicas >= # server-groups

    Description

      Consider a 6 node cluster with 2 equally sized server groups with 3 KV nodes in each.

      In Neo, if there's one replica, the generated vbucket map looks like this:

      vbmap --num-nodes 6 --num-replicas 1 --num-slaves 10 --tags 0:0,1:0,2:0,3:1,4:1,5:1 --relax-all 1>/dev/null 
      ...
      Replicas balanced: true
      Found feasible R after 1 attempts
      Generated matrix R in 397.146µs (wall clock)
      Final map R:
          |  0   0   0   1   1   1 |
      ----|------------------------|
        0 |  0   0   0  57  57  57 | 171
        0 |  0   0   0  57  57  56 | 170
        0 |  0   0   0  57  57  57 | 171
        1 | 57  57  56   0   0   0 | 170
        1 | 57  57  57   0   0   0 | 171
        1 | 57  57  57   0   0   0 | 171
      ____|________________________|
          |171 171 170 171 171 170 |
      Built vbucket map from R in 1.36552ms (wall clock)
      

      This "R matrix" indicates that node 0 replicates 57 vbuckets to each of nodes 3, 4 and 5, etc. The complete vbucket map is emitted too, but I've piped it to /dev/null here as it makes for easier reading.
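
      As a rough illustration of how to read the matrix, the sketch below copies the Neo 1-replica output above into a list of lists and recomputes the row and column totals. It assumes that entry R[i][j] is the number of vbuckets node i replicates to node j, with nodes numbered from 0 as in --tags. The helper names are hypothetical and are not part of vbmap.

```python
# Server group of each node, taken from --tags 0:0,1:0,2:0,3:1,4:1,5:1.
TAGS = [0, 0, 0, 1, 1, 1]

# "Final map R" from the Neo run with 1 replica.
# Assumed reading: R[i][j] = number of vbuckets node i replicates to node j.
R = [
    [0, 0, 0, 57, 57, 57],
    [0, 0, 0, 57, 57, 56],
    [0, 0, 0, 57, 57, 57],
    [57, 57, 56, 0, 0, 0],
    [57, 57, 57, 0, 0, 0],
    [57, 57, 57, 0, 0, 0],
]


def row_sums(matrix):
    """Replica copies sent out by each node (right-hand column of the output)."""
    return [sum(row) for row in matrix]


def column_sums(matrix):
    """Replica copies hosted by each node (bottom row of the output)."""
    return [sum(col) for col in zip(*matrix)]


def crosses_groups_only(matrix, tags):
    """True if no node replicates to a node in its own server group."""
    return all(
        count == 0
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
        if tags[i] == tags[j]
    )


print(row_sums(R))                   # [171, 170, 171, 170, 171, 171]
print(column_sums(R))                # [171, 171, 170, 171, 171, 170]
print(crosses_groups_only(R, TAGS))  # True
```

      The totals match the printed margins, and every non-zero entry links nodes in different server groups.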

      We see something similar in 7.0:

      vbmap --num-nodes 6 --num-replicas 1 --num-slaves 10 --tags 0:0,1:0,2:0,3:1,4:1,5:1 --relax-all 1>/dev/null 
      ...
      Final map R:
          |  0   0   0   1   1   1 |
      ----|------------------------|
        0 |  0   0   0  57  57  57 | 171
        0 |  0   0   0  57  57  57 | 171
        0 |  0   0   0  57  57  56 | 170
        1 | 57  57  57   0   0   0 | 171
        1 | 57  57  56   0   0   0 | 170
        1 | 57  57  57   0   0   0 | 171
      ____|________________________|
          |171 171 170 171 171 170 |
      Evaluation: 0
      Built vbucket map from R in 1.556905ms (wall clock)
      

      However, if the user has asked for 2 replicas, in Neo we see:

      vbmap --num-nodes 6 --num-replicas 2 --num-slaves 10 --tags 0:0,1:0,2:0,3:1,4:1,5:1 --relax-all 1>/dev/null 
      ...
      Search parameters:
      ...
        StrictReplicaBalance: false
        RelaxSlaveBalance: true
        RelaxReplicaBalance: true
        RelaxNumSlaves: true
        BalanceSlaves: true
        BalanceReplicas: true
      ...
      Replicas balanced: true
      Found feasible R after 1 attempts
      Generated matrix R in 416.267µs (wall clock)
      Final map R:
          |  0   0   0   1   1   1 |
      ----|------------------------|
        0 |  0   0   0  57  57  57 | 171
        0 |  0   0   0  57  57  57 | 171
        0 |  0   0   0  57  57  56 | 170
        1 | 57  57  57   0   0   0 | 171
        1 | 57  57  56   0   0   0 | 170
        1 | 57  57  57   0   0   0 | 171
      ____|________________________|
          |171 171 170 171 171 170 |
      

      In 7.0, by contrast, we see:

      vbmap --num-nodes 6 --num-replicas 2 --num-slaves 10 --tags 0:0,1:0,2:0,3:1,4:1,5:1 --relax-all 1>/dev/null 
      Started as:
        /Users/davefinlay/work8/install/bin/vbmap --num-nodes 6 --num-replicas 2 --num-slaves 10 --tags 0:0,1:0,2:0,3:1,4:1,5:1 --relax-all
      Using 1643681783568798000 as a seed
      Finalized parameters
        Number of nodes: 6
        Number of slaves: 3
        Number of vbuckets: 1024
        Number of replicas: 2
      ...
      Final map R:
          |  0   0   0   1   1   1 |
      ----|------------------------|
        0 |  0   0   0 114 114 114 | 342
        0 |  0   0   0 114 114 114 | 342
        0 |  0   0   0 114 113 113 | 340
        1 |114 113 113   0   0   0 | 340
        1 |114 114 114   0   0   0 | 342
        1 |114 114 114   0   0   0 | 342
      ____|________________________|
          |342 341 341 342 341 341 |
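
      A quick arithmetic check on the two 2-replica matrices: summing every entry gives the total number of replica copies placed, and dividing by the vbucket count gives the number of replicas each vbucket actually received. The sketch assumes both runs use 1024 vbuckets (the 7.0 output says so, the Neo output elides it). The function names are hypothetical.

```python
NUM_VBUCKETS = 1024  # "Number of vbuckets" in the 7.0 output; assumed for Neo too

# "Final map R" for --num-replicas 2 in Neo.
R_NEO = [
    [0, 0, 0, 57, 57, 57],
    [0, 0, 0, 57, 57, 57],
    [0, 0, 0, 57, 57, 56],
    [57, 57, 57, 0, 0, 0],
    [57, 57, 56, 0, 0, 0],
    [57, 57, 57, 0, 0, 0],
]

# "Final map R" for --num-replicas 2 in 7.0.
R_70 = [
    [0, 0, 0, 114, 114, 114],
    [0, 0, 0, 114, 114, 114],
    [0, 0, 0, 114, 113, 113],
    [114, 113, 113, 0, 0, 0],
    [114, 114, 114, 0, 0, 0],
    [114, 114, 114, 0, 0, 0],
]


def replica_copies(matrix):
    """Total number of replica copies placed across the cluster."""
    return sum(sum(row) for row in matrix)


def implied_replicas(matrix, num_vbuckets):
    """Replicas per vbucket implied by the matrix total."""
    replicas, remainder = divmod(replica_copies(matrix), num_vbuckets)
    if remainder != 0:
        raise ValueError("matrix total is not a multiple of the vbucket count")
    return replicas


print(replica_copies(R_NEO), implied_replicas(R_NEO, NUM_VBUCKETS))  # 1024 1
print(replica_copies(R_70), implied_replicas(R_70, NUM_VBUCKETS))    # 2048 2
```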
      

      So, essentially, in Neo, if the number of replicas is >= the number of server groups, we reduce the number of replicas to the number of server groups less one. We do note in the UI that there aren't sufficient server groups to hold the specified number of replicas. My recollection is that we did this simply because, with the old behavior, the data placement doesn't satisfy the rack-zone constraints. The rack-zone constraints can be thought of as "the active and the replicas of a vbucket should all be in different server groups". However, it might be OK to consider the rack-zone constraint as "the replicas of a given vbucket must be in a different server group from its active" and preserve the old behavior.
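
      The rule and the two readings of the rack-zone constraint can be sketched as follows. This is only an illustration of the behavior described in this ticket, not the vbmap implementation: the function names are hypothetical, a "chain" is assumed to be a list of node indexes with the active first, and at least two server groups are assumed.

```python
# Server group of each node, taken from --tags 0:0,1:0,2:0,3:1,4:1,5:1.
TAGS = [0, 0, 0, 1, 1, 1]


def effective_replicas_neo(num_replicas, num_server_groups):
    """Replica count Neo ends up using, as described in this ticket.

    Assumes num_server_groups >= 2.
    """
    if num_replicas >= num_server_groups:
        return num_server_groups - 1
    return num_replicas


def strict_constraint(chain, tags):
    """Active and all replicas sit in pairwise different server groups."""
    groups = [tags[node] for node in chain]
    return len(set(groups)) == len(groups)


def relaxed_constraint(chain, tags):
    """Every replica sits in a different server group from the active."""
    active_group = tags[chain[0]]
    return all(tags[node] != active_group for node in chain[1:])


print(effective_replicas_neo(1, 2))  # 1
print(effective_replicas_neo(2, 2))  # 1

# Active on node 0, replicas on nodes 3 and 4 (the 7.0-style placement).
chain = [0, 3, 4]
print(strict_constraint(chain, TAGS))   # False: both replicas share group 1
print(relaxed_constraint(chain, TAGS))  # True: neither replica is in group 0
```

      With 2 server groups and 2 replicas, no chain can satisfy the strict reading, while the 7.0-style placement satisfies the relaxed one.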

      There is some evidence that users of Couchbase Server have configurations with 2 server groups and 2 replicas. It's a little odd, but I guess the logic could be that the user considers the failure of a whole server group to be less likely than the failure of two individual nodes.

      We should look at how easy it would be to bring back the 7.0 behavior. I'm not sure if we should do it, but we should investigate now and try to understand what's going on.

            People

              anitha.kuberan Anitha Kuberan
              dfinlay Dave Finlay