  1. Couchbase Server
  2. MB-17506

NMVB should not contain a cluster_config body if the client has already received the same cluster_config version

Details

    • Untriaged
    • Unknown
    • KV: Jan 18 - Feb 1

    Description

      Recently we've had cases where during rebalance the following sequence happens:

      1. takeover is slow on one or more vbucket moves
      2. after the old active vbucket moves to DEAD state, the number of NOT_MY_VBUCKET replies skyrockets
      3. the machine's NIC is saturated, causing node heartbeats to be missed
      4. node is put into pending state and rebalance halts (sometimes the node auto fails over)

      The network gets saturated because the NOT_MY_VBUCKET payload is large and, since retries are proportional to requests, there are many retries per second. Client changes could help manage the situation a little better; however, the server should also do what it can to protect the network.
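
      As a rough illustration of why this saturates a link: the bandwidth consumed by NMVB replies is the reply rate multiplied by the payload size. The sketch below only does that arithmetic. The function names and all the numbers in the example are made up for illustration and are not measurements from this issue.

```python
def nmvb_bits_per_second(replies_per_second: float, payload_bytes: int) -> float:
    """Bandwidth consumed by NOT_MY_VBUCKET replies, in bits per second."""
    return replies_per_second * payload_bytes * 8


def link_utilization(replies_per_second: float, payload_bytes: int,
                     link_bits_per_second: float) -> float:
    """Fraction of the link consumed by NMVB replies (1.0 means fully saturated)."""
    return nmvb_bits_per_second(replies_per_second, payload_bytes) / link_bits_per_second


if __name__ == "__main__":
    # Hypothetical inputs: 50,000 replies/s, 20,000-byte payload, 10 Gbit/s link.
    print(link_utilization(50_000, 20_000, 10e9))
```

      Because retries scale with the request rate, shrinking the per-reply payload is the term the server can control directly.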

      Fixing this issue as filed (not returning a cluster_config body if the client already has it) would be relatively simple to implement. I have spoken with folks from the SDK side (Brett Lawson, Michael Nitschinger and Matt Ingenthron), who tell me that returning an NMVB response with no cluster_config body shouldn't cause problems for the client, since this is what used to happen and the SDK is backward compatible with this behavior (the client falls back to its last known cluster_config).
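
      The proposed behavior could be modeled roughly as follows. This is a minimal sketch, not the actual server or SDK code: the class names, the per-connection `sent_version` map, and the integer `version` field are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ClusterConfig:
    version: int  # assumed: an integer that changes whenever the config changes
    body: bytes   # serialized cluster_config


@dataclass
class NmvbReply:
    # None models an NMVB response with no cluster_config body.
    config_body: Optional[bytes]


@dataclass
class Server:
    config: ClusterConfig
    # Hypothetical bookkeeping: last config version sent on each connection.
    sent_version: Dict[str, int] = field(default_factory=dict)

    def not_my_vbucket(self, conn_id: str) -> NmvbReply:
        # Skip the body if this connection already received the current version.
        if self.sent_version.get(conn_id) == self.config.version:
            return NmvbReply(config_body=None)
        self.sent_version[conn_id] = self.config.version
        return NmvbReply(config_body=self.config.body)


@dataclass
class Client:
    last_known_config: Optional[bytes] = None

    def handle_nmvb(self, reply: NmvbReply) -> Optional[bytes]:
        # Adopt a new config if one was sent; otherwise fall back to the last known one.
        if reply.config_body is not None:
            self.last_known_config = reply.config_body
        return self.last_known_config


if __name__ == "__main__":
    server = Server(ClusterConfig(version=1, body=b"cfg-v1"))
    client = Client()
    first = server.not_my_vbucket("conn-1")   # carries the body
    second = server.not_my_vbucket("conn-1")  # same version, so no body
    print(client.handle_nmvb(first), client.handle_nmvb(second))
```

      In this model only the first NMVB reply per connection and config version pays for the full payload; every later reply is small until the config changes again.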

            People

              andreibaranouski Andrei Baranouski
              dfinlay Dave Finlay

