Couchbase Server / MB-7826

Allow for resynchronizing of failed-over node after rebalance

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 2.0, 2.0.1, 2.1.1, 2.2.0, 2.5.0
    • Fix Version/s: 3.0
    • Component/s: ns_server
    • Security Level: Public
    • Labels:

      Description

      Customer request to allow a node that has not had a catastrophic failure to re-use its data on disk when rejoining a cluster after it has been failed over.

      This goes along with being able to support larger datasets on each node. By forcing a rebalance after failover, we are potentially copying hundreds of GB of data over the network when it is seemingly unnecessary. Not only does this take quite a bit of time, it uses network bandwidth unnecessarily (potentially impacting performance); resynchronizing from the data already on disk would alleviate this and speed up the process considerably.

        Issue Links


          Activity

          Perry Krug added a comment -

          A similar request has also come up for being able to perform much more efficient "swap" rebalances. At least one customer needs to perform regular security updates on their hardware; with nodes holding 96GB of RAM and 1TB of disk, rebalancing a 6-node cluster is going to take "too many" hours to complete. Given that this really needs to happen during a scheduled maintenance window and on a weekly basis, we should be able to provide a more efficient method. One rebalance at the beginning and perhaps one at the end would be acceptable, but if we could re-sync during the swaps in the middle it would greatly speed up the overall process.

          Anil Kumar added a comment - MB-9979

            People

            • Assignee: Anil Kumar
            • Reporter: Perry Krug
            • Votes: 0
            • Watchers: 3

              Dates

                • Created:
                • Updated:
                • Resolved:

                Gerrit Reviews

                There are no open Gerrit changes