  1. Couchbase Server
  2. MB-59054

Don't skip rebalance if there is a container change for any FTS node.

Details

    • Untriaged
    • 0
    • Unknown

    Description

      A recent change (MB-57334) in the rebalance code path added logic to skip rebalance early when the FTS topology has not changed.

      We realised that there are certain situations where we don't want to skip rebalance, even if the FTS topology remains unchanged.

      Examples:

      • Missing partitions due to previous failover operations (addressed in MB-58450).
      • Server group changes for FTS nodes.

       

      To avoid skipping rebalance when the server group changes for FTS node(s), we will have to track NodeDefs across rebalances.
      As of now, we only track NodeUUIDs across rebalances.

      Keeping track of the NodeUUIDs/NodeDefs as of the last successful rebalance enables us to reconcile the latest NodeDefs against prevNodeDefs (from the last rebalance). Based on that comparison, we can decide whether or not to skip rebalance.
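
      The comparison described above could look roughly like the following Go sketch. It is only an illustration: NodeDef and NodeDefs here are simplified, hypothetical stand-ins (not the actual cbgt types), canSkipRebalance is an invented name, and the missing-partitions check from MB-58450 is assumed to be handled separately.

```go
package main

// NodeDef is a simplified, hypothetical stand-in for a node definition.
// Only the fields needed for this comparison are modelled.
type NodeDef struct {
	UUID      string
	Container string // server group placement of the node
}

// NodeDefs maps a node UUID to its definition.
type NodeDefs map[string]NodeDef

// canSkipRebalance reports whether the current node definitions match the
// snapshot taken after the last successful rebalance. It returns false when
// there is no snapshot, when the set of node UUIDs differs, or when any
// node's container differs from the snapshot.
func canSkipRebalance(prev, cur NodeDefs) bool {
	if prev == nil {
		// No snapshot to compare against, so do not skip.
		return false
	}
	if len(prev) != len(cur) {
		// A node was added or removed.
		return false
	}
	for uuid, c := range cur {
		p, ok := prev[uuid]
		if !ok {
			// Same count, but a different set of nodes.
			return false
		}
		if p.Container != c.Container {
			// Topology is unchanged, but the server group moved.
			return false
		}
	}
	return true
}
```

      With this shape, a node that keeps its UUID but moves from one server group to another makes canSkipRebalance return false, which is the case the current NodeUUIDs-only tracking cannot detect.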

      Previous attempt to solve this: https://review.couchbase.org/c/cbgt/+/197432

      Summary of comments on the PR:

      • On all nodes, we can track prevNodeDefs (a snapshot of NodeDefs taken after the last successful rebalance).
      • After every successful rebalance, we need to update prevNodeDefs on all the nodes. Only the orchestrator node knows that the rebalance completed successfully, so it needs some mechanism to let the other nodes know as well.
        • One way to achieve this is to use metakv. After a successful rebalance, the orchestrator can notify the other nodes (via metakv) that they need to update their local copy of prevNodeDefs.
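
      A minimal Go sketch of the notification flow summarised above, under loud assumptions: memStore is an in-memory stand-in for a shared store such as metakv and does not use the real metakv API, and the key name, NodeDef type, and function names are all hypothetical. It only shows the shape of the idea: the orchestrator publishes the post-rebalance snapshot, and every subscribed node replaces its local prevNodeDefs.

```go
package main

import (
	"encoding/json"
	"sync"
)

// NodeDef is a simplified, hypothetical stand-in for a node definition.
type NodeDef struct {
	UUID      string `json:"uuid"`
	Container string `json:"container"`
}

// lastRebalanceKey is a hypothetical key for the post-rebalance snapshot.
const lastRebalanceKey = "lastRebalanceNodeDefs"

// memStore is an in-memory stand-in for a shared store with change
// notifications. It is NOT the real metakv API.
type memStore struct {
	mu          sync.Mutex
	subscribers []func(key string, val []byte)
}

// Subscribe registers a callback invoked on every Set.
func (s *memStore) Subscribe(fn func(key string, val []byte)) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.subscribers = append(s.subscribers, fn)
}

// Set notifies all subscribers of the new value for key.
func (s *memStore) Set(key string, val []byte) {
	s.mu.Lock()
	subs := append([]func(string, []byte){}, s.subscribers...)
	s.mu.Unlock()
	for _, fn := range subs {
		fn(key, val)
	}
}

// node holds one node's local copy of prevNodeDefs.
type node struct {
	mu           sync.Mutex
	prevNodeDefs map[string]NodeDef
}

// newNode subscribes a node to snapshot updates from the store.
func newNode(s *memStore) *node {
	n := &node{}
	s.Subscribe(func(key string, val []byte) {
		if key != lastRebalanceKey {
			return
		}
		var defs map[string]NodeDef
		if err := json.Unmarshal(val, &defs); err != nil {
			return // keep the old snapshot on a malformed update
		}
		n.mu.Lock()
		n.prevNodeDefs = defs
		n.mu.Unlock()
	})
	return n
}

// PrevNodeDefs returns the node's current local snapshot.
func (n *node) PrevNodeDefs() map[string]NodeDef {
	n.mu.Lock()
	defer n.mu.Unlock()
	return n.prevNodeDefs
}

// publishRebalanceSuccess is what the orchestrator would call once the
// rebalance has completed successfully.
func publishRebalanceSuccess(s *memStore, defs map[string]NodeDef) error {
	val, err := json.Marshal(defs)
	if err != nil {
		return err
	}
	s.Set(lastRebalanceKey, val)
	return nil
}
```

      Publishing the full snapshot rather than a bare "refresh" signal is a design choice made for this sketch only; it keeps every node's prevNodeDefs identical to what the orchestrator saw when the rebalance finished.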

      We also want to verify that the current mechanism to keep track of prevNodeUUIDs is correct.

            People

              mohd.shaadkhan Shaad Khan