  1. Couchbase Server
  2. MB-61310

FTS rebalance must wait for seq no. catchup on replicas (configurable to skip)

Details

    Description

      Enabling the replica to catch up to the expected seq no. (specifically the sourceSeq no.) would help keep the system in a more consistent state following a rebalance operation.
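
As an illustration only, the proposed behaviour can be sketched as a polling gate with a skip option. This is not the FTS implementation: `wait_for_replica_catchup`, `get_replica_seq`, `skip_wait`, and the timeout and poll parameters are hypothetical names chosen for this sketch.

```python
import time
from typing import Callable


def wait_for_replica_catchup(
    get_replica_seq: Callable[[], int],  # hypothetical: returns the replica's current seq no.
    source_seq: int,                     # seq no. the replica is expected to reach
    skip_wait: bool = False,             # hypothetical "configurable to skip" switch
    timeout_s: float = 5.0,
    poll_interval_s: float = 0.01,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if rebalance may proceed, False if the wait timed out."""
    if skip_wait:
        # Operator opted out of the catch-up wait.
        return True
    deadline = clock() + timeout_s
    while True:
        if get_replica_seq() >= source_seq:
            return True  # replica has caught up to the source
        if clock() >= deadline:
            return False  # replica is still behind; caller decides what to do
        sleep(poll_interval_s)


if __name__ == "__main__":
    # Simulated replica that advances on every poll.
    progress = iter([10, 50, 100])
    print(wait_for_replica_catchup(lambda: next(progress), source_seq=100))
```

Returning a boolean rather than raising leaves the policy on timeout (fail the rebalance, retry, or continue) to the caller, which the issue does not specify.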

      Waiting for this catch-up should ensure that subsequent rebalance operations do not experience incorrect behaviours, such as rollbacks caused by partially built partitions. A summary of the situation from CBSE-16450:

      1. During the rebalance operations, file transfer failed for some of the partition movements (due to an operational race with the merge operations running at the time). Those partitions fell back to being rebuilt from scratch from KV, causing a large backfill, which slowed down the partitions becoming completely available at their new location in the cluster.
      2. During the following failover + rebalance operations, rollbacks occurred on new replicas that were being built (via file transfer) using the partitions still under reconstruction (from (1)) as a reference. The requested sequence number after the file transfer trailed the purged sequence number on KV, so the replicas had to be rebuilt, costing double the IOPS.
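
The rollback condition in item 2 can be modelled in a few lines. This is a simplified sketch of the situation as described above, not KV's actual logic: `needs_rollback` is a hypothetical helper, and treating a request from seq no. 0 as a fresh build that never rolls back is an assumption of this sketch.

```python
def needs_rollback(requested_seq: int, purge_seq: int) -> bool:
    """Simplified model: a resume request that trails the purged seq no. is rolled back.

    Assumption: requested_seq == 0 means a build from scratch, which has
    nothing to roll back.
    """
    return 0 < requested_seq < purge_seq


if __name__ == "__main__":
    # A replica copied from a partially built partition resumes from a seq no.
    # that is behind what KV has already purged, so it is rolled back.
    print(needs_rollback(requested_seq=500, purge_seq=1000))
    # A replica copied from a fully caught-up partition resumes past the purge point.
    print(needs_rollback(requested_seq=1200, purge_seq=1000))
```

Under this model, a replica that only becomes a file-transfer source after it has caught up would be less likely to hand out a trailing seq no., which is the motivation stated in the description.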

            People

              sarthak.dua Sarthak Dua
              thejas.orkombu Thejas Orkombu