XDCR advanced filtering was introduced in 6.5.0. Customers running Server versions prior to 6.5.0 do not use advanced filtering; they use the traditional key-based regex filtering.
As part of the 6.5.0 implementation, upgrade code was added that automatically converts a key-based regex filter into an advanced filter. However, this code does not check the cluster version before upgrading the filter. As a result, when one node is upgraded from <6.5.0 to >=6.5.0 and rejoins a cluster that is still pre-6.5, the replication's filter is automatically upgraded to REGEXP_CONTAINS(meta.id(), "<filter>") and persisted.
A subsequent pipeline restart causes the upgraded filter to be applied on nodes that still run pre-advanced-filtering code. Mutations are then incorrectly filtered out, which results in data loss.
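The missing check could look roughly like the sketch below. It is illustrative only and is not the actual goxdcr code: the `version` type, `lowestVersion`, and `upgradeFilter` are hypothetical names, and the sketch assumes the cluster's effective version is the oldest version among its nodes. It also ignores any escaping the real upgrade code may apply to the regex.

```go
package main

import "fmt"

// version is a hypothetical (major, minor) pair used only for this sketch.
type version struct{ major, minor int }

// atLeast reports whether v is the same as or newer than o.
func (v version) atLeast(o version) bool {
	if v.major != o.major {
		return v.major > o.major
	}
	return v.minor >= o.minor
}

// advFilterVersion is the first release with advanced filtering (6.5).
var advFilterVersion = version{6, 5}

// lowestVersion returns the oldest version among the nodes. An empty
// slice yields the zero version, which is treated as too old.
func lowestVersion(nodes []version) version {
	if len(nodes) == 0 {
		return version{}
	}
	low := nodes[0]
	for _, n := range nodes[1:] {
		if !n.atLeast(low) {
			low = n
		}
	}
	return low
}

// upgradeFilter wraps a key-based regex in an advanced filter expression,
// but only when every node in the cluster supports advanced filtering.
// Otherwise the original key-based regex is returned unchanged.
func upgradeFilter(keyRegex string, nodes []version) string {
	if !lowestVersion(nodes).atLeast(advFilterVersion) {
		return keyRegex
	}
	return `REGEXP_CONTAINS(meta.id(), "` + keyRegex + `")`
}

func main() {
	mixed := []version{{6, 0}, {6, 6}}    // one node upgraded, one still on 6.0
	upgraded := []version{{6, 6}, {6, 6}} // all nodes upgraded
	fmt.Println(upgradeFilter("^KU", mixed))
	fmt.Println(upgradeFilter("^KU", upgraded))
}
```

With this gate, the mixed-version cluster keeps the key-based filter, and the conversion happens only once all nodes are on 6.5 or later.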
This should have been tested and caught back in the 6.5.0 timeframe, but it seems unlikely that such a test was run.
A simple test case could suffice:
- Start cluster on 6.0.x, 2-node source
- Create replication with a filter such as "^KU"
- Rebalance one node out. Upgrade to 6.6.X, rebalance back in.
- Filter gets upgraded. The node on 6.0.X will show the advanced filter instead of the original key-based one.
- Pipeline will restart because of the 6.6.X node that rebalanced back in.
- Further mutations, even if they match the original filter of "^KU", will not be replicated.
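The last step can be illustrated with a small sketch. It assumes the pre-6.5 node treats the persisted filter string as a plain regex applied to the document key, and it uses Go's standard `regexp` package as a stand-in for whatever regex engine the older node uses. `keyFilterPasses` and the key "KU123" are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// keyFilterPasses mimics a key-based filter: the stored filter string is
// compiled as a regex and matched against the document key.
func keyFilterPasses(filter, key string) (bool, error) {
	re, err := regexp.Compile(filter)
	if err != nil {
		return false, err
	}
	return re.MatchString(key), nil
}

func main() {
	const key = "KU123" // hypothetical document key that matches "^KU"
	filters := []string{
		"^KU",                               // original key-based filter
		`REGEXP_CONTAINS(meta.id(), "^KU")`, // upgraded filter, as persisted
	}
	for _, f := range filters {
		ok, err := keyFilterPasses(f, key)
		fmt.Printf("filter=%s replicated=%v err=%v\n", f, ok, err)
	}
}
```

Under these assumptions the original filter lets the key through, while the upgraded expression, read as a key regex, requires the literal text "REGEXP_CONTAINS" in the key and therefore rejects it.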
The upgrade docs (https://docs.couchbase.com/server/current/install/upgrade.html) say that 6.6.X is the intermediate release, so this needs to be fixed at least in 6.6.X.
However, backporting the fix to 7.0.x or 7.1.x should be considered, given that once 6.6.X is out of the picture, we'll need to support upgrading to those versions as the baseline.