Details
Description
The Backfill Manager is not started properly at https://src.couchbase.org/source/xref/7.1.4/goproj/src/github.com/couchbase/goxdcr/replication_manager/replication_manager.go#508
There is no error checking at this call site, so a failure goes unnoticed. If "Start()" fails and bails out early, the "runRetryMonitor" goroutine may never be started. See https://src.couchbase.org/source/xref/7.1.4/goproj/src/github.com/couchbase/goxdcr/backfill_manager/backfill_manager.go#308
The runRetryMonitor goroutine consumes from the channel. If it is not running, anything waiting on that channel can wait indefinitely, which causes a cascading chain of waits and eventually a deadlock.
Test scenario:
- Cluster with non-KV nodes
- Create a replication
- Trigger a backfill
- Check the goxdcr process on the non-KV nodes. It should not be in a stuck state.
- Delete the spec
- Check the goxdcr process on the non-KV nodes again. It should not be in a stuck state.