Couchbase Kubernetes / K8S-2938

Managed XDCR should remove unknown replications

Details

    • Bug
    • Resolution: Duplicate
    • Major
    • None
    • None
    • operator
    • None
    • 3 - Uk Sprinting
    • 0

    Description

      Currently, XDCR reconciliation fails if a user has manually set up replications with xdcr.managed=false and then enables managed XDCR.

      The Operator fetches the list of known XDCR replications and tries to look them up in its local cache. The lookup fails because these replications were never created by the Operator and were therefore never cached. The error we see is:

      {"level":"info","ts":1670226478.9425159,"logger":"cluster","msg":"Reconciliation failed","cluster":"tmobile/ttncbk8s","error":"timeout: key error: key xdcr-connection-ttncb-bm-hostname doesn't exist","stack":"github.com/couchbase/couchbase-operator/pkg/cluster/persistence.(*persistentStorageImpl).Get.func1\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/persistence/persistence.go:327\ngithub.com/couchbase/couchbase-operator/pkg/util/retryutil.Retry\n\tgithub.com/couchbase/couchbase-operator/pkg/util/retryutil/retryutil.go:14\ngithub.com/couchbase/couchbase-operator/pkg/util/retryutil.RetryFor\n\tgithub.com/couchbase/couchbase-operator/pkg/util/retryutil/retryutil.go:30\ngithub.com/couchbase/couchbase-operator/pkg/cluster/persistence.(*persistentStorageImpl).Get\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/persistence/persistence.go:335\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).getPersistentXDCRData\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/xdcr_replication.go:432\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).listRemoteClusters\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/xdcr_replication.go:482\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).reconcileXDCR\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/xdcr_replication.go:729  

      • The stack trace shows the Operator listing remote clusters (listRemoteClusters).
      • That function returns the list of replications that Couchbase Server knows about.
      • The Operator then tries to fetch each replication from the persistence secret (getPersistentXDCRData) and fails.

       

      Instead, the Operator should remove these foreign replications and proceed with reconciliation without failing.
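
One way the requested behaviour might look, as a rough sketch rather than the Operator's actual implementation: entries reported by the server but absent from the persisted state are treated as foreign and removed, and reconciliation continues with the rest. The names `partition` and `reconcile`, the `remove` callback, and the `xdcr-connection-` key layout are all assumptions made for illustration.

```go
package main

import "fmt"

// partition splits the names reported by the server into those present in
// the persisted state (known) and those absent from it (foreign).
func partition(persisted map[string]bool, reported []string) (known, foreign []string) {
	for _, name := range reported {
		// Assumed key layout: fixed prefix plus the remote cluster name.
		if persisted["xdcr-connection-"+name] {
			known = append(known, name)
		} else {
			foreign = append(foreign, name)
		}
	}
	return known, foreign
}

// reconcile removes foreign entries through the supplied callback and
// returns the names that remain under management.
func reconcile(persisted map[string]bool, reported []string, remove func(string) error) ([]string, error) {
	known, foreign := partition(persisted, reported)
	for _, name := range foreign {
		if err := remove(name); err != nil {
			return nil, fmt.Errorf("removing foreign entry %q: %w", name, err)
		}
	}
	return known, nil
}

func main() {
	persisted := map[string]bool{"xdcr-connection-managed": true}
	var removed []string
	known, err := reconcile(persisted, []string{"managed", "manual"}, func(name string) error {
		removed = append(removed, name) // a real callback would call the server
		return nil
	})
	fmt.Println(known, removed, err) // prints: [managed] [manual] <nil>
}
```

Here a missing key is treated as a signal that the entry is foreign, not as a fatal error, so a manually created entry no longer blocks the rest of the pass.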

            People

              abhi.bose Abhi Bose (Inactive)
              tommie Tommie McAfee (Inactive)