Couchbase Server / MB-60992

XDCR: deadlock when deleting a spec because BackfillMgr did not start properly


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • None
    • Affects Version/s: 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.5
    • Component/s: XDCR
    • None
    • Untriaged
    • 0
    • No

    Description

      The Backfill Manager is started at https://src.couchbase.org/source/xref/7.1.4/goproj/src/github.com/couchbase/goxdcr/replication_manager/replication_manager.go#508, but it may not start properly.

      There is no error checking at this call site, so errors go unnoticed. If Start() fails and returns early, before it launches the runRetryMonitor goroutine, that goroutine may never start. See https://src.couchbase.org/source/xref/7.1.4/goproj/src/github.com/couchbase/goxdcr/backfill_manager/backfill_manager.go#308

      runRetryMonitor is the consumer of the channel. When it is not running, callers that send to that channel can wait indefinitely, which causes a cascading chain of waits and, eventually, a deadlock.

      Test scenario:

      1. Set up a cluster that includes non-KV nodes.
      2. Create a replication.
      3. Trigger a backfill.
      4. Check goxdcr on the non-KV nodes; it should not be stuck.
      5. Delete the replication spec.
      6. Check goxdcr on the non-KV nodes again; it should not be stuck.

            People

              Assignee: Sudeep Jathar
              Reporter: Sudeep Jathar

                Gerrit Reviews

                  There are no open Gerrit changes
