Couchbase Server / MB-47777

[BP 7.0.2] - XDCR - backfill_request_handler could hang forever

Details

    • Untriaged
    • 1
    • No

    Description

      The backfill request handler's cooldown mechanism is flawed: it can cause the run() routine to enter the persist case when there are no operations to persist.

      This can leave the handler stuck forever, waiting for an operation that will never come. While it is stuck, all backfill operations become unresponsive, such as handling VB done events or raising future backfills.

       

      A typical symptom is a backfill pipeline that hangs and does not go away (potentially with changes_left staying at 0).

      The stack trace would show many goroutines in HandleVBTaskDone() (one per VB), and one goroutine stuck at this location:

      https://github.com/couchbase/goxdcr/blob/26d8add3a1c760f1c0c99569a4582e7b7c09c689/backfill_manager/backfill_request_handler.go#L296

      			// No more incoming requests - done bursting handling, do a single metakv operation
      			select {
      			case persistType := <-b.persistenceNeededCh: 
      				err := b.metaKvOp(persistType)
      
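      The hang described above can be illustrated with a minimal, self-contained sketch. It is not the goxdcr code and not the fix that was committed: persistType, drainBlocking, and drainNonBlocking are hypothetical names introduced only for this illustration. The sketch assumes the cooldown fired spuriously, so nothing is queued on the persistence channel. A blocking receive then never returns, while a select with a default case returns immediately.

```go
package main

import (
	"fmt"
	"time"
)

// persistType is a hypothetical stand-in for the handler's persist request kind.
type persistType int

// drainBlocking mirrors the blocking shape of the snippet quoted above: it
// waits until a request arrives. The timeout exists only so this demo can
// report the hang instead of hanging itself.
func drainBlocking(ch <-chan persistType, timeout time.Duration) (persistType, bool) {
	select {
	case p := <-ch:
		return p, true
	case <-time.After(timeout):
		return 0, false // without the timeout, this receive would block forever
	}
}

// drainNonBlocking returns at once when nothing is queued, so a spurious
// cooldown expiry cannot stall the caller's loop.
func drainNonBlocking(ch <-chan persistType) (persistType, bool) {
	select {
	case p := <-ch:
		return p, true
	default:
		return 0, false
	}
}

func main() {
	ch := make(chan persistType, 1)

	// Spurious cooldown expiry: nothing has been queued for persistence.
	_, ok := drainBlocking(ch, 50*time.Millisecond)
	fmt.Println("blocking receive got a request:", ok)

	_, ok = drainNonBlocking(ch)
	fmt.Println("non-blocking receive got a request:", ok)

	// Once a request is queued, the non-blocking receive picks it up.
	ch <- persistType(1)
	p, ok := drainNonBlocking(ch)
	fmt.Println("after enqueue:", p, ok)
}
```

      Whether a non-blocking receive or a corrected timer is the right remedy depends on how the handler batches requests; the sketch only shows why entering the persist case with an empty channel stalls everything else handled by the same loop.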

          Activity

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.0.1-5995 contains goxdcr commit a0f3465 with commit message:
            MB-47777 - backfill request handler could hang due to incorrectly implemented persist timer

            People

              neil.huang Neil Huang