Couchbase Server / MB-53142

[CBM] Potential race condition in Rift Restore pipeline


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: 7.1.2
    • Affects Version/s: 7.1.0, 7.1.1
    • Component/s: tools
    • Triage: Untriaged
    • 1
    • Yes

    Description

      What's the issue?
      The crux of this issue is that our worker pool doesn't accept a context to handle clean teardown. Instead, we cancel a separate context from the following done callback, which handles the error returned by each worker:

      // done handles the error returned by a worker, canceling the shared context if the error is non-nil.
      // cancel is the CancelFunc for the context shared with the workers.
      done := func(err error) error {
      	if err == nil {
      		return nil
      	}

      	cancel()

      	return err
      }
      

      The bug here is that the context is canceled before the error is returned. Depending on how the goroutines are scheduled, the error returned by the worker pool may be either:

      1. The error passed to done, or
      2. The error returned by the worker that tears down first in response to the cancellation (which is likely to be <nil>).

      What's the fix?
      We should probably have the worker pool accept a context and use it to support clean cancellation, rather than canceling a separate context from the done callback.


          People

            Gilad Kalchheim (gilad.kalchheim)
            James Lee (james.lee)
            Votes: 0
            Watchers: 3
