Couchbase Go SDK / GOCBC-897

Data race on cluster.Close

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.1
    • Fix Version/s: 2.1.2
    • Component/s: library
    • None
    • 1

    Description

      The following code repeatedly reads a key from a bucket while a background goroutine reconnects every 5 seconds, closing the previous cluster connection and replacing it:

      package main
       
      import (
      	"fmt"
      	"sync"
      	"sync/atomic"
      	"time"
       
      	"github.com/couchbase/gocb/v2"
      )
       
      func main() {
      	url := "..."
      	user := ".."
      	password := "..."
      	bucketName := "..."
      	key := "..."
       
      	type Instance struct {
      		cluster *gocb.Cluster
      		bucket  *gocb.Bucket
      	}
       
      	var instance atomic.Value
       
      	go func() {
      		for {
      			time.Sleep(5 * time.Second)
      			cluster, err := gocb.Connect(url, gocb.ClusterOptions{
      				Username: user,
      				Password: password,
      			})
      			if err != nil {
      				panic(err)
      			}
      			bucket := cluster.Bucket(bucketName)
      			old := instance.Load()
      			if old != nil {
      				old := old.(*Instance)
      				old.cluster.Close(nil)
      			}
      			instance.Store(&Instance{
      				cluster: cluster,
      				bucket:  bucket,
      			})
      		}
      	}()
       
      	for i := 0; i < 100; i++ {
      		go func() {
      			for {
      				time.Sleep(1 * time.Second)
      				instV := instance.Load()
      				if instV == nil {
      					continue
      				}
      				inst := instV.(*Instance)
      				res, err := inst.bucket.DefaultCollection().Get(key, &gocb.GetOptions{
      					Timeout: 100 * time.Millisecond,
      				})
      				if err != nil {
      					fmt.Printf("Get err: %v\n", err)
      					continue
      				}
      				var v interface{}
      				err = res.Content(&v)
      				if err != nil {
      					fmt.Printf("Content err: %v\n", err)
      					continue
      				}
      				fmt.Printf("content: %#v\n", v)
      			}
      		}()
      	}
       
      	var wg sync.WaitGroup
      	wg.Add(1)
      	wg.Wait()
      }
      
      

      This sometimes results in a data race:

      ==================
      WARNING: DATA RACE
      Write at 0x00c000498260 by goroutine 167:
        github.com/couchbase/gocb/v2.(*Collection).getDirect.func1()
            /home/n/go/pkg/mod/github.com/couchbase/gocb/v2@v2.1.1/collection_crud.go:294 +0xbc
        github.com/couchbase/gocbcore/v9.(*crudComponent).Get.func1()
            /home/n/go/pkg/mod/github.com/couchbase/gocbcore/v9@v9.0.1/crudcomponent.go:33 +0xc2
        github.com/couchbase/gocbcore/v9.(*memdQRequest).cancelWithCallback()
            /home/n/go/pkg/mod/github.com/couchbase/gocbcore/v9@v9.0.1/memdqpackets.go:228 +0xaf
        github.com/couchbase/gocbcore/v9.(*crudComponent).Get.func2()
            /home/n/go/pkg/mod/github.com/couchbase/gocbcore/v9@v9.0.1/crudcomponent.go:80 +0x530
       
      Previous write at 0x00c000498260 by goroutine 97:
        github.com/couchbase/gocb/v2.(*Collection).getDirect()
            /home/n/go/pkg/mod/github.com/couchbase/gocb/v2@v2.1.1/collection_crud.go:313 +0xb80
        github.com/couchbase/gocb/v2.(*Collection).Get()
            /home/n/go/pkg/mod/github.com/couchbase/gocb/v2@v2.1.1/collection_crud.go:258 +0xb4
        main.main.func2()
            /home/n/go/src/temp/gocb-race/main.go:58 +0x147
       
      Goroutine 167 (running) created at:
        time.goFunc()
            /usr/local/go/src/time/sleep.go:168 +0x51
       
      Goroutine 97 (running) created at:
        main.main()
            /home/n/go/src/temp/gocb-race/main.go:50 +0x123
      ==================
      

      The problem is that errOut is set concurrently in the function body (https://github.com/couchbase/gocb/blob/v2.1.1/collection_crud.go#L313) and the opm.Wait callback (https://github.com/couchbase/gocb/blob/v2.1.1/collection_crud.go#L294). This pattern is used in a lot of methods, so presumably all of them are affected, but I was only able to test it with Get.

          Activity

            charles.dixon Charles Dixon added a comment - Fixed as part of GOCBC-894. This issue arose because operations were being performed on an already closed Cluster object.

            charles.dixon Charles Dixon added a comment - Whilst not the same issue, both of these issues surface because the timer callback is hit before we return from the operation.
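            Since the comments attribute the problem to operations running against an already closed Cluster, a caller can also avoid that situation on its own side. The following is a minimal sketch of one such approach, not part of the SDK and not the fix from GOCBC-894: `conn` is a hypothetical stand-in for a cluster and its bucket, and `holder`, `Do` and `Swap` are names invented for this example. Readers run each operation under a read lock, and the swapper takes the write lock before replacing the connection, so the old connection is closed only after all in-flight operations on it have returned.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// conn is a hypothetical stand-in for a cluster connection and its bucket.
type conn struct {
	mu     sync.Mutex
	closed bool
}

// Get fails if the connection has already been closed.
func (c *conn) Get() error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.closed {
		return errors.New("use of closed connection")
	}
	return nil
}

// Close marks the connection as closed.
func (c *conn) Close() {
	c.mu.Lock()
	c.closed = true
	c.mu.Unlock()
}

// holder owns the current connection and coordinates swaps with readers.
type holder struct {
	mu  sync.RWMutex
	cur *conn
}

// Do runs op against the current connection while holding the read lock,
// so the connection cannot be swapped out (and closed) during the call.
func (h *holder) Do(op func(*conn) error) error {
	h.mu.RLock()
	defer h.mu.RUnlock()
	if h.cur == nil {
		return errors.New("no connection yet")
	}
	return op(h.cur)
}

// Swap installs next and closes the previous connection. Taking the write
// lock waits for in-flight Do calls, and no later Do can see the old value.
func (h *holder) Swap(next *conn) {
	h.mu.Lock()
	old := h.cur
	h.cur = next
	h.mu.Unlock()
	if old != nil {
		old.Close()
	}
}

func main() {
	h := &holder{}
	h.Swap(&conn{})

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				if err := h.Do((*conn).Get); err != nil {
					fmt.Println("unexpected:", err)
				}
			}
		}()
	}
	for i := 0; i < 100; i++ {
		h.Swap(&conn{})
	}
	wg.Wait()
	fmt.Println("done")
}
```

            The trade-off is that a swap has to wait for the slowest in-flight operation, so in a real program each operation under the read lock should carry a bounded timeout, as the Get call in the description already does.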

            People

              charles.dixon Charles Dixon
              nikolakovacs Nikola Kovacs