  1. Couchbase Go SDK
  2. GOCBC-897

Data race on cluster.Close

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.1
    • Fix Version/s: 2.1.2
    • Component/s: library
    • Labels:
      None
    • Environment:
    • Story Points:
      1

      Description

      The following code repeatedly reads a key from a bucket while the underlying connection is closed and replaced every 5 seconds:

      package main
       
      import (
      	"fmt"
      	"sync"
      	"sync/atomic"
      	"time"
       
      	"github.com/couchbase/gocb/v2"
      )
       
      func main() {
      	url := "..."
      	user := ".."
      	password := "..."
      	bucketName := "..."
      	key := "..."
       
      	type Instance struct {
      		cluster *gocb.Cluster
      		bucket  *gocb.Bucket
      	}
       
      	var instance atomic.Value
       
      	go func() {
      		for {
      			time.Sleep(5 * time.Second)
      			cluster, err := gocb.Connect(url, gocb.ClusterOptions{
      				Username: user,
      				Password: password,
      			})
      			if err != nil {
      				panic(err)
      			}
      			bucket := cluster.Bucket(bucketName)
      			old := instance.Load()
      			if old != nil {
      				old := old.(*Instance)
      				old.cluster.Close(nil)
      			}
      			instance.Store(&Instance{
      				cluster: cluster,
      				bucket:  bucket,
      			})
      		}
      	}()
       
      	for i := 0; i < 100; i++ {
      		go func() {
      			for {
      				time.Sleep(1 * time.Second)
      				instV := instance.Load()
      				if instV == nil {
      					continue
      				}
      				inst := instV.(*Instance)
      				res, err := inst.bucket.DefaultCollection().Get(key, &gocb.GetOptions{
      					Timeout: 100 * time.Millisecond,
      				})
      				if err != nil {
      					fmt.Printf("Get err: %v\n", err)
      					continue
      				}
      				var v interface{}
      				err = res.Content(&v)
      				if err != nil {
      					fmt.Printf("Content err: %v\n", err)
      					continue
      				}
      				fmt.Printf("content: %#v\n", v)
      			}
      		}()
      	}
       
      	var wg sync.WaitGroup
      	wg.Add(1)
      	wg.Wait()
      }
      
      

      This sometimes results in a data race:

      ==================
      WARNING: DATA RACE
      Write at 0x00c000498260 by goroutine 167:
        github.com/couchbase/gocb/v2.(*Collection).getDirect.func1()
            /home/n/go/pkg/mod/github.com/couchbase/gocb/v2@v2.1.1/collection_crud.go:294 +0xbc
        github.com/couchbase/gocbcore/v9.(*crudComponent).Get.func1()
            /home/n/go/pkg/mod/github.com/couchbase/gocbcore/v9@v9.0.1/crudcomponent.go:33 +0xc2
        github.com/couchbase/gocbcore/v9.(*memdQRequest).cancelWithCallback()
            /home/n/go/pkg/mod/github.com/couchbase/gocbcore/v9@v9.0.1/memdqpackets.go:228 +0xaf
        github.com/couchbase/gocbcore/v9.(*crudComponent).Get.func2()
            /home/n/go/pkg/mod/github.com/couchbase/gocbcore/v9@v9.0.1/crudcomponent.go:80 +0x530
       
      Previous write at 0x00c000498260 by goroutine 97:
        github.com/couchbase/gocb/v2.(*Collection).getDirect()
            /home/n/go/pkg/mod/github.com/couchbase/gocb/v2@v2.1.1/collection_crud.go:313 +0xb80
        github.com/couchbase/gocb/v2.(*Collection).Get()
            /home/n/go/pkg/mod/github.com/couchbase/gocb/v2@v2.1.1/collection_crud.go:258 +0xb4
        main.main.func2()
            /home/n/go/src/temp/gocb-race/main.go:58 +0x147
       
      Goroutine 167 (running) created at:
        time.goFunc()
            /usr/local/go/src/time/sleep.go:168 +0x51
       
      Goroutine 97 (running) created at:
        main.main()
            /home/n/go/src/temp/gocb-race/main.go:50 +0x123
      ==================
      

      The problem is that errOut is set concurrently in the function body (https://github.com/couchbase/gocb/blob/v2.1.1/collection_crud.go#L313) and the opm.Wait callback (https://github.com/couchbase/gocb/blob/v2.1.1/collection_crud.go#L294). This pattern is used in a lot of methods, so presumably all of them are affected, but I was only able to test it with Get.


            Activity

            charles.dixon Charles Dixon added a comment -

            Whilst not the same issue, both of these issues surface because the timer callback is hit before we return from the operation.

            charles.dixon Charles Dixon added a comment -

            Fixed as a part of GOCBC-894. This issue was arising because operations were being performed on an already closed Cluster object.
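
On the caller side, one way to avoid running operations against an already closed object is to make sure nothing still uses the old instance when it is closed. The sketch below is an assumption-laden illustration, not part of gocb: `closer`, `holder`, `use`, `replace`, and `fakeConn` are hypothetical names. Readers hold a read lock for the duration of an operation, and the replacer takes the write lock to swap instances, so by the time the old instance is closed no reader can still be using it.

```go
package main

import "sync"

// closer is a hypothetical stand-in for a connection-like resource.
type closer interface {
	Close() error
}

// fakeConn is a trivial closer used only to demonstrate the sketch.
type fakeConn struct{ closed bool }

func (f *fakeConn) Close() error {
	f.closed = true
	return nil
}

// holder guards the current resource so that it is not closed
// while an operation is still running against it.
type holder struct {
	mu  sync.RWMutex
	cur closer
}

// use runs fn with the current resource while holding a read lock.
// It reports false if no resource has been installed yet.
func (h *holder) use(fn func(c closer)) bool {
	h.mu.RLock()
	defer h.mu.RUnlock()
	if h.cur == nil {
		return false
	}
	fn(h.cur)
	return true
}

// replace installs next and then closes the previous resource, if any.
// Acquiring the write lock waits for every in-flight use call, so the
// old resource has no remaining users by the time it is closed.
func (h *holder) replace(next closer) error {
	h.mu.Lock()
	old := h.cur
	h.cur = next
	h.mu.Unlock()
	if old == nil {
		return nil
	}
	return old.Close()
}
```

The trade-off is that `replace` briefly stalls new readers while it waits for in-flight operations, so operations run inside `use` should have a bounded duration, such as the 100 ms timeout in the reproduction above.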


              People

              Assignee:
              charles.dixon Charles Dixon
              Reporter:
              nikolakovacs Nikola Kovacs
              Votes:
              0
              Watchers:
              2
