  1. Couchbase Server
  2. MB-36080 Extending CBGT's gocb support for DCP
  3. MB-36521

GocbDCPFeed panicking on concurrent map read/write

Details

    • Technical task
    • Resolution: Fixed
    • Major
    • None
    • None
    • cbgt, fts

    Description

      While running QE rebalance tests, we're seeing some instances where the Sync Gateway process is crashing due to a concurrent map read/write in GocbDCPFeed.End:

      Oct 16 13:55:56 localhost bash: fatal error: concurrent map read and map write
      Oct 16 13:55:56 localhost bash: goroutine 32806 [running]:
      Oct 16 13:55:56 localhost bash: runtime.throw(0xe86961, 0x21)
      Oct 16 13:55:56 localhost bash: /usr/local/go/1.12.10/go/src/runtime/panic.go:617 +0x72 fp=0xc001b14cf8 sp=0xc001b14cc8 pc=0x42f222
      Oct 16 13:55:56 localhost bash: runtime.mapaccess1(0xd37de0, 0xc0002b5bc0, 0xc001b14d7e, 0x1)
      Oct 16 13:55:56 localhost bash: /usr/local/go/1.12.10/go/src/runtime/map.go:413 +0x277 fp=0xc001b14d40 sp=0xc001b14cf8 pc=0x40f337
      Oct 16 13:55:56 localhost bash: github.com/couchbase/cbgt.(*GocbDCPFeed).End(0xc00071e5a0, 0xfe0044, 0xfe7160, 0x188d600)
      Oct 16 13:55:56 localhost bash: /home/couchbase/jenkins/workspace/sgw-unix-build/2.7.0/enterprise/godeps/src/github.com/couchbase/cbgt/feed_dcp_gocb.go:554 +0x59 fp=0xc001b14db8 sp=0xc001b14d40 pc=0xa14c99
      Oct 16 13:55:56 localhost bash: gopkg.in/couchbase/gocbcore%2ev7.(*Agent).OpenStream.func1(0x0, 0xc0011f03c0, 0xfe7160, 0x188d600)
      Oct 16 13:55:56 localhost bash: /home/couchbase/jenkins/workspace/sgw-unix-build/2.7.0/enterprise/godeps/src/gopkg.in/couchbase/gocbcore.v7/agentops_dcp.go:82 +0x6a9 fp=0xc001b14e70 sp=0xc001b14db8 pc=0x81f469
      Oct 16 13:55:56 localhost bash: gopkg.in/couchbase/gocbcore%2ev7.(*memdQRequest).tryCallback(0xc0011f03c0, 0x0, 0xfe7160, 0x188d600, 0xc001b14e01)
      Oct 16 13:55:56 localhost bash: /home/couchbase/jenkins/workspace/sgw-unix-build/2.7.0/enterprise/godeps/src/gopkg.in/couchbase/gocbcore.v7/memdqpackets.go:78 +0x70 fp=0xc001b14ea0 sp=0xc001b14e70 pc=0x812d60
      Oct 16 13:55:56 localhost bash: gopkg.in/couchbase/gocbcore%2ev7.(*memdClient).run.func2.1(0xc0011f03c0)
      Oct 16 13:55:56 localhost bash: /home/couchbase/jenkins/workspace/sgw-unix-build/2.7.0/enterprise/godeps/src/gopkg.in/couchbase/gocbcore.v7/memdclient.go:376 +0x80 fp=0xc001b14ee8 sp=0xc001b14ea0 pc=0x8236b0
      Oct 16 13:55:56 localhost bash: gopkg.in/couchbase/gocbcore%2ev7.(*memdOpMap).Drain(0xc000dbc5f0, 0xc001b14fa0)
      Oct 16 13:55:56 localhost bash: /home/couchbase/jenkins/workspace/sgw-unix-build/2.7.0/enterprise/godeps/src/gopkg.in/couchbase/gocbcore.v7/memdopmap.go:129 +0x3e fp=0xc001b14f08 sp=0xc001b14ee8 pc=0x80fb2e
      Oct 16 13:55:56 localhost bash: gopkg.in/couchbase/gocbcore%2ev7.(*memdClient).run.func2(0xc000dbc5a0, 0xc000093ce0, 0xc000093d40, 0xc000093da0)
      Oct 16 13:55:56 localhost bash: /home/couchbase/jenkins/workspace/sgw-unix-build/2.7.0/enterprise/godeps/src/gopkg.in/couchbase/gocbcore.v7/memdclient.go:371 +0x606 fp=0xc001b14fc0 sp=0xc001b14f08 pc=0x823d16
      Oct 16 13:55:56 localhost bash: runtime.goexit()
      

      I haven't been able to isolate exactly what triggers the call to End, but the unsynchronized writes to f.lastReceivedSeqno[vbId] throughout the feed look inherently racy against the map read in End. Changing to a slice might be sufficient if you're sure there are no concurrent operations per vbucket, since distinct slice elements can be written concurrently without a race. Otherwise a mutex or sync.Map seems necessary.
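
      A minimal sketch of the mutex option, for illustration only. The seqnoTracker type, its Set/Get methods, and the uint16/uint64 key and value types are assumptions made for this example; they are not taken from the cbgt source, and the real GocbDCPFeed fields may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// seqnoTracker is a hypothetical stand-in for the per-vbucket bookkeeping
// in GocbDCPFeed. Field names and types are assumptions for this sketch.
type seqnoTracker struct {
	m                 sync.RWMutex
	lastReceivedSeqno map[uint16]uint64
}

func newSeqnoTracker() *seqnoTracker {
	return &seqnoTracker{lastReceivedSeqno: make(map[uint16]uint64)}
}

// Set records the latest seqno for a vbucket under the write lock.
func (t *seqnoTracker) Set(vbId uint16, seqno uint64) {
	t.m.Lock()
	t.lastReceivedSeqno[vbId] = seqno
	t.m.Unlock()
}

// Get reads the latest seqno for a vbucket under the read lock, which is
// the kind of access the End callback in the trace performs unguarded.
func (t *seqnoTracker) Get(vbId uint16) (uint64, bool) {
	t.m.RLock()
	seqno, ok := t.lastReceivedSeqno[vbId]
	t.m.RUnlock()
	return seqno, ok
}

func main() {
	t := newSeqnoTracker()

	// One goroutine per vbucket, each mixing writes and reads, to mimic
	// DCP callbacks arriving concurrently.
	var wg sync.WaitGroup
	for vb := uint16(0); vb < 64; vb++ {
		wg.Add(1)
		go func(vbId uint16) {
			defer wg.Done()
			for s := uint64(1); s <= 1000; s++ {
				t.Set(vbId, s)
				t.Get(vbId)
			}
		}(vb)
	}
	wg.Wait()

	seqno, ok := t.Get(7)
	fmt.Println(seqno, ok) // prints "1000 true"
}
```

      Without the lock, the same loop can abort with the "concurrent map read and map write" fatal error shown above. A fixed-size slice indexed by vbucket ID would avoid the map entirely, but only if each vbucket's entry is never touched by two goroutines at once.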

          People

            abhinav Abhi Dangeti
            adamf Adam Fraser