Details
Type: Technical task
Resolution: Fixed
Priority: Major
None
None
Description
While running QE rebalance tests, we're seeing some instances where the Sync Gateway process is crashing due to a concurrent map read/write in GocbDCPFeed.End:
Oct 16 13:55:56 localhost bash: fatal error: concurrent map read and map write
Oct 16 13:55:56 localhost bash: goroutine 32806 [running]:
Oct 16 13:55:56 localhost bash: runtime.throw(0xe86961, 0x21)
Oct 16 13:55:56 localhost bash: /usr/local/go/1.12.10/go/src/runtime/panic.go:617 +0x72 fp=0xc001b14cf8 sp=0xc001b14cc8 pc=0x42f222
Oct 16 13:55:56 localhost bash: runtime.mapaccess1(0xd37de0, 0xc0002b5bc0, 0xc001b14d7e, 0x1)
Oct 16 13:55:56 localhost bash: /usr/local/go/1.12.10/go/src/runtime/map.go:413 +0x277 fp=0xc001b14d40 sp=0xc001b14cf8 pc=0x40f337
Oct 16 13:55:56 localhost bash: github.com/couchbase/cbgt.(*GocbDCPFeed).End(0xc00071e5a0, 0xfe0044, 0xfe7160, 0x188d600)
Oct 16 13:55:56 localhost bash: /home/couchbase/jenkins/workspace/sgw-unix-build/2.7.0/enterprise/godeps/src/github.com/couchbase/cbgt/feed_dcp_gocb.go:554 +0x59 fp=0xc001b14db8 sp=0xc001b14d40 pc=0xa14c99
Oct 16 13:55:56 localhost bash: gopkg.in/couchbase/gocbcore%2ev7.(*Agent).OpenStream.func1(0x0, 0xc0011f03c0, 0xfe7160, 0x188d600)
Oct 16 13:55:56 localhost bash: /home/couchbase/jenkins/workspace/sgw-unix-build/2.7.0/enterprise/godeps/src/gopkg.in/couchbase/gocbcore.v7/agentops_dcp.go:82 +0x6a9 fp=0xc001b14e70 sp=0xc001b14db8 pc=0x81f469
Oct 16 13:55:56 localhost bash: gopkg.in/couchbase/gocbcore%2ev7.(*memdQRequest).tryCallback(0xc0011f03c0, 0x0, 0xfe7160, 0x188d600, 0xc001b14e01)
Oct 16 13:55:56 localhost bash: /home/couchbase/jenkins/workspace/sgw-unix-build/2.7.0/enterprise/godeps/src/gopkg.in/couchbase/gocbcore.v7/memdqpackets.go:78 +0x70 fp=0xc001b14ea0 sp=0xc001b14e70 pc=0x812d60
Oct 16 13:55:56 localhost bash: gopkg.in/couchbase/gocbcore%2ev7.(*memdClient).run.func2.1(0xc0011f03c0)
Oct 16 13:55:56 localhost bash: /home/couchbase/jenkins/workspace/sgw-unix-build/2.7.0/enterprise/godeps/src/gopkg.in/couchbase/gocbcore.v7/memdclient.go:376 +0x80 fp=0xc001b14ee8 sp=0xc001b14ea0 pc=0x8236b0
Oct 16 13:55:56 localhost bash: gopkg.in/couchbase/gocbcore%2ev7.(*memdOpMap).Drain(0xc000dbc5f0, 0xc001b14fa0)
Oct 16 13:55:56 localhost bash: /home/couchbase/jenkins/workspace/sgw-unix-build/2.7.0/enterprise/godeps/src/gopkg.in/couchbase/gocbcore.v7/memdopmap.go:129 +0x3e fp=0xc001b14f08 sp=0xc001b14ee8 pc=0x80fb2e
Oct 16 13:55:56 localhost bash: gopkg.in/couchbase/gocbcore%2ev7.(*memdClient).run.func2(0xc000dbc5a0, 0xc000093ce0, 0xc000093d40, 0xc000093da0)
Oct 16 13:55:56 localhost bash: /home/couchbase/jenkins/workspace/sgw-unix-build/2.7.0/enterprise/godeps/src/gopkg.in/couchbase/gocbcore.v7/memdclient.go:371 +0x606 fp=0xc001b14fc0 sp=0xc001b14f08 pc=0x823d16
Oct 16 13:55:56 localhost bash: runtime.goexit()
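I haven't been able to isolate exactly what triggers the call to End. Still, the writes to f.lastReceivedSeqno[vbId] throughout the feed code look inherently racy. Go maps are not safe for concurrent use, so any read that overlaps a write on another goroutine can trigger this fatal error, even when the two goroutines touch different vbuckets. Changing the map to a slice indexed by vbucket might be sufficient if there are never concurrent operations on the same vbucket, because distinct slice elements can be accessed concurrently without a data race. Otherwise a mutex or sync.Map seems necessary.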
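A minimal sketch of the mutex option follows. It assumes a hypothetical `seqnoTracker` type standing in for the feed's `lastReceivedSeqno` bookkeeping, plus a hypothetical `exercise` helper that simulates concurrent DCP callbacks. The type names and the `uint16` vbucket / `uint64` seqno types are illustrative assumptions, not cbgt's actual code. Every read and write of the map goes through the same lock, so a reader like End can no longer observe the map mid-write.

```go
package main

import (
	"sync"
)

// seqnoTracker is a hypothetical stand-in for the per-vbucket state a DCP
// feed keeps; it is NOT cbgt's GocbDCPFeed type.
type seqnoTracker struct {
	mu                sync.Mutex
	lastReceivedSeqno map[uint16]uint64 // vbucket ID -> last seqno seen
}

func newSeqnoTracker() *seqnoTracker {
	return &seqnoTracker{lastReceivedSeqno: make(map[uint16]uint64)}
}

// record stores the latest seqno for a vbucket. Holding the mutex means no
// other goroutine can read or write the map while it is being modified.
func (tr *seqnoTracker) record(vbID uint16, seqno uint64) {
	tr.mu.Lock()
	tr.lastReceivedSeqno[vbID] = seqno
	tr.mu.Unlock()
}

// last returns the most recent seqno recorded for a vbucket (0 if none),
// taking the same lock so the read cannot overlap a concurrent write.
func (tr *seqnoTracker) last(vbID uint16) uint64 {
	tr.mu.Lock()
	defer tr.mu.Unlock()
	return tr.lastReceivedSeqno[vbID]
}

// exercise is a hypothetical driver: one goroutine per vbucket writes its own
// seqnos while also reading a *different* vbucket's entry, which is the
// read-during-write pattern that crashes when the map is unguarded.
func exercise(numVBs uint16, seqnosPerVB uint64) *seqnoTracker {
	tr := newSeqnoTracker()
	var wg sync.WaitGroup
	for vb := uint16(0); vb < numVBs; vb++ {
		wg.Add(1)
		go func(vb uint16) {
			defer wg.Done()
			for s := uint64(1); s <= seqnosPerVB; s++ {
				tr.record(vb, s)
				_ = tr.last((vb + 1) % numVBs) // safe: guarded by tr.mu
			}
		}(vb)
	}
	wg.Wait()
	return tr
}
```

For example, `exercise(64, 1000)` leaves 64 entries, each holding 1000. Running it under `go test -race` should report no races. Dropping the lock calls would make the race detector flag the map accesses and could reproduce the "concurrent map read and map write" crash. `sync.Map` is the other option mentioned above. Its documentation lists goroutines working on disjoint key sets as one of its intended uses, but a plain mutex keeps the code simpler.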
For Gerrit Dashboard: MB-36521
# | Subject | Branch | Project | Status | Code-Review | Verified
---|---|---|---|---|---|---
116547,3 | MB-36521: GocbDCPFeed's lastReceivedSeqno access within Mutex | master | cbgt | MERGED | +2 | +1