Details
Bug
Status: Closed
Major
Resolution: Fixed
6.0.0
Untriaged
No
Description
Build 6.0.0-1693
While running high bucket density tests, we observed the following panic in one of the runs.
Note that this panic occurred only once; rerunning the same test did not reproduce it.
Job- http://perf.jenkins.couchbase.com/job/arke-multi-bucket/231
Logs- https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-arke-multi-bucket-231/172.23.97.12.zip
Panic-
2019-01-01T22:56:33.944-08:00 ERRO GOXDCR.XmemNozzle: xmem_de729116c99d5a3d3552c8e4deff1d21/bucket-19/bucket-19_172.23.96.16:11210_1 Received temporary error in setMeta response. Response status=TMPFAIL, err = <nil>, response=<ud>MCResponse status=TMPFAIL, opcode=0xa2, opaque=9634587, msg: </ud>
2019-01-01T22:56:33.944-08:00 INFO GOXDCR.StatsMgr: de729116c99d5a3d3552c8e4deff1d21/bucket-10/bucket-10-623320309 message
2019-01-01T22:56:33.944-08:00 INFO GOXDCR.ReplMgr: checkReplicationStatus exited
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x71a578]

goroutine 54 [running]:
expvar.(*Map).Get(0x0, 0xa3b4a2, 0xe, 0x0, 0x0)
/home/couchbase/.cbdepscache/exploded/x86_64/go-1.8.5/go/src/expvar/expvar.go:150 +0x38
github.com/couchbase/goxdcr/pipeline_svc.updateStatsForReplication(0xc42094aa80, 0xc4a9fb16e0, 0xcc9a20, 0xc420146e00, 0xc420132ed0, 0xc4201503c0, 0xcd27c0, 0xc420172028, 0x3, 0x0)
goproj/src/github.com/couchbase/goxdcr/pipeline_svc/statistics_manager.go:1347 +0x173
github.com/couchbase/goxdcr/pipeline_svc.UpdateStats(0xcc47a0, 0xc42014e9e0, 0xccba80, 0xc420166930, 0xcc9a20, 0xc420146e00, 0xc420132ed0, 0xc4201503c0, 0xcd27c0, 0xc420172028)
goproj/src/github.com/couchbase/goxdcr/pipeline_svc/statistics_manager.go:1248 +0x34f
github.com/couchbase/goxdcr/replication_manager.(*replicationManager).checkReplicationStatus(0xcfeae0, 0xc420071110)
goproj/src/github.com/couchbase/goxdcr/replication_manager/replication_manager.go:358 +0x312
created by github.com/couchbase/goxdcr/replication_manager.StartReplicationManager.func1
goproj/src/github.com/couchbase/goxdcr/replication_manager/replication_manager.go:166 +0x4ae
[goport(/opt/couchbase/bin/goxdcr)] 2019/01/01 22:56:34 child process exited with status 2
For Gerrit Dashboard: MB-32456
# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
103250,3 | MB-32456 fix panic caused by nil overviewStats | master | goxdcr | MERGED | +2 | +1 |
The root cause is a nil replication overview stats object introduced by the replication stop sequence. There were "xmem is stuck" errors due to MB-31764, which could have been the trigger. I am still trying to figure out how this happened. It does not look like a regression in the stats manager itself, though.