Details
Bug
Resolution: Fixed
Critical
5.5.3
centos1 cluster
Untriaged
Yes
Description
Build : 5.5.3-4029
Test : -test tests/integration/test_allFeatures_vulcan.yml -scope tests/integration/scope_Xattrs_Vulcan.yml
Scale : 3
Iteration : 1st
The test was undeploying the function bucket_op_complex_function_integration when the Eventing producer crashed. The following stack trace was seen in the eventing logs.
2018-11-16T02:11:46.408-08:00 [Info] DCPT[eventing:tbahHeMq-138:{eventing:tbahHeMq-135:a1d5bb01f1ec555c016eb5dfe0071f30_bucket_op_complex_function_undeploy}/0] ##abcd ... stopped
2018-11-16T02:11:46.408-08:00 [Info] DCPT[eventing:tbahHeMq-138:{eventing:tbahHeMq-135:a1d5bb01f1ec555c016eb5dfe0071f30_bucket_op_complex_function_undeploy}/0] doReceive(): connection closed
2018-11-16T02:11:46.408-08:00 [Info] Producer::CleanupMetadataBucket [bucket_op_complex_function:0] Exiting cron timer cleanup routine, mutations till high vb seqnos received
2018-11-16T02:11:46.408-08:00 [Info] Producer::CleanupMetadataBucket [bucket_op_complex_function:0] Closed dcpFeed spawned for cleaning up metadata bucket artifacts
2018-11-16T02:11:46.408-08:00 [Info] SuperSupervisor::cleanupProducer [0] App: bucket_op_complex_function Purging timer entries from plasma
panic: File sync failed: /data/@eventing/bucket_op_complex_function_timer.data/header.data error fsync: bad file descriptor

goroutine 295 [running]:
panic(0xc3b9a0, 0xc42b7b2b80)
/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/runtime/panic.go:500 +0x1a1 fp=0xc42ba7b7b0 sp=0xc42ba7b720
github.com/couchbase/plasma.syncFile(0xc4201f4058)
goproj/src/github.com/couchbase/plasma/util.go:257 +0x197 fp=0xc42ba7b848 sp=0xc42ba7b7b0
github.com/couchbase/plasma.(*multiFilelog).Commit(0xc420500400, 0x0, 0x15a5960)
goproj/src/github.com/couchbase/plasma/log.go:398 +0x11e fp=0xc42ba7b8a0 sp=0xc42ba7b848
github.com/couchbase/plasma.(*lsStore).flush(0xc4202662c0, 0xc4200121e0)
goproj/src/github.com/couchbase/plasma/lss.go:205 +0x24a fp=0xc42ba7b920 sp=0xc42ba7b8a0
github.com/couchbase/plasma.(*lsStore).(github.com/couchbase/plasma.flush)-fm(0xc4200121e0)
goproj/src/github.com/couchbase/plasma/lss.go:147 +0x34 fp=0xc42ba7b940 sp=0xc42ba7b920
github.com/couchbase/plasma.(*flushBuffer).Done(0xc4200121e0)
goproj/src/github.com/couchbase/plasma/lss.go:708 +0xdd fp=0xc42ba7b988 sp=0xc42ba7b940
github.com/couchbase/plasma.(*lsStore).Sync(0xc4202662c0, 0xc4292b2b00)
goproj/src/github.com/couchbase/plasma/lss.go:472 +0x93 fp=0xc42ba7b9c0 sp=0xc42ba7b988
github.com/couchbase/plasma.(*Plasma).PersistAll2(0xc4204a3800, 0x8)
goproj/src/github.com/couchbase/plasma/persistor.go:187 +0xd0 fp=0xc42ba7ba00 sp=0xc42ba7b9c0
github.com/couchbase/plasma.(*Plasma).PersistAll(0xc4204a3800)
goproj/src/github.com/couchbase/plasma/persistor.go:172 +0x37 fp=0xc42ba7ba20 sp=0xc42ba7ba00
github.com/couchbase/eventing/producer.(*Producer).persistPlasma(0xc4204fc380)
goproj/src/github.com/couchbase/eventing/producer/plasma_ops.go:47 +0x128 fp=0xc42ba7bf98 sp=0xc42ba7ba20
runtime.goexit()
/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc42ba7bfa0 sp=0xc42ba7bf98
created by github.com/couchbase/eventing/producer.(*Producer).Serve
goproj/src/github.com/couchbase/eventing/producer/producer.go:204 +0x117f

...
...
[goport(/opt/couchbase/bin/eventing-producer)] 2018/11/16 02:12:05 child process exited with status 134
2018-11-16T02:12:05.303-08:00 [Info] Started eventing producer version: evt-5.5.3-4029-ee
2018-11-16T02:12:05.303-08:00 [Info] Setting IP mode to ipv4
Eventing nodes in the cluster during this time: 172.23.96.145, 172.23.96.56
The panic above was seen on 172.23.96.56.
I understand that with the 6.0 release we have moved to a plasma-less design, which avoids the issue altogether, but it's still worth looking into this issue for the following reasons:
1. We still support Eventing in 5.5.x, and some customers might not be willing to upgrade to 6.x yet.
2. This could be a potential bug in Plasma that surfaced only with this issue. It is worth investigating to prevent it from surfacing elsewhere.
This wouldn't be a regression, since there are no relevant eventing changes in 5.5.3.
For Gerrit Dashboard: MB-32055
# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
101939,3 | MB-32055 Skip spawning plasma persist routine | vulcan | eventing | MERGED | +2 | +1 |