Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 6.5.0
Triage: Untriaged
Is this a Regression?: Unknown
Description
Indexer crashes with the following stack trace in the TestIndexNodeRebalanceOut test during the run http://ci2i-unstable.northscale.in/gsi-10.12.2019-20.06.fail.html:
{code:java}
2019-12-10T22:48:42.710+05:30 [Info] clustMgrAgent::OnIndexDelete Success for Drop IndexId 14723988844231693918
panic: runtime error: index out of range

goroutine 125393 [running]:
panic(0xf8a3e0, 0xc420018150)
    /home/buildbot/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/runtime/panic.go:500 +0x1a1
github.com/couchbase/indexing/secondary/indexer.(*StreamState).updateRepairState(0xc4202ba000, 0xc4237e0001, 0xc424e75618, 0x7, 0xc4238a8000, 0x1af, 0x200, 0x0, 0x0, 0x0)
    goproj/src/github.com/couchbase/indexing/secondary/indexer/stream_state.go:541 +0x3a5
github.com/couchbase/indexing/secondary/indexer.(*timekeeper).sendRestartMsg(0xc420136080, 0x1a9cde0, 0xc427d5a150)
    goproj/src/github.com/couchbase/indexing/secondary/indexer/timekeeper.go:3103 +0x22ca
created by github.com/couchbase/indexing/secondary/indexer.(*timekeeper).repairStream
    goproj/src/github.com/couchbase/indexing/secondary/indexer/timekeeper.go:3016 +0xf27
{code}
This issue seems to have been fixed under MB-36341, but the panic is still being seen with the 6.5.0-4928 build.
Attachments
Activity
Field | Original Value | New Value |
---|---|---|
Attachment | logs-functional-ns_server.tar.gz [ 79037 ] |
Attachment | gsi-10.12.2019-20.06.fail.html [ 79041 ] |
Attachment | logs-functional-ns_server.tar.gz [ 79042 ] |
Fix Version/s | 6.5.1 [ 16622 ] |
Labels | 6.5.1-candidate |
Fix Version/s | Mad-Hatter [ 15037 ] | |
Fix Version/s | Cheshire-Cat [ 15915 ] | |
Fix Version/s | 6.5.1 [ 16622 ] |
Labels | 6.5.1-candidate |
Priority | Major [ 3 ] | Critical [ 2 ] |
Labels | approved-for-mad-hatter |
Resolution | Fixed [ 1 ] | |
Status | Open [ 1 ] | Resolved [ 5 ] |
Labels | approved-for-mad-hatter | approved-for-mad-hatter request-dev-verify |
Status | Resolved [ 5 ] | Closed [ 6 ] |
The issue is due to a race condition between bucket clean-up and stream repair. If bucket clean-up happens while stream repair is in progress, it removes all the bookkeeping related to the bucket from the stream state. When the stream repair code path then tries to access that bookkeeping, it panics.
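To make the race concrete, here is a minimal, runnable Go sketch. The streamState type, its fields, and the timings below are illustrative stand-ins for the indexer's per-stream bookkeeping, not the actual code:
{code:go}
package main

import (
	"fmt"
	"sync"
	"time"
)

// Hypothetical, simplified stand-in for the indexer's per-stream bookkeeping.
// The real StreamState keeps per-bucket structures indexed by vbucket.
type streamState struct {
	mu          sync.Mutex
	repairState map[string][]int // bucket -> per-vbucket repair state
}

// cleanupBucket mimics bucket clean-up: it drops all bookkeeping for the bucket.
func (ss *streamState) cleanupBucket(bucket string) {
	ss.mu.Lock()
	defer ss.mu.Unlock()
	delete(ss.repairState, bucket)
}

// updateRepairState mimics the repair path: it does not re-validate on entry
// and indexes straight into the (possibly deleted) per-bucket slice.
func (ss *streamState) updateRepairState(bucket string, vb int) {
	ss.mu.Lock()
	defer ss.mu.Unlock()
	ss.repairState[bucket][vb] = 1 // panics once clean-up has removed the bucket
}

func main() {
	ss := &streamState{repairState: map[string][]int{"default": make([]int, 1024)}}

	go func() {
		time.Sleep(10 * time.Millisecond) // clean-up wins the race mid-repair
		ss.cleanupBucket("default")
	}()

	for vb := 0; vb < 1024; vb++ {
		ss.updateRepairState("default", vb)
		time.Sleep(time.Millisecond)
	}
	fmt.Println("unreachable once the clean-up goroutine fires")
}
{code}
Once the clean-up goroutine deletes the bucket's entry, the map lookup yields a nil slice and the next indexed write panics with the same "index out of range" runtime error seen in the stack trace.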
The issue can be reproduced using the following steps (a sketch of the sleep injection follows the list):
a. Add a sleep of 30 seconds in sendRestartMsg, at KV_SENDER_RESTART_VBUCKETS_RESPONSE, after the needsRollback call
b. Add a sleep of 30 seconds in the indexer, at removeIndexesFromStream, after sending the message to the timekeeper
c. Cluster run with 1 KV+n1ql node, 1 KV+index node, and 1 index node
d. Create and build an index on one indexer node
e. After the index is created, remove the indexer node on which the index was built and trigger a rebalance. Rebalance will move the index to the other node and clean up the bucket from the stream
f. Rebalance should fail and the indexer should panic
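The sleeps in steps (a) and (b) could be wired in with a small test hook like the one below. This is a sketch only: the environment-variable names and the placement comments are assumptions, since only the function names are given above.
{code:go}
package indexer // sketch only; not the real package layout

import (
	"os"
	"time"
)

// debugSleep is a hypothetical test hook: it sleeps only when the named
// environment variable is set, so the injected delays from steps (a) and (b)
// can be toggled per run without rebuilding.
func debugSleep(envVar string, d time.Duration) {
	if os.Getenv(envVar) != "" {
		time.Sleep(d)
	}
}

// Step (a): inside (*timekeeper).sendRestartMsg, after the needsRollback call
// while handling KV_SENDER_RESTART_VBUCKETS_RESPONSE:
//     debugSleep("INDEXER_DELAY_RESTART_VB_RESPONSE", 30*time.Second)
//
// Step (b): inside the indexer's removeIndexesFromStream, after the message
// is sent to the timekeeper:
//     debugSleep("INDEXER_DELAY_REMOVE_INDEXES", 30*time.Second)
{code}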
The reason for the indexer panic is that the stream status is validated only once while processing KV_SENDER_RESTART_VBUCKETS_RESPONSE. If stream clean-up happens after that validation, the updateRepairState method panics because the stream's bookkeeping has already been cleaned up. We lock around the stream state variables twice but validate the status only once.
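A common fix for this shape of bug, sketched below by extending the hypothetical streamState example above (this is not the actual patch), is to re-validate the bucket's bookkeeping every time the lock is re-acquired instead of trusting the earlier status check:
{code:go}
// updateRepairStateSafely re-checks the bookkeeping under the lock before
// touching it, so a clean-up that raced in between checks becomes a no-op
// instead of a panic. Names are illustrative, not the actual indexer fix.
func (ss *streamState) updateRepairStateSafely(bucket string, vb int) {
	ss.mu.Lock()
	defer ss.mu.Unlock()

	// Clean-up may have run between the first status validation and this
	// lock acquisition, so validate again before indexing.
	state, ok := ss.repairState[bucket]
	if !ok || vb >= len(state) {
		return // stream already cleaned up; nothing to repair
	}
	state[vb] = 1
}
{code}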