Details
Bug
Resolution: Cannot Reproduce
Critical
5.5.0
Untriaged
Centos 64-bit
Unknown
Description
Noticed the following panic on one of the Eventing nodes in the longevity system test cluster:
panic: Unable to find given hostport in cbauth database: `172.23.96.210:11210'

goroutine 213266 [running]:
panic(0xc5e760, 0xc43115b3c0)
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/runtime/panic.go:500 +0x1a1 fp=0xc431188ac0 sp=0xc431188a30
github.com/couchbase/eventing/util.(*CbAuthHandler).AuthenticateMemcachedConn(0xc4262bd5c0, 0xc4322fb4a0, 0x13, 0xc422c2e8d0, 0xc422c2e801, 0x0)
    goproj/src/github.com/couchbase/eventing/util/kv.go:63 +0x22b fp=0xc431188b50 sp=0xc431188ac0
github.com/couchbase/eventing/dcp.defaultMkConn(0xc4322fb4a0, 0x13, 0x153b460, 0xc4262bd5c0, 0x10000c4281f2000, 0x0, 0x7b)
    goproj/src/github.com/couchbase/eventing/dcp/conn_pool.go:60 +0x110 fp=0xc431188be8 sp=0xc431188b50
github.com/couchbase/eventing/dcp.(*connectionPool).GetWithTimeout(0xc42d467940, 0x9356907420000, 0x0, 0x0, 0x0)
    goproj/src/github.com/couchbase/eventing/dcp/conn_pool.go:139 +0x555 fp=0xc431188dd0 sp=0xc431188be8
github.com/couchbase/eventing/dcp.(*connectionPool).Get(0xc42d467940, 0xc42984abd0, 0x64, 0xd8df3b)
    goproj/src/github.com/couchbase/eventing/dcp/conn_pool.go:152 +0x37 fp=0xc431188e08 sp=0xc431188dd0
github.com/couchbase/eventing/dcp.(*connectionPool).StartDcpFeed(0xc42d467940, 0xc4272d5a80, 0x7b, 0x400000000, 0xc4202383c0, 0xc42984abcd, 0xc42707a210, 0x9, 0x0, 0x0)
    goproj/src/github.com/couchbase/eventing/dcp/conn_pool.go:216 +0x38 fp=0xc431188e80 sp=0xc431188e08
github.com/couchbase/eventing/dcp.(*DcpFeed).connectToNodes(0xc423d2b710, 0xc42849afb0, 0x1, 0x1, 0x421e4abcd, 0xc42707a210, 0x67, 0x8)
    goproj/src/github.com/couchbase/eventing/dcp/upr.go:368 +0x5f2 fp=0xc4311892c0 sp=0xc431188e80
github.com/couchbase/eventing/dcp.(*Bucket).StartDcpFeedOver(0xc429ff0780, 0xc42ff25740, 0x60, 0x400000000, 0xc42849afb0, 0x1, 0x1, 0xabcd, 0xc42707a210, 0x0, ...)
    goproj/src/github.com/couchbase/eventing/dcp/upr.go:204 +0x54b fp=0xc431189398 sp=0xc4311892c0
github.com/couchbase/eventing/consumer.glob..func17(0xc421ddce40, 0x3, 0x3, 0x1, 0x1)
    goproj/src/github.com/couchbase/eventing/consumer/bucket_ops.go:516 +0x1fd fp=0xc4311895a8 sp=0xc431189398
github.com/couchbase/eventing/util.Retry(0x1543a60, 0xc4289fb760, 0xc4257660d8, 0xe3aba8, 0xc421ddce40, 0x3, 0x3, 0xd88ed7, 0x1)
    goproj/src/github.com/couchbase/eventing/util/retry.go:65 +0x63 fp=0xc431189618 sp=0xc4311895a8
github.com/couchbase/eventing/consumer.(*Consumer).dcpRequestStreamHandle(0xc428389400, 0x100, 0xc42d6b7980, 0x26baa, 0xc42685e930, 0x7)
    goproj/src/github.com/couchbase/eventing/consumer/process_events.go:901 +0x237a fp=0xc431189e20 sp=0xc431189618
github.com/couchbase/eventing/consumer.(*Consumer).processReqStreamMessages.func1(0xc432926b60, 0xc428389400, 0xda2eb6, 0x22)
    goproj/src/github.com/couchbase/eventing/consumer/process_events.go:1187 +0x65 fp=0xc431189f70 sp=0xc431189e20
runtime.goexit()
    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc431189f78 sp=0xc431189f70
created by github.com/couchbase/eventing/consumer.(*Consumer).processReqStreamMessages
    goproj/src/github.com/couchbase/eventing/consumer/process_events.go:1204 +0xc8f
Flow of events on the Eventing side that led to this panic:
- Eventing tried to open a DCP stream for one of the vbuckets.
- It requested the vbmap from ns_server to determine which KV node holds the active copy of the vbucket.
- Eventing tried to open a DCP connection to that KV node (172.23.96.210), and the attempt failed because the node was not found in the cbauth database.
Could this be because the vbmap is out of sync?
2018-06-16T15:47:47-07:00 - time the panic occurred
2018-06-16T15:11:57-07:00 - rebalance was kicked off
Between 2018-06-16T15:09:09-07:00 and 2018-06-16T15:11:36-07:00 - Couchbase was stopped on 3 KV nodes after enabling auto-failover.
=====
Additional details:
Jenkins job link - http://172.23.109.231/job/centos-systest-launcher-2/112/consoleFull
Prior to this panic, a couple of requests to fail over nodes returned a 503 error:
[2018-06-16T13:14:38-07:00, sequoiatools/couchbase-cli:2b528b] failover -c 172.23.96.206:8091 --server-failover 172.23.96.210:8091 -u Administrator -p password
→
Error occurred on container - sequoiatools/couchbase-cli:[failover -c 172.23.96.206:8091 --server-failover 172.23.96.210:8091 -u Administrator -p password]
docker logs 2b528b
docker start 2b528b
&ERROR: Received unexpected status 503
[pull] sequoiatools/couchbase-cli
[2018-06-16T13:17:07-07:00, sequoiatools/couchbase-cli:359ee9] failover -c 172.23.96.206:8091 --server-failover 172.23.96.212:8091 -u Administrator -p password --force
→
Error occurred on container - sequoiatools/couchbase-cli:[failover -c 172.23.96.206:8091 --server-failover 172.23.96.212:8091 -u Administrator -p password --force]
docker logs 359ee9
docker start 359ee9
&ERROR: Received unexpected status 503
The test then enabled auto-failover and stopped the Couchbase service on 3 KV nodes, including 172.23.96.210. A rebalance was then kicked off, during which Eventing panicked.