Node 172.23.96.148 was rebalanced in. Each of the consumer processes for Function timer_op tried to send a mutation and got stuck writing to the socket:
3 @ 0x90f27a 0x90a59a 0x909c17 0x9a4bcb 0x9a4cbd 0x9a65b7 0xa6ed1f 0xa7fdba 0x96f256 0x1093977 0x1091769 0x10ad168 0x109d2a1 0x93cb11
# 0x909c16 internal/poll.runtime_pollWait+0x56 /home/couchbase/.cbdepscache/exploded/x86_64/go-1.10.3/go/src/runtime/netpoll.go:173
# 0x9a4bca internal/poll.(*pollDesc).wait+0x9a /home/couchbase/.cbdepscache/exploded/x86_64/go-1.10.3/go/src/internal/poll/fd_poll_runtime.go:85
# 0x9a4cbc internal/poll.(*pollDesc).waitWrite+0x3c /home/couchbase/.cbdepscache/exploded/x86_64/go-1.10.3/go/src/internal/poll/fd_poll_runtime.go:94
# 0x9a65b6 internal/poll.(*FD).Write+0x236 /home/couchbase/.cbdepscache/exploded/x86_64/go-1.10.3/go/src/internal/poll/fd_unix.go:264
# 0xa6ed1e net.(*netFD).Write+0x4e /home/couchbase/.cbdepscache/exploded/x86_64/go-1.10.3/go/src/net/fd_unix.go:220
# 0xa7fdb9 net.(*conn).Write+0x69 /home/couchbase/.cbdepscache/exploded/x86_64/go-1.10.3/go/src/net/net.go:188
# 0x96f255 bytes.(*Buffer).WriteTo+0xb5 /home/couchbase/.cbdepscache/exploded/x86_64/go-1.10.3/go/src/bytes/buffer.go:240
# 0x1093976 github.com/couchbase/eventing/consumer.(*Consumer).sendMessage+0xbb6 /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/eventing/consumer/handle_messages.go:670
# 0x1091768 github.com/couchbase/eventing/consumer.(*Consumer).sendDcpEvent+0x5f8 /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/eventing/consumer/handle_messages.go:455
# 0x10ad167 github.com/couchbase/eventing/consumer.(*Consumer).sendEvent+0x787 /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/eventing/consumer/process_events.go:1169
# 0x109d2a0 github.com/couchbase/eventing/consumer.(*Consumer).processDCPEvents+0x5660 /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/eventing/consumer/process_events.go:91
The remaining goroutines that tried to write to the socket were blocked waiting for the lock. For example, the stats updater routine is waiting to acquire it:
3 @ 0x90f27a 0x90f32e 0x91fc14 0x91f92d 0x944ce8 0x945bed 0x1092ebf 0x1090e8f 0x10b91a8 0x93cb11
# 0x91f92c sync.runtime_SemacquireMutex+0x3c /home/couchbase/.cbdepscache/exploded/x86_64/go-1.10.3/go/src/runtime/sema.go:71
# 0x944ce7 sync.(*Mutex).Lock+0x107 /home/couchbase/.cbdepscache/exploded/x86_64/go-1.10.3/go/src/sync/mutex.go:134
# 0x945bec sync.(*RWMutex).Lock+0x2c /home/couchbase/.cbdepscache/exploded/x86_64/go-1.10.3/go/src/sync/rwmutex.go:93
# 0x1092ebe github.com/couchbase/eventing/consumer.(*Consumer).sendMessage+0xfe /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/eventing/consumer/handle_messages.go:631
# 0x1090e8e github.com/couchbase/eventing/consumer.(*Consumer).sendGetExecutionStats+0x1fe /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/eventing/consumer/handle_messages.go:384
# 0x10b91a7 github.com/couchbase/eventing/consumer.(*Consumer).updateWorkerStats+0x347 /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/eventing/consumer/stats_updater.go:111
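To illustrate the pattern in the two traces above (one goroutine blocked in a socket write while holding the lock, the others queued behind it), here is a minimal, self-contained sketch. It is not the eventing code: guardedConn, send, runSenders and isTimeout are hypothetical names, net.Pipe stands in for the real socket, and the write deadline is only one possible guard, not the fix proposed in this ticket. With a deadline set, the blocked writer returns a timeout error and releases the lock instead of holding it indefinitely.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"sync"
	"time"
)

// guardedConn is a hypothetical stand-in for a connection that is shared by
// several goroutines and protected by a single lock.
type guardedConn struct {
	mu   sync.Mutex
	conn net.Conn
}

// send writes payload while holding the lock. The write deadline bounds how
// long the lock can be held if the peer stops reading.
func (g *guardedConn) send(payload []byte, timeout time.Duration) error {
	g.mu.Lock()
	defer g.mu.Unlock()
	if err := g.conn.SetWriteDeadline(time.Now().Add(timeout)); err != nil {
		return err
	}
	_, err := g.conn.Write(payload)
	return err
}

// isTimeout reports whether err is a network timeout.
func isTimeout(err error) bool {
	var ne net.Error
	return errors.As(err, &ne) && ne.Timeout()
}

// runSenders starts two senders on one guardedConn whose peer never reads,
// and returns the error each of them saw.
func runSenders(timeoutMillis int) []error {
	// net.Pipe is unbuffered: a Write blocks until the other end reads.
	client, server := net.Pipe()
	defer client.Close()
	defer server.Close()

	g := &guardedConn{conn: client}
	timeout := time.Duration(timeoutMillis) * time.Millisecond
	results := make(chan error, 2)
	for _, msg := range []string{"mutation", "stats request"} {
		go func(m string) { results <- g.send([]byte(m), timeout) }(msg)
	}
	return []error{<-results, <-results}
}

func main() {
	for _, err := range runSenders(50) {
		fmt.Println("timed out:", isTimeout(err))
	}
}
```

Without the SetWriteDeadline call, the first sender would block in Write forever and the second would never get past Lock, which matches the two groups of goroutines in the dumps.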
The stuck socket write appears to be a Go issue (https://github.com/golang/go/issues/27752). The fix suggested there is to upgrade to Go 1.11 (https://github.com/golang/go/issues/27752#issuecomment-424387032). Eventing is using Go 1.10.3.
Because the stats update messages aren't going through, the producer never attempts to kill and respawn the consumer processes.
Ankit Prabhu, could you please look into this on priority?