Details
Bug
Resolution: Unresolved
Major
7.2.4
Untriaged
0
Unknown
Description
While investigating an issue with cloud-native-gateway (CNG), we discovered that revrpc appears to stop sending heartbeats, and that on subsequent connections to revrpc the initial updateDB rpc is never invoked.
A cbcollect is attached to the issue.
The original CNG issue is here: https://couchbasecloud.atlassian.net/browse/ING-780
The logs from cbauthx (CNG's cbauth implementation) are also included below. They show heartbeats arriving regularly and then stopping, which leads CNG to assume that something has gone wrong with the connection. We then attempt to reconnect to revrpc, but do not receive the initial updateDBExt within 5 seconds, and so assume the new connection is faulty as well.
{"level":"debug","ts":"2024-04-23T10:58:56.808Z","logger":"gateway.cbauth","caller":"cbauthx/cbauthclient.go:195","msg":"received heartbeat rpc","clientId":"116fc4fa","opts":{}}
{"level":"debug","ts":"2024-04-23T10:59:01.809Z","logger":"gateway.cbauth","caller":"cbauthx/cbauthclient.go:195","msg":"received heartbeat rpc","clientId":"116fc4fa","opts":{}}
{"level":"debug","ts":"2024-04-23T10:59:06.810Z","logger":"gateway.cbauth","caller":"cbauthx/cbauthclient.go:195","msg":"received heartbeat rpc","clientId":"116fc4fa","opts":{}}
{"level":"debug","ts":"2024-04-23T10:59:21.811Z","logger":"gateway.cbauth","caller":"cbauthx/cbauthclient.go:264","msg":"internal close triggered","clientId":"116fc4fa","error":"cache is stale"}
{"level":"warn","ts":"2024-04-23T10:59:21.811Z","logger":"gateway.cbauth","caller":"cbauthx/cbauth.go:399","msg":"lost connection to cbauth","error":"cache is stale"}
{"level":"info","ts":"2024-04-23T10:59:21.811Z","logger":"gateway.cbauth","caller":"cbauthx/cbauth.go:324","msg":"new cbauth client connecting","endpoints":["http://cb-oc-0002.cb-oc.fit-testing-situational-33f80b-e7848e-2024-04-23.svc:8091"],"clusterUuid":""}
{"level":"debug","ts":"2024-04-23T10:59:21.811Z","logger":"gateway.cbauth","caller":"cbauthx/cbauth.go:215","msg":"attempting to build new cbauth client","endpoint":"http://cb-oc-0002.cb-oc.fit-testing-situational-33f80b-e7848e-2024-04-23.svc:8091"}
{"level":"warn","ts":"2024-04-23T10:59:26.811Z","logger":"gateway.cbauth","caller":"cbauthx/cbauth.go:239","msg":"failed to build new cbauth client","error":"failed to connect to revrpc: context cancelled while peeking response: context deadline exceeded"}
{"level":"warn","ts":"2024-04-23T10:59:26.811Z","logger":"gateway.cbauth","caller":"cbauthx/cbauth.go:247","msg":"failed to connect to all cbauth endpoints..."}
{"level":"warn","ts":"2024-04-23T10:59:26.811Z","logger":"gateway.cbauth","caller":"cbauthx/cbauth.go:335","msg":"failed to reconnect to cbauth","error":"failed to connect to all hosts: failed to connect to revrpc: context cancelled while peeking response: context deadline exceeded"}
Attachments
Gerrit Reviews
For Gerrit Dashboard: MB-61619

# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
209891,1 | MB-61619 do not block updates to other cbauth clients if one of the clients | trinity | ns_server | NEW | 0 | +1 |
209892,1 | MB-61619 wait for sync from cbauth on multiple nodes in parallel | trinity | ns_server | NEW | +2 | +1 |
209990,1 | MB-61619 do not wait until keep alive timeout while performing | trinity | ns_server | NEW | +2 | +1 |