Couchbase Server / MB-61619

revrpc stops responding after a rebalance


Details

    • Bug
    • Resolution: Unresolved
    • Major
    • 7.6.2
    • 7.2.4
    • ns_server
    • None
    • Untriaged
    • 0
    • Unknown

    Description

      While investigating an issue with cloud-native-gateway (CNG), we discovered that revrpc appears to stop sending heartbeats, and that subsequent connections to revrpc never receive the initial updateDB RPC.

      A cbcollect is attached to the issue.

      The original CNG issue is here: https://couchbasecloud.atlassian.net/browse/ING-780

      The logs from cbauthx (CNG's cbauth implementation) are also included below. They show heartbeats arriving regularly, every 5 seconds, and then stopping; 15 seconds after the last heartbeat, cbauthx reports "cache is stale" and CNG assumes something has gone wrong with the connection. We then attempt to reconnect to revrpc, but do not receive the initial updateDBExt within 5 seconds, so we assume the new connection is faulty as well.

      {"level":"debug","ts":"2024-04-23T10:58:56.808Z","logger":"gateway.cbauth","caller":"cbauthx/cbauthclient.go:195","msg":"received heartbeat rpc","clientId":"116fc4fa","opts":{}}
      {"level":"debug","ts":"2024-04-23T10:59:01.809Z","logger":"gateway.cbauth","caller":"cbauthx/cbauthclient.go:195","msg":"received heartbeat rpc","clientId":"116fc4fa","opts":{}}
      {"level":"debug","ts":"2024-04-23T10:59:06.810Z","logger":"gateway.cbauth","caller":"cbauthx/cbauthclient.go:195","msg":"received heartbeat rpc","clientId":"116fc4fa","opts":{}}
      {"level":"debug","ts":"2024-04-23T10:59:21.811Z","logger":"gateway.cbauth","caller":"cbauthx/cbauthclient.go:264","msg":"internal close triggered","clientId":"116fc4fa","error":"cache is stale"}
      {"level":"warn","ts":"2024-04-23T10:59:21.811Z","logger":"gateway.cbauth","caller":"cbauthx/cbauth.go:399","msg":"lost connection to cbauth","error":"cache is stale"}
      {"level":"info","ts":"2024-04-23T10:59:21.811Z","logger":"gateway.cbauth","caller":"cbauthx/cbauth.go:324","msg":"new cbauth client connecting","endpoints":["http://cb-oc-0002.cb-oc.fit-testing-situational-33f80b-e7848e-2024-04-23.svc:8091"],"clusterUuid":""}
      {"level":"debug","ts":"2024-04-23T10:59:21.811Z","logger":"gateway.cbauth","caller":"cbauthx/cbauth.go:215","msg":"attempting to build new cbauth client","endpoint":"http://cb-oc-0002.cb-oc.fit-testing-situational-33f80b-e7848e-2024-04-23.svc:8091"}
      {"level":"warn","ts":"2024-04-23T10:59:26.811Z","logger":"gateway.cbauth","caller":"cbauthx/cbauth.go:239","msg":"failed to build new cbauth client","error":"failed to connect to revrpc: context cancelled while peeking response: context deadline exceeded"}
      {"level":"warn","ts":"2024-04-23T10:59:26.811Z","logger":"gateway.cbauth","caller":"cbauthx/cbauth.go:247","msg":"failed to connect to all cbauth endpoints..."}
      {"level":"warn","ts":"2024-04-23T10:59:26.811Z","logger":"gateway.cbauth","caller":"cbauthx/cbauth.go:335","msg":"failed to reconnect to cbauth","error":"failed to connect to all hosts: failed to connect to revrpc: context cancelled while peeking response: context deadline exceeded"}
      


            People

              Artem Stemkovski (artem)
              Brett Lawson (brett19)
