  1. Couchbase Server
  2. MB-51892

The Backup Service or cbauth can get stuck in a state where it will not reconnect to ns_server

Details

    • Untriaged
    • 1
    • Unknown

    Description

      As seen in a recent situation in the field, the Backup Service can get stuck. When this happens, ns_server drops the connection after a timeout and waits for the Backup Service to reconnect, which never happens. The JSON RPC connection process is restarted, times out again, and the cycle continues for many hours.

      ns_server logs a message like this:

      [ns_server:error,2022-04-19T17:02:50.904-07:00,ns_1@172.23.120.100:service_agent-backup<0.24218.3688>:service_agent:handle_info:277]Linked process <0.21373.3688> died with reason {no_connection,
                                                      "backup-service_api"}. Terminating
      

      This recurs 60 times an hour (once a minute) for many hours:

      $ grep "service_agent.*no_connection" ns_server.debug.log  | grep -E -o 2022-..-..T.. | uniq -c
        58 2022-04-19T17
        60 2022-04-19T18
        60 2022-04-19T19
        60 2022-04-19T20
        60 2022-04-19T21
        60 2022-04-19T22
        60 2022-04-19T23
        60 2022-04-20T00
        60 2022-04-20T01
        60 2022-04-20T02
        60 2022-04-20T03
        60 2022-04-20T04
        60 2022-04-20T05
        60 2022-04-20T06
        60 2022-04-20T07
        60 2022-04-20T08
        60 2022-04-20T09
        60 2022-04-20T10
         1 2022-04-20T11
      

      The code that re-establishes the connection with ns_server is here: https://github.com/couchbase/cbauth/blob/94cdd4fa943bb2107f48238bb563e5bc71b73df5/revrpc/revrpc.go#L282. For some reason, however, once the connection drops, something prevents the reconnection from happening.

            People

              joe.mitchelljones Joe Mitchell Jones
              dfinlay Dave Finlay