Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-49331

[CBM] Authentication error when backing up during a rebalance

    XMLWordPrintable

Details

    • Untriaged
    • 1
    • Unknown

    Description

      What's the issue?
      I was running continuous backups during rebalances and hit an authentication error when bootstrapping (via gocbcore). The logs for the failed run, and scripts used during testing can be found attached to this issue. This issue appears to be somewhat reproducible.

      Setup
      4 nodes, 3 with KV
      3 buckets each with 64 vBuckets and 500,000 items

      Explanation
      It looks like this is due to some particular timing around the server add:

      1. Node 2 is removed and a rebalance is started
      2. A backup is started
      3. Node 2 is re-added
      4. Before node 2 has its authentication setup cbbackupmgr tries to connect to it and gets told the authentication has failed

      In the cbbackupmgr log we see:

      2022-03-21T15:40:12.519+00:00 (Gocbcore) Memdclient `10.112.230.102:11210/0xc0003822d0` Failed to perform auth against server (authentication failure | {"status_code":32,"document_key":"SCRAM-SHA512","bucket":"3","error_name":"AUTH_ERROR","error_description":"Authentication failed","opaque":5,"ref":"a35989bb-5295-4788-f344-30710cc2b2ad","last_dispatched_to":"10.112.230.102:11210","last_dispatched_from":"10.112.230.1:49441","last_connection_id":"f626d043180cbb1d/ba9cb7f8cd46e69e"})
      2022-03-21T15:40:12.520+00:00 (Gocbcore) Pipeline Client 0xc00017f500 failed to bootstrap: authentication failure | {"status_code":32,"document_key":"SCRAM-SHA512","bucket":"3","error_name":"AUTH_ERROR","error_description":"Authentication failed","opaque":5,"ref":"a35989bb-5295-4788-f344-30710cc2b2ad","last_dispatched_to":"10.112.230.102:11210","last_dispatched_from":"10.112.230.1:49441","last_connection_id":"f626d043180cbb1d/ba9cb7f8cd46e69e"}
      2022-03-21T15:40:12.520+00:00 (Gocbcore) Pipeline Client `10.112.230.102:11210/0xc00017f500` preparing for new client loop
      2022-03-21T15:40:12.520+00:00 (Gocbcore) Pipeline Client `10.112.230.102:11210/0xc00017f500` retrieving new client connection for parent 0xc00017f380
      2022-03-21T15:40:12.564+00:00 (Gocbcore) Bootstrap error found before config seen: authentication failure | {"status_code":32,"document_key":"SCRAM-SHA512","bucket":"3","error_name":"AUTH_ERROR","error_description":"Authentication failed","opaque":5,"ref":"a35989bb-5295-4788-f344-30710cc2b2ad","last_dispatched_to":"10.112.230.102:11210","last_dispatched_from":"10.112.230.1:49441","last_connection_id":"f626d043180cbb1d/ba9cb7f8cd46e69e"}
      2022-03-21T15:40:12.564+00:00 (Gocbcore) Will retry request. Backoff=400ms, OperationID=waituntilready. Reason=CONNECTION_ERROR
      2022-03-21T15:40:12.964+00:00 (Gocbcore) Bootstrap error found before config seen: authentication failure | {"status_code":32,"document_key":"SCRAM-SHA512","bucket":"3","error_name":"AUTH_ERROR","error_description":"Authentication failed","opaque":5,"ref":"a35989bb-5295-4788-f344-30710cc2b2ad","last_dispatched_to":"10.112.230.102:11210","last_dispatched_from":"10.112.230.1:49441","last_connection_id":"f626d043180cbb1d/ba9cb7f8cd46e69e"}
      2022-03-21T15:40:12.965+00:00 (Gocbcore) Will retry request. Backoff=800ms, OperationID=waituntilready. Reason=CONNECTION_ERROR
      2022-03-21T15:40:13.765+00:00 (Gocbcore) Bootstrap error found before config seen: authentication failure | {"status_code":32,"document_key":"SCRAM-SHA512","bucket":"3","error_name":"AUTH_ERROR","error_description":"Authentication failed","opaque":5,"ref":"a35989bb-5295-4788-f344-30710cc2b2ad","last_dispatched_to":"10.112.230.102:11210","last_dispatched_from":"10.112.230.1:49441","last_connection_id":"f626d043180cbb1d/ba9cb7f8cd46e69e"}
      2022-03-21T15:40:13.765+00:00 (Gocbcore) Won't retry request.  OperationID=waituntilready. Reason=CONNECTION_ERROR
      

      And in the ns_server log we see:

      [user:info,2022-03-21T15:40:13.641Z,ns_1@10.112.230.102:memcached_config_mgr<0.6450.38>:memcached_config_mgr:hot_reload_config:323]Hot-reloaded memcached.json for config change of the following keys: [<<"scramsha_fallback_salt">>]
      

      which is logged when the config for authentication has been reloaded. Note that is happens after we attempt to connect which is why the authentication fails.

      One thing to note is that the logs say that we are retrying here but we are not actually retrying the bootstrapping. The pertinent part of the code is here where we try to check if KV is ready. If we get an error (like authentication) we sleep and go back to the start of the for loop. Note, however, that actually bootstrapping isn't done in this function or anywhere in its call tree, so bootstrapping is never retried! This can be confirmed in the logs as we would expect another "Failed to perform auth against server" if it had really failed again - see memdClient.Bootstrap

      Attachments

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            People

              james.lee James Lee
              james.lee James Lee
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty