Details
-
Bug
-
Resolution: Fixed
-
Major
-
7.1.0
-
Untriaged
-
1
-
Unknown
Description
What's the issue?
I was running continuous backups during rebalances and hit an authentication error when bootstrapping (via gocbcore). The logs for the failed run, and scripts used during testing can be found attached to this issue. This issue appears to be somewhat reproducible.
Setup
4 nodes, 3 with KV
3 buckets each with 64 vBuckets and 500,000 items
Explanation
It looks like this is due to some particular timing around the server add:
- Node 2 is removed and a rebalance is started
- A backup is started
- Node 2 is re-added
- Before node 2 has its authentication setup cbbackupmgr tries to connect to it and gets told the authentication has failed
In the cbbackupmgr log we see:
2022-03-21T15:40:12.519+00:00 (Gocbcore) Memdclient `10.112.230.102:11210/0xc0003822d0` Failed to perform auth against server (authentication failure | {"status_code":32,"document_key":"SCRAM-SHA512","bucket":"3","error_name":"AUTH_ERROR","error_description":"Authentication failed","opaque":5,"ref":"a35989bb-5295-4788-f344-30710cc2b2ad","last_dispatched_to":"10.112.230.102:11210","last_dispatched_from":"10.112.230.1:49441","last_connection_id":"f626d043180cbb1d/ba9cb7f8cd46e69e"})
|
2022-03-21T15:40:12.520+00:00 (Gocbcore) Pipeline Client 0xc00017f500 failed to bootstrap: authentication failure | {"status_code":32,"document_key":"SCRAM-SHA512","bucket":"3","error_name":"AUTH_ERROR","error_description":"Authentication failed","opaque":5,"ref":"a35989bb-5295-4788-f344-30710cc2b2ad","last_dispatched_to":"10.112.230.102:11210","last_dispatched_from":"10.112.230.1:49441","last_connection_id":"f626d043180cbb1d/ba9cb7f8cd46e69e"}
|
2022-03-21T15:40:12.520+00:00 (Gocbcore) Pipeline Client `10.112.230.102:11210/0xc00017f500` preparing for new client loop
|
2022-03-21T15:40:12.520+00:00 (Gocbcore) Pipeline Client `10.112.230.102:11210/0xc00017f500` retrieving new client connection for parent 0xc00017f380
|
2022-03-21T15:40:12.564+00:00 (Gocbcore) Bootstrap error found before config seen: authentication failure | {"status_code":32,"document_key":"SCRAM-SHA512","bucket":"3","error_name":"AUTH_ERROR","error_description":"Authentication failed","opaque":5,"ref":"a35989bb-5295-4788-f344-30710cc2b2ad","last_dispatched_to":"10.112.230.102:11210","last_dispatched_from":"10.112.230.1:49441","last_connection_id":"f626d043180cbb1d/ba9cb7f8cd46e69e"}
|
2022-03-21T15:40:12.564+00:00 (Gocbcore) Will retry request. Backoff=400ms, OperationID=waituntilready. Reason=CONNECTION_ERROR
|
2022-03-21T15:40:12.964+00:00 (Gocbcore) Bootstrap error found before config seen: authentication failure | {"status_code":32,"document_key":"SCRAM-SHA512","bucket":"3","error_name":"AUTH_ERROR","error_description":"Authentication failed","opaque":5,"ref":"a35989bb-5295-4788-f344-30710cc2b2ad","last_dispatched_to":"10.112.230.102:11210","last_dispatched_from":"10.112.230.1:49441","last_connection_id":"f626d043180cbb1d/ba9cb7f8cd46e69e"}
|
2022-03-21T15:40:12.965+00:00 (Gocbcore) Will retry request. Backoff=800ms, OperationID=waituntilready. Reason=CONNECTION_ERROR
|
2022-03-21T15:40:13.765+00:00 (Gocbcore) Bootstrap error found before config seen: authentication failure | {"status_code":32,"document_key":"SCRAM-SHA512","bucket":"3","error_name":"AUTH_ERROR","error_description":"Authentication failed","opaque":5,"ref":"a35989bb-5295-4788-f344-30710cc2b2ad","last_dispatched_to":"10.112.230.102:11210","last_dispatched_from":"10.112.230.1:49441","last_connection_id":"f626d043180cbb1d/ba9cb7f8cd46e69e"}
|
2022-03-21T15:40:13.765+00:00 (Gocbcore) Won't retry request. OperationID=waituntilready. Reason=CONNECTION_ERROR
|
And in the ns_server log we see:
[user:info,2022-03-21T15:40:13.641Z,ns_1@10.112.230.102:memcached_config_mgr<0.6450.38>:memcached_config_mgr:hot_reload_config:323]Hot-reloaded memcached.json for config change of the following keys: [<<"scramsha_fallback_salt">>]
|
which is logged when the config for authentication has been reloaded. Note that is happens after we attempt to connect which is why the authentication fails.
One thing to note is that the logs say that we are retrying here but we are not actually retrying the bootstrapping. The pertinent part of the code is here where we try to check if KV is ready. If we get an error (like authentication) we sleep and go back to the start of the for loop. Note, however, that actually bootstrapping isn't done in this function or anywhere in its call tree, so bootstrapping is never retried! This can be confirmed in the logs as we would expect another "Failed to perform auth against server" if it had really failed again - see memdClient.Bootstrap
Attachments
Issue Links
- relates to
-
GOCBC-1263 [GOCBCORE] Bootstrapping isn't retried if authentication fails
- Closed
-
MB-48856 [System Test][Backup Service] task hung for several hours
- Closed