Details
-
Bug
-
Resolution: Not a Bug
-
Minor
-
None
-
core-10.1.1
-
None
-
1
Description
In cbbackupmgr we've hit an issue where retrying bootstrapping might have helped us not fail a backup.
The example is a bit complicated so I've written a program that reproduces it in a different way. We try to connect to a node with the incorrect password with retry enabled. We might expect the connection to be retried - and the logs suggest some retrying is being done - but it isn't:
package main
|
|
import (
|
"log"
|
"time"
|
"github.com/couchbase/gocbcore/v10"
|
)
|
|
type GocbcoreRetryStrategy struct {
|
calculator gocbcore.BackoffCalculator
|
retries uint32
|
}
|
func (g *GocbcoreRetryStrategy) RetryAfter(req gocbcore.RetryRequest, rea gocbcore.RetryReason) gocbcore.RetryAction {
|
if req.RetryAttempts() > g.retries+1 || !(req.Idempotent() || rea.AllowsNonIdempotentRetry()) {
|
return nil
|
}
|
|
return &gocbcore.WithDurationRetryAction{WithDuration: g.calculator(req.RetryAttempts())}
|
}
|
|
type CustomGOCBLogger struct{}
|
func (l *CustomGOCBLogger) Log(level gocbcore.LogLevel, offset int, format string, v ...interface{}) error {
|
log.Printf(format, v...)
|
return nil
|
}
|
|
func main() {
|
gocbcore.SetLogger(&CustomGOCBLogger{})
|
|
agent, err := gocbcore.CreateAgent(&gocbcore.AgentConfig{
|
BucketName: "default",
|
SecurityConfig: gocbcore.SecurityConfig{
|
Auth: gocbcore.PasswordAuthProvider{
|
Username: "Administrator",
|
Password: "ThisIsNotReallyThePassword",
|
},
|
},
|
SeedConfig: gocbcore.SeedConfig{
|
MemdAddrs: []string{"10.112.230.101:11210"},
|
},
|
})
|
if err != nil {
|
log.Fatalf("%s", err)
|
}
|
|
ch := make(chan bool, 1)
|
_, err = agent.WaitUntilReady(time.Now().Add(time.Second*5), gocbcore.WaitUntilReadyOptions{
|
RetryStrategy: &GocbcoreRetryStrategy{
|
calculator: gocbcore.ExponentialBackoff(50 * time.Millisecond, 2*time.Second + 500*time.Millisecond, 2.0),
|
retries: uint32(3),
|
},
|
}, func(wurr *gocbcore.WaitUntilReadyResult, e error) {
|
log.Printf("%+v %s\n", wurr, e)
|
ch <- true
|
})
|
if err != nil {
|
log.Fatalf("%s", err)
|
}
|
|
<-ch
|
}
|
The logs show:
2022/03/22 11:17:23 Memdclient `10.112.230.101:11210/0xc000134000` Failed to perform auth against server (authentication failure | {"status_code":32,"document_key":"SCRAM-SHA512","bucket":"default","error_name":"AUTH_ERROR","error_description":"Authentication failed","opaque":5,"ref":"caf8310e-a6ae-48ef-f64e-8e828fd52b82","last_dispatched_to":"10.112.230.101:11210","last_dispatched_from":"10.112.230.1:52962","last_connection_id":"b52f75065d37aeb1/e8faec5d1b9c167a"})
|
2022/03/22 11:17:23 Pipeline Client 0xc0000b0540 failed to bootstrap: authentication failure | {"status_code":32,"document_key":"SCRAM-SHA512","bucket":"default","error_name":"AUTH_ERROR","error_description":"Authentication failed","opaque":5,"ref":"caf8310e-a6ae-48ef-f64e-8e828fd52b82","last_dispatched_to":"10.112.230.101:11210","last_dispatched_from":"10.112.230.1:52962","last_connection_id":"b52f75065d37aeb1/e8faec5d1b9c167a"}
|
2022/03/22 11:17:23 Pipeline Client `10.112.230.101:11210/0xc0000b0540` preparing for new client loop
|
2022/03/22 11:17:23 Pipeline Client `10.112.230.101:11210/0xc0000b0540` retrieving new client connection for parent 0xc0000b0480
|
2022/03/22 11:17:23 Bootstrap error found before config seen: authentication failure | {"status_code":32,"document_key":"SCRAM-SHA512","bucket":"default","error_name":"AUTH_ERROR","error_description":"Authentication failed","opaque":5,"ref":"caf8310e-a6ae-48ef-f64e-8e828fd52b82","last_dispatched_to":"10.112.230.101:11210","last_dispatched_from":"10.112.230.1:52962","last_connection_id":"b52f75065d37aeb1/e8faec5d1b9c167a"}
|
2022/03/22 11:17:23 Will retry request. Backoff=200ms, OperationID=waituntilready. Reason=CONNECTION_ERROR
|
2022/03/22 11:17:23 Bootstrap error found before config seen: authentication failure | {"status_code":32,"document_key":"SCRAM-SHA512","bucket":"default","error_name":"AUTH_ERROR","error_description":"Authentication failed","opaque":5,"ref":"caf8310e-a6ae-48ef-f64e-8e828fd52b82","last_dispatched_to":"10.112.230.101:11210","last_dispatched_from":"10.112.230.1:52962","last_connection_id":"b52f75065d37aeb1/e8faec5d1b9c167a"}
|
2022/03/22 11:17:23 Will retry request. Backoff=400ms, OperationID=waituntilready. Reason=CONNECTION_ERROR
|
2022/03/22 11:17:23 Bootstrap error found before config seen: authentication failure | {"status_code":32,"document_key":"SCRAM-SHA512","bucket":"default","error_name":"AUTH_ERROR","error_description":"Authentication failed","opaque":5,"ref":"caf8310e-a6ae-48ef-f64e-8e828fd52b82","last_dispatched_to":"10.112.230.101:11210","last_dispatched_from":"10.112.230.1:52962","last_connection_id":"b52f75065d37aeb1/e8faec5d1b9c167a"}
|
2022/03/22 11:17:23 Will retry request. Backoff=800ms, OperationID=waituntilready. Reason=CONNECTION_ERROR
|
2022/03/22 11:17:24 Bootstrap error found before config seen: authentication failure | {"status_code":32,"document_key":"SCRAM-SHA512","bucket":"default","error_name":"AUTH_ERROR","error_description":"Authentication failed","opaque":5,"ref":"caf8310e-a6ae-48ef-f64e-8e828fd52b82","last_dispatched_to":"10.112.230.101:11210","last_dispatched_from":"10.112.230.1:52962","last_connection_id":"b52f75065d37aeb1/e8faec5d1b9c167a"}
|
Although there are logs saying retrying is happening we're not actually retrying the bootstrapping which is shown by the fact that we only see "Failed to perform auth against server" line. The pertinent part of the code is here where we try to check if KV is ready. If we get an error (like authentication) we sleep and go back to the start of the for loop. Note, however, that actually bootstrapping isn't done in this function or anywhere in its call tree, so bootstrapping is never retried
I'm not sure whether this is a bug - it seems reasonable to not retry because of an auth failure - but would like to make sure as the logging is a little confusing.
Attachments
Issue Links
- relates to
-
MB-49331 [CBM] Authentication error when backing up during a rebalance
- Resolved
-
GOCBC-1264 Add more documentation to Options blocks
- Resolved