Details
Type: Bug
Resolution: Won't Fix
Priority: Major
Version: 4.0.0
Security Level: Public
Triage: Untriaged
No
Description
When XDCR replication starts, we call bucket.GetFailoverLogs() to fetch the failover logs of the local bucket. This generally works, but it often fails in the following scenario:
1. Start replication.
2. Restart the server. Replication is restarted automatically.
The following error then appears in the log file:
2015/02/04 18:57:56 No Free connections for vblist [0 1 2 3 … 1023]
The error is timing-related: if we sleep for 15 seconds before calling bucket.GetFailoverLogs(), it does not occur.
I traced the problem to the call to conn.SelectBucket(ah.Bucket) in cbauth. According to Alk, the bucket may not yet have been ready when this call was made. We need to detect this situation and handle the error gracefully rather than crashing.
More details:
The failure first surfaces in the following code in upr.go. The actual error is "err=MCResponse status=KEY_ENOENT, opcode=0x89, opaque=0, msg: Engine not found".
func (b *Bucket) GetFailoverLogs(vBuckets []uint16) (FailoverLog, error) {
	// ...
	mc, err := serverConn.Get()
	if err != nil {
		// ...
	}
	// ...
}
I further traced the error to the AuthenticateMemcachedConn call in defaultMkConn, in conn_pool.go:
func defaultMkConn(host string, ah AuthHandler) (*memcached.Client, error) {
	// ...
	if gah, ok := ah.(GenericMcdAuthHandler); ok {
		err = gah.AuthenticateMemcachedConn(host, conn)
		// ...
	}
	// ...
}
From there, the error leads to mcdauthhandler.go in cbauth, where it is returned by the conn.SelectBucket() call below:
func (ah *AuthHandler) AuthenticateMemcachedConn(host string, conn *memcached.Client) error {
	return WithAuthenticator(ah.A, func(a Authenticator) error {
		u, p, err := a.GetMemcachedServiceAuth(host)
		if err != nil {
			return err
		}
		_, err = conn.Auth(u, p)
		if err == nil && ah.Bucket != "" {
			_, err = conn.SelectBucket(ah.Bucket)
		}
		// ...
	})
}
Some info that may be useful: u=_admin, p=cd47ef687129213a88fb8ad7168fa33e, ah.Bucket=default