Couchbase Go SDK / GOCBC-868

[gocbcore.v9] If bucket isn't available CreateAgent needs to fail immediately


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.1.5
    • Component/s: library

    Description

      When the Couchbase bucket isn't available, CreateAgent should fail right away with either ErrBucketNotFound (or, as before, ErrAuthenticationFailure), rather than clients having to call WaitUntilReady(..) and receive an "unambiguous timeout" error.


          Activity

            charles.dixon Charles Dixon added a comment -

            Hi Abhinav Dangeti, can you give me a bit of detail on why you need this specific error at connect time? This is likely to go to a wider SDK team discussion, so more info would be useful to aid that discussion.

            abhinav Abhinav Dangeti added a comment -

            Hey Charles Dixon, here's why FTS needs this:

            On bucket delete, FTS needs to drop all indexes associated with it. When a bucket is actually dropped, all streams are first closed with a socket closure message. At that point the FTS side closes the feed and tries setting up an agent to see if the bucket is still available and, if so, whether its UUID is still the same as before (to cover a quick bucket-recreation case). If the bucket doesn't exist or the UUID doesn't match, we go ahead and drop the index.

            Now, if CreateAgent doesn't return immediately and just returns "unambiguous timeout" after WaitUntilReady() times out, we'd have to wait unnecessarily and also accommodate the "unambiguous timeout" error.

            I'm open to alternative suggestions on how to determine whether a bucket has been deleted, too.

            charles.dixon Charles Dixon added a comment -

            Hi Abhinav Dangeti, that makes sense. I think immediately returning that error from CreateAgent won't be possible given how gocbcore is now set up to work. We might be able to add additional context to the error returned by WaitUntilReady, but that doesn't solve the unnecessary-wait issue.

            An alternative to creating a new agent (assuming you still have a standard agent available for other operations) could be to use the REST API - https://github.com/couchbase/gocbcore/blob/master/agent_ops.go#L233. You could make a request similar to what we expose via gocb - https://github.com/couchbase/gocb/blob/master/cluster_bucketmgr.go#L200

            abhinav Abhinav Dangeti added a comment -

            Yes, using DoHTTPRequest with a pendingOp is something I've already tried, but I'm not convinced it's a good solution: it still relies on the timeout I set for it, and would return ErrTimeout rather than a proper ErrBucketNotFound error.

            mihir.kamdar Mihir Kamdar (Inactive) added a comment -

            Charles Dixon, Abhinav Dangeti, any updates on this issue?
            brett19 Brett Lawson added a comment -

            Hey Mihir Kamdar,
            We are currently working on a solution to this problem. We should have some more information for you by the end of the week.
            Cheers, Brett


            mihir.kamdar Mihir Kamdar (Inactive) added a comment -

            Hi Brett Lawson, any updates on this? This is causing a lot of FTS tests to fail.
            abhinav Abhinav Dangeti added a comment - - edited

            Hey Mihir Kamdar, I've pushed up a fix to FTS:

            http://review.couchbase.org/c/cbgt/+/128021

            This essentially lets FTS get the required data directly from ns_server endpoints to handle bucket and collections deletions.

            abhinav Abhinav Dangeti added a comment - - edited

            Mihir Kamdar, I've merged the above change. That should serve as an interim fix until the SDK adds support for this.

            Reducing the priority of this bug.


            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-2070 contains cbgt commit fcc6aaa with commit message:
            GOCBC-868: [feed_dcp_gocbcore] Handling bucket deletions

            abhinav Abhinav Dangeti added a comment -

            Hey Brett Lawson, it's been a while since we checked in on this - do you have any updates or an ETA yet?

            We have some issues within FTS that would get resolved once we reach a resolution here.

            charles.dixon Charles Dixon added a comment -

            I'm moving this to 2.1.4, but if the latest behaviour change we made (which I discussed with Abhinav Dangeti) solves this, then we can resolve it and move it back to 2.1.3.

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-2505 contains gocbcore commit 2d1ed35 with commit message:
            GOCBC-868: Expose a way to fast fail WaitUntilReady

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.6.0-7858 contains gocbcore commit 2d1ed35 with commit message:
            GOCBC-868: Expose a way to fast fail WaitUntilReady

            abhinav Abhinav Dangeti added a comment -

            gocbcore.Agent's WaitUntilReady doesn't fail quickly when the bucket is unavailable and the client attempts to connect over HTTP:

            [09:43:44] AD: ~/Documents/go/src/github.com/abhinavdangeti/tmp $ go run simple_agent.go
            --> FromConnStr, err: <nil>
            2020/07/08 09:43:45 (GOCBCORE) SDK Version: gocbcore/v9.0.3
            2020/07/08 09:43:45 (GOCBCORE) Creating new agent: &{MemdAddrs:[] HTTPAddrs:[127.0.0.1:9000] BucketName:default UserAgent: UseTLS:false NetworkType: Auth:0x18a7688 TLSRootCAProvider:<nil> UseMutationTokens:false UseCompression:false UseDurations:false DisableDecompression:false UseOutOfOrderResponses:false UseCollections:false CompressionMinSize:0 CompressionMinRatio:0 HTTPRedialPeriod:0s HTTPRetryDelay:0s CccpMaxWait:0s CccpPollPeriod:0s ConnectTimeout:6s KVConnectTimeout:0s KvPoolSize:0 MaxQueueSize:0 HTTPMaxIdleConns:0 HTTPMaxIdleConnsPerHost:0 HTTPIdleConnectionTimeout:0s Tracer:<nil> NoRootTraceSpans:false DefaultRetryStrategy:<nil> CircuitBreakerConfig:{Enabled:false VolumeThreshold:0 ErrorThresholdPercentage:0 SleepWindow:0s RollingWindow:0s CompletionCallback:<nil> CanaryTimeout:0s} UseZombieLogger:false ZombieLoggerInterval:0s ZombieLoggerSampleSize:0}
            --> CreateAgent, err: <nil>
            --> WaitUntilReady, err:  <nil>
            2020/07/08 09:43:45 (GOCBCORE) Will retry request. Backoff=1ms, OperationID=waituntilready. Reason=NOT_READY
            2020/07/08 09:43:45 (GOCBCORE) CCCP Looper starting.
            2020/07/08 09:43:45 (GOCBCORE) CCCPPOLL: No nodes available to poll, return upstream
            2020/07/08 09:43:45 (GOCBCORE) HTTP Looper starting.
            2020/07/08 09:43:45 (GOCBCORE) Http Picked: http://127.0.0.1:9000.
            2020/07/08 09:43:45 (GOCBCORE) HTTP Hostname: 127.0.0.1.
            2020/07/08 09:43:45 (GOCBCORE) Requesting config from: http://127.0.0.1:9000//pools/default/bs/default.
            2020/07/08 09:43:45 (GOCBCORE) Will retry request. Backoff=10ms, OperationID=waituntilready. Reason=NOT_READY
            2020/07/08 09:43:45 (GOCBCORE) Writing HTTP request to http://127.0.0.1:9000/pools/default/bs/default ID=
            2020/07/08 09:43:45 (GOCBCORE) Requesting config from: http://127.0.0.1:9000//pools/default/bucketsStreaming/default.
            2020/07/08 09:43:45 (GOCBCORE) Writing HTTP request to http://127.0.0.1:9000/pools/default/bucketsStreaming/default ID=
            2020/07/08 09:43:45 (GOCBCORE) Failed to connect to host, bad bucket.
            2020/07/08 09:43:45 (GOCBCORE) Pick Failed.
            2020/07/08 09:43:45 (GOCBCORE) Looper waiting...
            2020/07/08 09:43:45 (GOCBCORE) Will retry request. Backoff=50ms, OperationID=waituntilready. Reason=NOT_READY
            2020/07/08 09:43:45 (GOCBCORE) Will retry request. Backoff=100ms, OperationID=waituntilready. Reason=NOT_READY
            2020/07/08 09:43:45 (GOCBCORE) Will retry request. Backoff=500ms, OperationID=waituntilready. Reason=NOT_READY
            2020/07/08 09:43:46 (GOCBCORE) Will retry request. Backoff=1s, OperationID=waituntilready. Reason=NOT_READY
            2020/07/08 09:43:47 (GOCBCORE) Will retry request. Backoff=1s, OperationID=waituntilready. Reason=NOT_READY
            2020/07/08 09:43:48 (GOCBCORE) Will retry request. Backoff=1s, OperationID=waituntilready. Reason=NOT_READY
            2020/07/08 09:43:49 (GOCBCORE) Will retry request. Backoff=1s, OperationID=waituntilready. Reason=NOT_READY
            2020/07/08 09:43:50 (GOCBCORE) Will retry request. Backoff=1s, OperationID=waituntilready. Reason=NOT_READY
            2020/07/08 09:43:51 (GOCBCORE) Will retry request. Backoff=1s, OperationID=waituntilready. Reason=NOT_READY
            2020/07/08 09:43:52 (GOCBCORE) Will retry request. Backoff=1s, OperationID=waituntilready. Reason=NOT_READY
            2020/07/08 09:43:53 (GOCBCORE) Will retry request. Backoff=1s, OperationID=waituntilready. Reason=NOT_READY
            2020/07/08 09:43:54 (GOCBCORE) Will retry request. Backoff=1s, OperationID=waituntilready. Reason=NOT_READY
            2020/07/08 09:43:55 (GOCBCORE) Will retry request. Backoff=1s, OperationID=waituntilready. Reason=NOT_READY
            --> WaitUntilReady Callback, err:  unambiguous timeout | {"InnerError":{"InnerError":{"InnerError":{},"Message":"unambiguous timeout"}},"OperationID":"WaitUntilReady","Opaque":"","TimeObserved":10001420431,"RetryReasons":["NOT_READY"],"RetryAttempts":15,"LastDispatchedTo":"","LastDispatchedFrom":"","LastConnectionID":""} 

            abhinav Abhinav Dangeti added a comment - - edited http://review.couchbase.org/c/gocbcore/+/132235 http://review.couchbase.org/c/cbgt/+/132161
            abhinav Abhinav Dangeti added a comment - - edited

            Hey Charles Dixon, I noticed a regression with the new change. I'm unable to set up an agent after restarting couchbase-server.

            Agent setup fails with this error message: "document not found".

            This is the config I'm using to set up the agent:

            &gocbcore.AgentConfig{
            	MemdAddrs:[]string(nil),
            	HTTPAddrs:[]string{"127.0.0.1:9000"},
            	BucketName:"beer-sample",
            	UserAgent:"beer_1b2fb08b470d27e2_4c1c5584",
            	UseTLS:false,
            	NetworkType:"", Auth:(*cbgt.CBAuthenticator)(0x2887ae0),
            	TLSRootCAProvider:(func() *x509.CertPool)(nil),
            	UseMutationTokens:false,
            	UseCompression:false,
            	UseDurations:false,
            	DisableDecompression:false,
            	UseOutOfOrderResponses:false,
            	UseCollections:true,
            	CompressionMinSize:0,
            	CompressionMinRatio:0,
            	HTTPRedialPeriod:0,
            	HTTPRetryDelay:0,
            	CccpMaxWait:0,
            	CccpPollPeriod:0,
            	ConnectTimeout:60000000000,
            	KVConnectTimeout:7000000000,
            	KvPoolSize:0,
            	MaxQueueSize:0,
            	HTTPMaxIdleConns:0,
            	HTTPMaxIdleConnsPerHost:0,
            	HTTPIdleConnectionTimeout:0,
            	Tracer:gocbcore.RequestTracer(nil),
            	NoRootTraceSpans:false,
            	DefaultRetryStrategy:gocbcore.RetryStrategy(nil),
            	CircuitBreakerConfig:gocbcore.CircuitBreakerConfig{
            		Enabled:false,
            		VolumeThreshold:0,
            		ErrorThresholdPercentage:0,
            		SleepWindow:0,
            		RollingWindow:0,
            		CompletionCallback:(gocbcore.CircuitBreakerCallback)(nil),
            		CanaryTimeout:0
            	},
            	UseZombieLogger:false,
            	ZombieLoggerInterval:0,
            	ZombieLoggerSampleSize:0
            } 

            This error is new, so I'm reverting the go mod update I made to cbgt to point back to our last clean build:

            http://review.couchbase.org/c/cbgt/+/132672
            http://review.couchbase.org/c/cbft/+/132673

            The regression introduced here causes MB-40505.

            Additional logging from within GOCBCORE:

            2020/07/16 12:07:35 (GOCBCORE) Failed to perform select bucket against server (document not found | {"status_code":1,"bucket":"beer-sample","error_name":"KEY_ENOENT","error_description":"Not Found","opaque":6,"last_dispatched_to":"127.0.0.1:12000","last_dispatched_from":"127.0.0.1:58729","last_connection_id":"919c13c337816aba/96712d3a0387dc1e"})
            2020/07/16 12:07:35 (GOCBCORE) Pipeline Client `127.0.0.1:12000/0xc0001b0230` preparing for new client loop
            2020/07/16 12:07:35 (GOCBCORE) Pipeline Client `127.0.0.1:12000/0xc0001b0230` retrieving new client connection for parent 0xc0001b0190
            2020/07/16 12:07:35 (GOCBCORE) Won't retry request.  OperationID=waituntilready. Reason=CONNECTION_ERROR
            --> WaitUntilReady Callback, err:  document not found | {"status_code":1,"bucket":"beer-sample","error_name":"KEY_ENOENT","error_description":"Not Found","opaque":6,"last_dispatched_to":"127.0.0.1:12000","last_dispatched_from":"127.0.0.1:58729","last_connection_id":"919c13c337816aba/96712d3a0387dc1e"} 
            2020/07/16 12:07:35 (GOCBCORE) Pipeline Client `127.0.0.1:12000/0xc0001b0230` received close request
            2020/07/16 12:07:38 (GOCBCORE) CCCPPOLL: Failed to retrieve CCCP config. ambiguous timeout
            2020/07/16 12:07:38 (GOCBCORE) CCCPPOLL: Failed to retrieve config from any node.


            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-2631 contains cbft commit 643c013 with commit message:
            GOCBC-868: Falling back to older gocbcore v8.0.3

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-2631 contains cbgt commit 05b2181 with commit message:
            GOCBC-868: Revert "Bumping up gocbcore version to >v9.0.3"

            charles.dixon Charles Dixon added a comment -

            Abhinav Dangeti, what is the expected behaviour here? The SDK cannot connect to the bucket, so it reports that the bucket doesn't exist (which is what the server is telling us) via WaitUntilReady.

            abhinav Abhinav Dangeti added a comment -

            In the case I've highlighted above, the bucket was not actually deleted - it was warming up after a server restart.

            If you are suggesting that with the new set of changes we can't really differentiate between a missing bucket and a bucket that isn't ready yet, that could be a big problem for us. Do you have any recommendations on how clients can differentiate between the two scenarios?

            charles.dixon Charles Dixon added a comment -

            Abhinav Dangeti, I think that has always been the case. However, I believe (I'm not certain, because I can't reproduce connecting to a bucket in warmup) that a missing bucket will be an `ErrAuthenticationFailure` and a bucket in warmup an `ErrDocumentNotFound`. I'm not entirely sure on that, though.

            abhinav Abhinav Dangeti added a comment -

            I see. What about WaitUntilReady with the new changes - does it return immediately only in the case of ErrAuthenticationFailure, or for both of the above errors? If ErrDocumentNotFound indicates that the bucket is in warmup, I'd expect WaitUntilReady to block until the bucket becomes ready. Does that make sense?

            charles.dixon Charles Dixon added a comment -

            I see what you mean. Can you clarify which user FTS auths as? That seems to change the error code returned by the server in this scenario.

            As a side note, you can effectively achieve this on your side using a custom retry strategy (see https://github.com/couchbase/gocbcore/blob/master/retry.go and https://github.com/couchbase/gocbcore/blob/master/retry_test.go#L53 for an example), which you could pass to the WaitUntilReady operation. The retry strategy could return an action with a 0 duration (do not retry) for all errors other than ErrDocumentNotFound, which could use a non-zero duration to trigger retrying (so WaitUntilReady wouldn't return an error). Note that by default the SDK applies a global retry strategy of fast fail (no retries in all but a couple of cases), which can be overridden per operation.
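The decision logic of such a per-operation retry strategy could be sketched roughly as follows. The types below are simplified, self-contained stand-ins for gocbcore's RetryStrategy/RetryReason/RetryAction machinery in retry.go, not the library's real signatures: here the strategy keeps retrying while the reason looks like "not ready yet" and fails fast on everything else.

```go
package main

import (
	"fmt"
	"time"
)

// Simplified stand-ins for gocbcore's retry types: a strategy inspects the
// retry reason and returns how long to back off; a zero duration means
// "do not retry", so the pending operation fails immediately.
type RetryReason string

const (
	NotReadyRetryReason        RetryReason = "NOT_READY"
	ConnectionErrorRetryReason RetryReason = "CONNECTION_ERROR"
)

type RetryAction struct{ Duration time.Duration }

type RetryStrategy interface {
	RetryAfter(reason RetryReason, attempts int) RetryAction
}

// warmupRetryStrategy retries while the cluster reports NOT_READY (e.g. a
// bucket still warming up), up to a cap, and fast-fails on any other reason
// so the caller sees the underlying error right away.
type warmupRetryStrategy struct{}

func (warmupRetryStrategy) RetryAfter(reason RetryReason, attempts int) RetryAction {
	if reason == NotReadyRetryReason && attempts < 20 {
		return RetryAction{Duration: 100 * time.Millisecond} // keep waiting
	}
	return RetryAction{Duration: 0} // fast fail
}

func main() {
	var s RetryStrategy = warmupRetryStrategy{}
	fmt.Println(s.RetryAfter(NotReadyRetryReason, 1).Duration > 0)        // true: retry
	fmt.Println(s.RetryAfter(ConnectionErrorRetryReason, 1).Duration > 0) // false: fail fast
}
```

As the follow-up comments note, this approach only works if the retry reason carries enough information to tell the two failure modes apart.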

            abhinav Abhinav Dangeti added a comment -

            FTS auths as "admin". Could you let me know how the behavior differs based on the role?

            Let me give the custom retry strategy a shot - sounds reasonable to me.

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-2677 contains gocbcore commit a192800 with commit message:
            GOCBC-868: Add fast fail waituntilready for non-default http bootstrap

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.6.0-7892 contains gocbcore commit a192800 with commit message:
            GOCBC-868: Add fast fail waituntilready for non-default http bootstrap
            abhinav Abhinav Dangeti added a comment - - edited

            Hey Charles Dixon, I've been testing retry strategies, but with the information at hand I don't think they're going to work for us. Here's why:

            • I cannot achieve fail-right-away with retry strategies, because RetryReason/RetryRequest don't carry enough context for me to identify the error that's causing the retry; the WaitUntilReady callback is only invoked after all the allowed retries have completed. The RetryReason itself just says CONNECTION_ERROR, and that's not enough to differentiate between ErrDocumentNotFound (bucket warming up) and ErrAuthenticationFailure (bucket not found).
            • Alternatively, I placed a loop around my WaitUntilReady(..) call (with the default retry strategy) for the case where the error returned was ErrDocumentNotFound. While this works, I hate the look of it - a loop around a function called WaitUntilReady(..).

            For the first bullet above, do correct me if I'm wrong, and let me know if there's a way I can derive the information I need to make a decision on the retry action. If not, I'll need another way for FTS to achieve this properly.

            charles.dixon Charles Dixon added a comment -

            "For the first bullet above, do correct me if I'm wrong and let me know if there's a way I can derive the information I'll need to make a decision on the retry action."

            No, you're right - there is no way to distinguish between errors within the retry reason. We could treat ErrDocumentNotFound as a NOT_READY reason and have it trigger a retry (so you wouldn't need your retry logic anymore), but we need to understand all of the implications of doing that first, e.g. whether any other bootstrap requests can trigger it, or whether it would impact any other teams who rely on the current behaviour for some reason.

            abhinav Abhinav Dangeti added a comment -

            Ok, thanks for confirming that, Charles. I'll look forward to a change here from you then, once you've figured out the right thing to do.
            abhinav Abhinav Dangeti added a comment - http://review.couchbase.org/c/cbgt/+/134155

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-2818 contains cbgt commit c5969c6 with commit message:
            GOCBC-868: Introducing RetryStrategy for gocbcore.Agents

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-2819 contains gocbcore commit 9cd9897 with commit message:
            GOCBC-868: Add bucket not found retry reason

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.6.1-9153 contains gocbcore commit 9cd9897 with commit message:
            GOCBC-868: Add bucket not found retry reason

            build-team Couchbase Build Team added a comment -

            Build sync_gateway-3.0.0-52 contains gocbcore commit 9cd9897 with commit message:
            GOCBC-868: Add bucket not found retry reason

            build-team Couchbase Build Team added a comment -

            Build sync_gateway-3.0.0-52 contains gocbcore commit a192800 with commit message:
            GOCBC-868: Add fast fail waituntilready for non-default http bootstrap

            build-team Couchbase Build Team added a comment -

            Build sync_gateway-3.0.0-52 contains gocbcore commit 2d1ed35 with commit message:
            GOCBC-868: Expose a way to fast fail WaitUntilReady

            People

              charles.dixon Charles Dixon
              abhinav Abhinav Dangeti
              Votes: 0
              Watchers: 5