Description
When bulk operations fail (either due to timeout or queue overflow), all subsequent operations using that client also fail.
A Go test that reproduces the issue is available here; it can be run from the gocb root folder:
https://gist.github.com/adamcfraser/b174f9f543d2ca541dff
To run the test:
- copy bucket_test.go into couchbase/gocb
- edit server and bucketName to valid values (I was testing against a local CBS running on my MacBook)
- go test -run=TestTimeoutHandling
The test:
1. Writes 1M docs to the bucket
2. Starts a single goroutine that loops, doing a simple get operation
3. Starts multiple goroutines (maxGoroutines) to execute bulk get calls (each call gets bulkGetSize docs). If one of these goroutines gets an error in response to a bulk get call, that goroutine terminates.
4. An additional goroutine dumps stats on active goroutines, simple get success/fail, etc.
Observed results:
i. If maxGoroutines is low (<25), the test runs without error at bulkGetSize=150 and sustains about 70K read ops/second against a local Couchbase server
ii. If maxGoroutines is a bit higher (50), this fails in the following way:
- some of the bulk get goroutines get a timeout error, and terminate (this part is expected behaviour - would be the trigger for the client to reduce load)
- the remaining bulk get ops hang, and never return
- ops on the couchbase bucket drop to zero
- the single-get goroutine only returns timeout errors