Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: 1.0.1
Affects Version/s: None
Component/s: library
Labels: None

Description

When bulk operations fail (due to either a timeout or a queue overflow), all subsequent operations using that client also fail.

A Go test that reproduces the issue is available here, and can be run from the gocb root folder:
https://gist.github.com/adamcfraser/b174f9f543d2ca541dff

To run the test:

1. Copy bucket_test.go into couchbase/gocb
2. Edit server and bucketName to valid values (I was just testing against a local CBS running on my MacBook)
3. Run go test -run=TestTimeoutHandling

The test:
1. Writes 1M docs to the bucket
2. Starts a single goroutine that loops, doing a simple get operation
3. Starts multiple goroutines (maxGoroutines) that execute bulk get calls (each call gets bulkGetSize docs). If one of these goroutines gets an error in response to a bulk get call, that goroutine terminates.
4. Starts an additional goroutine that dumps stats on active goroutines, simple get success/fail counts, etc.

Observed results:
i. If maxGoroutines is low (<25), the test runs without error at bulkGetSize=150 and sustains about 70K read ops/second against a local Couchbase Server.
ii. If maxGoroutines is somewhat higher (50), the test fails in the following way:

- Some of the bulk get goroutines get a timeout error and terminate (this part is expected behaviour, and would be the trigger for the client to reduce load)
- The remaining bulk get ops hang and never return
- Ops on the Couchbase bucket drop to zero
- The single-get goroutine only returns timeout errors


People

Assignee: Brett Lawson

Reporter: Adam Fraser

Votes: 0

Watchers: 3

Dates

Created: 12/Nov/15 1:12 AM

Updated: 17/Nov/15 4:07 PM

Resolved: 17/Nov/15 7:27 AM

Gerrit Reviews

There are no open Gerrit changes. There are 2 closed Gerrit changes:

- GOCBC-72: Use goroutine so execute does not block signal handling.
- GOCBC-72: Use a larger channel for bulk op signalling.

Bulk op failures cause all subsequent client ops to fail/timeout
