Subdoc failures after restarting CB server

Description

With SDK 3.0.2 RC (http://review.couchbase.org/c/couchbase-net-client/+/130287)

Testing Against: CB Server 6.5.1 - 4 node cluster with services 1x{kv}, 3 x {kv,index,fts,n1ql}

Subdoc ops fail after all (or some) of the CB nodes are stopped and restarted, as seen in the test results below.

http://sdkqe-testresults.couchbase.com.s3.amazonaws.com/SDK-SDK/CB-6.5.1-6299/SvcRestartAll-SUBDOC/06-11-20/043535/4ade3a166e84d5ceed8047c7575dd5c6-SD.html

Logs: 

Also occurs with CB 6.0.4.

Environment

None

Gerrit Reviews

None

Release Notes Description

None

Attachments

10

Activity

Will Broadbelt June 17, 2020 at 10:36 AM

Patchset 3 has fixed it! Running the whole suite now to check for regressions, and I'll resolve the ticket once that is done.

Will Broadbelt June 16, 2020 at 8:25 PM

Will Broadbelt June 16, 2020 at 12:33 PM
Edited

Running with http://review.couchbase.org/c/couchbase-net-client/+/130602/1, I now see a CircuitBreakerException being thrown.

Graph: http://sdkqe-testresults.couchbase.com.s3.amazonaws.com/SDK-SDK/CB-6.0.4-3082/SvcRestartAll-SUBDOC/06-16-20/043638/5ba23646c7bbb59f8908099e49a646d8-SD.html
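For context on the CircuitBreakerException mentioned above, here is a minimal, generic sketch of how a count-based circuit breaker behaves: after enough consecutive failures it rejects calls outright until a cool-down passes. This is not the .NET SDK's implementation; `CircuitBreaker`, `CircuitOpenError`, `failure_threshold` and `reset_after_s` are hypothetical names chosen for illustration only.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    """Simplified count-based breaker (illustrative, not the SDK's logic)."""

    def __init__(self, failure_threshold=3, reset_after_s=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                # Still cooling down: fail fast without touching the server.
                raise CircuitOpenError("circuit open")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Under this model, a node restart that produces a burst of failed ops would open the circuit, and later ops would fail fast until the cool-down expires, even if the node is already back.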

Logs:

Also, here are some of the Sdkd stack traces that are being thrown:

Will Broadbelt June 15, 2020 at 3:36 PM
Edited

I think I have 'fixed' the issue on 6.5 by just increasing the WaitForReady to 20s, though I'm still running tests for it now. UPDATE: Still getting timeouts with 20s.

But for CB <6.5 I can't use this. Looking at the logs, I think the issue is that GetClusterConfig occasionally returns BucketNotConnected, so connecting to the cluster fails. Filed - .
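To illustrate the behaviour discussed in this comment, here is a rough sketch of a wait-until-ready loop that retries a connection probe on a transient "not connected" error until a deadline. It is a generic illustration, not the SDK's code: `wait_until_ready`, `NotConnectedError`, `probe` and `interval_s` are hypothetical names, and the 20s value simply mirrors the timeout tried above.

```python
import time


class NotConnectedError(Exception):
    """Hypothetical stand-in for a transient 'bucket not connected' result."""


def wait_until_ready(probe, timeout_s=20.0, interval_s=0.5,
                     clock=time.monotonic, sleep=time.sleep):
    """Call probe() until it succeeds or timeout_s elapses.

    Returns the number of attempts made; raises TimeoutError if another
    retry would overrun the deadline.
    """
    deadline = clock() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        try:
            probe()
            return attempts
        except NotConnectedError:
            # Transient failure: retry only if the next attempt fits the deadline.
            if clock() + interval_s > deadline:
                raise TimeoutError(f"not ready after {attempts} attempts")
            sleep(interval_s)
```

If the probe keeps failing for longer than the timeout, as when nodes take longer than 20s to come back, a loop like this still ends in a timeout, which would be consistent with the UPDATE above.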

Fixed

Details

Created June 11, 2020 at 4:23 PM
Updated June 18, 2020 at 11:02 AM
Resolved June 17, 2020 at 10:07 PM