Subdoc failures after restarting CB server

Description

With SDK 3.0.2 RC: (http://review.couchbase.org/c/couchbase-net-client/+/130287)

Testing against: CB Server 6.5.1, 4-node cluster with services 1 x {kv}, 3 x {kv,index,fts,n1ql}

Subdoc ops fail after some or all of the CB nodes are stopped and restarted, as seen in the test results below.

http://sdkqe-testresults.couchbase.com.s3.amazonaws.com/SDK-SDK/CB-6.5.1-6299/SvcRestartAll-SUBDOC/06-11-20/043535/4ade3a166e84d5ceed8047c7575dd5c6-SD.html

Logs:

Also occurs with CB 6.0.4.

Environment

None

Gerrit Reviews

None

Release Notes Description

None

Attachments

Activity

Will Broadbelt June 17, 2020 at 10:36 AM

Patchset 3 has fixed it! Running the whole suite now to check for regressions, and I'll resolve the ticket once that is done.

Will Broadbelt June 16, 2020 at 8:25 PM

With one thread: http://sdkqe-testresults.couchbase.com.s3.amazonaws.com/SDK-SDK/CB-6.5.1-6299/SvcRestartAll-SUBDOC/06-16-20/072953/fd9e0950709e31920a254a6b1fd340f0-SD.html
Logs:

Also included the Jenkins console log as it has stack traces:

Will Broadbelt June 16, 2020 at 7:52 PM

Running with amended patch:

http://sdkqe-testresults.couchbase.com.s3.amazonaws.com/SDK-SDK/CB-6.5.1-6299/SvcRestartAll-SUBDOC/06-16-20/071012/f601b9632c46ef2de95dbc310077152d-SD.html

Will Broadbelt June 16, 2020 at 12:33 PM
Edited

Running with http://review.couchbase.org/c/couchbase-net-client/+/130602/1 , I now see a CircuitBreakerException being thrown.

Graph: http://sdkqe-testresults.couchbase.com.s3.amazonaws.com/SDK-SDK/CB-6.0.4-3082/SvcRestartAll-SUBDOC/06-16-20/043638/5ba23646c7bbb59f8908099e49a646d8-SD.html

Logs:

Also, here are some of the Sdkd stack traces that are being thrown:

Will Broadbelt June 15, 2020 at 3:36 PM
Edited

I think I have 'fixed' the issue on 6.5 just by increasing the WaitForReady timeout to 20s, though I'm still running tests for it now. UPDATE: Still getting timeouts with 20s.

But I can't use this for CB <6.5. Looking at the logs, I think the issue is that GetClusterConfig occasionally returns BucketNotConnected, so connecting to the cluster fails. Filed - .

Fixed
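
For illustration only: the transient failure described in this comment (a config fetch that occasionally reports BucketNotConnected while nodes restart) could in principle be absorbed by retrying until a deadline. Below is a minimal Python sketch of that idea. It is not the SDK's code or the fix that went into the patch; `wait_until_ready`, `fetch_config`, and `BucketNotConnectedError` are hypothetical names, and the 20s default only mirrors the timeout mentioned above.

```python
import time


class BucketNotConnectedError(Exception):
    """Hypothetical stand-in for a transient 'bucket not connected' failure."""


def wait_until_ready(fetch_config, timeout_s=20.0, initial_delay_s=0.05,
                     max_delay_s=1.0, clock=time.monotonic, sleep=time.sleep):
    """Call fetch_config until it succeeds or timeout_s elapses.

    Only BucketNotConnectedError is retried; any other exception propagates.
    Returns a (config, attempts) tuple on success.
    """
    deadline = clock() + timeout_s
    delay = initial_delay_s
    attempts = 0
    while True:
        attempts += 1
        try:
            return fetch_config(), attempts
        except BucketNotConnectedError:
            remaining = deadline - clock()
            if remaining <= 0:
                raise TimeoutError(
                    f"not ready after {attempts} attempts in {timeout_s}s")
            # Exponential backoff, capped, and never sleeping past the deadline.
            sleep(min(delay, remaining))
            delay = min(delay * 2, max_delay_s)
```

Note that a retry loop like this only hides the symptom for as long as the timeout allows, which matches the observation above that timeouts still occurred at 20s.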

Details

Assignee

Jeffry Morris

Reporter

Will Broadbelt

Fix versions

3.0.2

Priority

Major

Created June 11, 2020 at 4:23 PM

Updated June 18, 2020 at 11:02 AM

Resolved June 17, 2020 at 10:07 PM
