Rework WaitUntilReady

Description

Suggested release note:

The waitUntilReady method is now more aggressive about retrying failed pings. Also, waiting for a desired state of DEGRADED no longer fails when the client is fully connected to the cluster.

Investigate how WaitUntilReady could be improved.

Specifically:

Rework the pacemaker. The current "wait some more" logic is odd and results in redundant node health checks. Instead of driving it with a Reactor Flux.interval(), perhaps we could use a retry operator.
Investigate whether pings are currently being retried, and look into why we're consulting the diagnostic results – should ping alone be sufficient?
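To make the retry idea concrete, here is a minimal sketch in plain Java, not the SDK's implementation. It stands in for a reactive retry operator with a simple blocking loop: the ping is retried with a doubling delay until it succeeds or the overall timeout elapses. The class name, the method name, the `BooleanSupplier` ping, and the 500 ms delay cap are all assumptions made for illustration.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

final class PingRetrySketch {

    // Hypothetical upper bound on the delay between attempts.
    private static final long MAX_DELAY_MILLIS = 500;

    /**
     * Retries the ping until it succeeds or the timeout elapses.
     * Returns true if a ping succeeded before the deadline, false otherwise
     * (including when the calling thread is interrupted while waiting).
     */
    static boolean waitUntilPingSucceeds(BooleanSupplier ping, Duration timeout, Duration initialDelay) {
        long deadline = System.nanoTime() + timeout.toNanos();
        long delayMillis = Math.max(1, initialDelay.toMillis());

        while (true) {
            // A failed ping is treated as "not ready yet", so it triggers a retry.
            if (ping.getAsBoolean()) {
                return true;
            }
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000;
            if (remainingMillis <= 0) {
                return false;
            }
            try {
                // Never sleep past the deadline.
                Thread.sleep(Math.min(delayMillis, remainingMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            // Double the delay for the next attempt, up to the cap.
            delayMillis = Math.min(delayMillis * 2, MAX_DELAY_MILLIS);
        }
    }
}
```

Each attempt here is a single ping, so there is no separate timer that could trigger a health check while an earlier one is still pending. Whether ping alone is a sufficient readiness signal is the open question above; the sketch only assumes it is.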

Environment

None

Gerrit Reviews

None

Release Notes Description

None

Linked issues

relates to JVMCBC-1169: waitUntilReady intermittently times out

Activity

David Nault June 15, 2023 at 10:11 PM
Edited

Currently, specifying a desired state of "DEGRADED" causes a timeout if the cluster state is actually fully "ONLINE". I suppose this is useful if you're actually waiting for the cluster to be degraded.
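The difference between the two readings of "desired state" can be sketched as follows. This is an illustration only: the `State` enum and both method names are hypothetical stand-ins, not the SDK's types. The strict check reproduces the timeout described above, while the lenient check treats the desired state as a minimum, which is what the suggested release note describes.

```java
final class DesiredStateSketch {

    /** Hypothetical stand-in for the cluster state, declared from least to most healthy. */
    enum State { OFFLINE, DEGRADED, ONLINE }

    /** Strict reading: the actual state must equal the desired state. */
    static boolean matchesExactly(State actual, State desired) {
        return actual == desired;
    }

    /** Lenient reading: the actual state must be at least as healthy as the desired state. */
    static boolean satisfies(State actual, State desired) {
        // Enum.compareTo follows declaration order, so ONLINE > DEGRADED > OFFLINE.
        return actual.compareTo(desired) >= 0;
    }
}
```

With the strict check, a fully ONLINE cluster never matches a desired state of DEGRADED, so the wait can only end in a timeout. With the lenient check, ONLINE satisfies DEGRADED immediately.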

Also, perhaps we should revisit what it means to be degraded. Currently a cluster qualifies as degraded if there is more than 1 endpoint, and at least one endpoint is ONLINE. The ONLINE endpoint could be for any service.

According to the RFC, "Degraded" means "at least one socket per service is reachable". https://github.com/couchbaselabs/sdk-rfcs/blob/master/rfc/0061-sdk3-diagnostics.md#summary
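A rough sketch of how the two definitions diverge, with the caveat that the types are invented for illustration and "reachable" is assumed to mean the endpoint is ONLINE. The first method paraphrases the current rule as described in this comment, not the actual SDK code. The second follows the RFC wording quoted above. Neither tries to distinguish DEGRADED from fully ONLINE.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

final class DegradedDefinitionSketch {

    /** Hypothetical subset of service types. */
    enum Service { KV, QUERY }

    /** Hypothetical endpoint: the service it belongs to and whether it is ONLINE. */
    static final class Endpoint {
        final Service service;
        final boolean online;

        Endpoint(Service service, boolean online) {
            this.service = service;
            this.online = online;
        }
    }

    /** Current rule as described: more than one endpoint, and at least one is ONLINE. */
    static boolean degradedPerCurrentRule(List<Endpoint> endpoints) {
        return endpoints.size() > 1 && endpoints.stream().anyMatch(e -> e.online);
    }

    /** RFC wording: every service that has endpoints has at least one ONLINE endpoint. */
    static boolean degradedPerRfc(List<Endpoint> endpoints) {
        Set<Service> seen = EnumSet.noneOf(Service.class);
        Set<Service> reachable = EnumSet.noneOf(Service.class);
        for (Endpoint e : endpoints) {
            seen.add(e.service);
            if (e.online) {
                reachable.add(e.service);
            }
        }
        return !seen.isEmpty() && reachable.containsAll(seen);
    }
}
```

For example, with one KV endpoint ONLINE and the only QUERY endpoint down, the current rule reports degraded even though no query socket is reachable, while the RFC rule does not.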

David Nault June 15, 2023 at 9:43 PM

Currently, a failed ping does not cause WaitUntilReady to fail, as long as the endpoint connection was established. Hmmm...

Fixed

Details

Assignee

David Nault

Reporter

David Nault

Story Points

Sprint

None

Fix versions

2.4.9

Priority

Major

Created June 14, 2023 at 5:43 PM

Updated July 27, 2023 at 3:49 PM

Resolved July 27, 2023 at 3:49 PM
