Description
Motivation
Once a connection has been established and it goes down, the code
has always retried properly until the node came back online or was
removed from the cluster configuration.
During bootstrap, however, if the socket could not be opened in the
first place, the node was deemed down and stayed that way. This meant
that when one node in the cluster was down and the client could
bootstrap from another, the down node was not picked up when it came
back online.
Modifications
This changeset makes the bootstrap process more resilient to this
kind of failure. The boot observable is still notified of the failed
attempt, but a reconnect is issued in the background so that the
node can be picked up as soon as it comes online.
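The behaviour described above can be sketched roughly as follows. This is only an illustration using plain JDK classes, not the SDK's actual implementation: the class `ResilientEndpoint`, its methods, the `socketOpener` callback and the fixed retry delay are all hypothetical names and assumptions, and a `CompletableFuture` stands in for the boot observable.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Hypothetical endpoint used only for illustration; not an SDK class. */
final class ResilientEndpoint {
    private final BooleanSupplier socketOpener;      // assumed: returns true if the socket opened
    private final ScheduledExecutorService scheduler;
    private final long retryDelayMillis;             // assumed: fixed delay between attempts
    private final AtomicBoolean connected = new AtomicBoolean(false);
    private final AtomicBoolean removed = new AtomicBoolean(false);

    ResilientEndpoint(BooleanSupplier socketOpener,
                      ScheduledExecutorService scheduler,
                      long retryDelayMillis) {
        this.socketOpener = socketOpener;
        this.scheduler = scheduler;
        this.retryDelayMillis = retryDelayMillis;
    }

    /**
     * First connect attempt during bootstrap. The returned future reports
     * the outcome of this attempt only; if it failed, a reconnect keeps
     * running in the background.
     */
    CompletableFuture<Boolean> connect() {
        boolean opened = tryOpen();
        if (!opened) {
            scheduleReconnect();
        }
        return CompletableFuture.completedFuture(opened);
    }

    /** Called when the node is removed from the configuration; stops retries. */
    void disconnect() {
        removed.set(true);
        connected.set(false);
    }

    boolean isConnected() {
        return connected.get();
    }

    private boolean tryOpen() {
        boolean opened = socketOpener.getAsBoolean();
        connected.set(opened);
        return opened;
    }

    private void scheduleReconnect() {
        scheduler.schedule(() -> {
            // Retry until the socket opens or the node has been removed.
            if (!removed.get() && !tryOpen()) {
                scheduleReconnect();
            }
        }, retryDelayMillis, TimeUnit.MILLISECONDS);
    }
}
```

A caller would see `connect()` complete with `false` for a node that is down at bootstrap, while `isConnected()` flips to `true` later without any further action once the socket can be opened.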
Result
The code is now more resilient to partial node failures during
client bootstrap, and the behaviour aligns more closely with what
one would expect from the system.
Activity
Field | Original Value | New Value |
---|---|---|
Resolution | | Fixed [ 1 ] |
Status | Open [ 1 ] | Resolved [ 5 ] |
Workflow | classic default workflow [ 62067 ] | Couchbase SDK Workflow [ 67680 ] |
Description | Same text as the description above, with dash-underlined section headings | Same text, with the headings marked up as +Motivation+, +Modifications+ and +Result+ |
Link | This issue is triggering CBSE-3297 [ CBSE-3297 ] |