Good to hear. I'll look into the TaskCanceledException; yes, they should be XxxTimeoutExceptions.
Jeff
Eugene Shcherbo June 2, 2023 at 4:04 PM
Hi @Jeffry Morris
I tested it, and it looks like the rebalance issue is fixed in the package. Thank you.
Just FYI: before the cluster map was updated, I still saw TaskCanceledException instead of timeouts. This is not an issue for me since I know that it usually means a timeout, but I wanted to let you know.
Jeffry Morris June 1, 2023 at 11:39 PM
@Eugene Shcherbo -
VF:
Jeffry Morris June 1, 2023 at 5:39 PM
Hello @Eugene Shcherbo -
Indeed, in this specific case a config can appear to have been processed when it actually failed to process correctly, leaving the SDK in a bad state until it can successfully process a newer config revision. It's definitely a client bug, and a patch is in the works for 3.4.7, which is planned for release on 6/6/2023. Triggering the bug is somewhat of an edge case; we would expect the ports/hosts to be discoverable by the SDK while the rebalancing is occurring.
We will post a package for testing sometime today as a VF. It hasn't been through QE, so I wouldn't use it in production until the official v3.4.7 release is on NuGet.
Initial discussion on the issue: https://forums.couchbase.com/t/couchbase-v3-sdk-kvnotmyvbucket-errors-after-add-node-rebalance/35438
Initial ticket for the issue: https://couchbasecloud.atlassian.net/browse/NCBC-3350
Preconditions:
Have a cluster of 2 nodes where one gets traffic (call it #1) and the other is waiting to be added to the cluster via rebalancing (call it #2)
Have a .NET client sending requests to the cluster
Steps
Block connections from the .NET client to node #2
Start the rebalance operation
In the middle of the rebalance, unblock connections to node #2
Wait for the rebalance to finish
Expected Result
The app recovers after the rebalance finishes and continues working properly
Actual Result
The app does not recover and constantly throws exceptions:
Timeout exceptions before v3.4.5
TaskCanceledExceptions after v3.4.5 (this is not fixed in v3.4.6, which can be verified with this test)
Dev Notes
Lots of investigation details can be found on the forum topic and in the previous ticket (the links are above).
The latest details:
There is a difference between this patch set (which solves the issue) https://review.couchbase.org/c/couchbase-net-client/+/186991 and the released v3.4.5 (and v3.4.6) versions.
It is in the ClusterContext.ProcessClusterMap method. In the patch, and I believe in all previous versions, it propagated all exceptions that might occur during connecting to the bucket class, but in v3.4.5 there is a try/catch block that handles all exceptions and just logs them. This makes the CouchbaseBucket.ConfigUpdatedAsync method think that the connection was established and update its CurrentConfig property (which prevents it from being updated in the future), but the Nodes collection still does not contain the new node.
Current code that hides exceptions: https://github.com/couchbase/couchbase-net-client/blob/master/src/Couchbase/Core/ClusterContext.cs#L813-L816
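To make the failure mode described above easier to follow, here is a minimal, language-neutral model in Python. It is not the SDK's code: the class and function names (Bucket, ClusterMap, config_updated, process_cluster_map, simulate) and the revision check are hypothetical, chosen only to mirror the behaviour described in this ticket. It assumes that a config with an already-recorded revision is skipped, and that node #2 is unreachable on the first delivery and reachable on the second.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterMap:
    """Hypothetical stand-in for a cluster config revision."""
    rev: int
    hosts: tuple


class Bucket:
    """Hypothetical model of a bucket that tracks its config and nodes."""

    def __init__(self, connect, swallow_errors):
        self.connect = connect                # callable(host); may raise ConnectionError
        self.swallow_errors = swallow_errors  # True models the log-and-continue try/catch
        self.current_rev = 0
        self.nodes = set()

    def process_cluster_map(self, cfg):
        for host in cfg.hosts:
            if host in self.nodes:
                continue
            try:
                self.connect(host)
                self.nodes.add(host)
            except ConnectionError:
                if not self.swallow_errors:
                    raise  # propagating variant: the caller sees the failure
                # swallowing variant: the error would only be logged here

    def config_updated(self, cfg):
        if cfg.rev <= self.current_rev:
            return False  # this revision is considered already processed
        self.process_cluster_map(cfg)
        self.current_rev = cfg.rev  # reached only if nothing propagated
        return True


def simulate(swallow_errors):
    blocked = {"node2"}

    def connect(host):
        if host in blocked:
            raise ConnectionError(f"cannot reach {host}")

    bucket = Bucket(connect, swallow_errors)
    cfg = ClusterMap(rev=2, hosts=("node1", "node2"))
    try:
        bucket.config_updated(cfg)  # node2 is blocked at this point
    except ConnectionError:
        pass
    blocked.clear()                 # connections to node2 are unblocked
    bucket.config_updated(cfg)      # the same revision is delivered again
    return bucket.current_rev, sorted(bucket.nodes)


if __name__ == "__main__":
    print("swallowing:", simulate(True))
    print("propagating:", simulate(False))
```

In this model the swallowing variant records revision 2 while only node1 is in the node set, so the second delivery of the same revision is skipped and node2 is never added. The propagating variant leaves the recorded revision unchanged after the failure, so the second delivery is processed and node2 is added.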