In the latest development server versions 0xD is occasionally returned by the server
Description
Activity

Jeffry Morris May 8, 2023 at 11:38 PM
Please let the customer know it's untested by our QE.

Jeffry Morris April 27, 2023 at 6:26 PM
Theory:
I think I see what's going on, and it's somewhat confusing. These are the KVbucketMap servers:
And then there are servers with the Data Service enabled (at least the config advertises them) that are not in this servers list:
It looks like SELECT_BUCKET is being called on these nodes and then GET_CID is called, which generates the EConfigOnly error.
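To illustrate the theory, here is a minimal Python sketch that lists nodes advertising the Data Service but missing from the bucket map's server list. The config shape is an assumption (a `nodesExt` list with `hostname` and `services` entries, and a `vBucketServerMap.serverList` of `host:port` strings), and the function name and hostnames are hypothetical; this is not the SDK's code.

```python
def kv_nodes_outside_server_list(config):
    """Return hostnames that advertise KV but are absent from the server list.

    Assumes every nodesExt entry carries a "hostname" and that serverList
    entries look like "host:port" (no IPv6 handling in this sketch).
    """
    mapped_hosts = {
        entry.rsplit(":", 1)[0]
        for entry in config["vBucketServerMap"]["serverList"]
    }
    orphans = []
    for node in config["nodesExt"]:
        services = node.get("services", {})
        advertises_kv = "kv" in services or "kvSSL" in services
        if advertises_kv and node["hostname"] not in mapped_hosts:
            orphans.append(node["hostname"])
    return orphans


# Hypothetical config: three nodes advertise KV, only two are in the server list.
example_config = {
    "vBucketServerMap": {
        "serverList": ["node1.example.com:11210", "node2.example.com:11210"],
    },
    "nodesExt": [
        {"hostname": "node1.example.com", "services": {"kv": 11210}},
        {"hostname": "node2.example.com", "services": {"kv": 11210}},
        {"hostname": "node3.example.com", "services": {"kv": 11210}},
        {"hostname": "node4.example.com", "services": {"n1ql": 8093}},
    ],
}
orphan_nodes = kv_nodes_outside_server_list(example_config)
```

If the theory holds, operations such as GET_CID should only be dispatched to nodes that are not in the returned list.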

Jeffry Morris April 27, 2023 at 2:26 AM (Edited)
It looks like I am running into this and the commit.
I remember this issue. This should allow SDKs to fetch config from all nodes, but not do other operations. So you should be able to SELECT_BUCKET and GET_CLUSTER_CONFIG, but not issue any other commands, which should be fine.
If you're getting that 0x0D, that means you're sending some other command, which indicates, I think, a bootstrap bug of some sort. There shouldn't be any other commands being sent to that node.
The question I have is: how do you know it's a config-only bucket? From what I have seen, I am getting 0xD on KV nodes. I tried a simple fix of retrying the op until it succeeds or times out, and this works, but it seems somewhat hacky.
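The retry workaround described above could look roughly like the following Python sketch. It is not the SDK's implementation: `send_op` is a hypothetical callable that sends the operation and returns the raw status code, and the timeout and backoff defaults are arbitrary assumptions.

```python
import time

ECONFIG_ONLY = 0x0D  # status code discussed in this ticket


def retry_on_config_only(send_op, timeout_s=2.5, backoff_s=0.01,
                         clock=time.monotonic, sleep=time.sleep):
    """Call send_op until it returns something other than 0x0D or time runs out.

    send_op is a hypothetical zero-argument callable returning a status code.
    Returns (status, attempts); raises TimeoutError when the deadline passes.
    """
    deadline = clock() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        status = send_op()
        if status != ECONFIG_ONLY:
            return status, attempts
        # Give up if the next backoff would take us past the deadline.
        if clock() + backoff_s > deadline:
            raise TimeoutError(f"still receiving 0x0D after {attempts} attempts")
        sleep(backoff_s)


# Simulated server that answers 0x0D twice and then succeeds (0x00).
_responses = iter([0x0D, 0x0D, 0x00])
result = retry_on_config_only(lambda: next(_responses), sleep=lambda _s: None)
```

This hides the symptom rather than the cause: if the op is being sent to a node that should never receive it, retrying on the same node will keep failing until the timeout.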

Jeffry Morris April 27, 2023 at 12:15 AM
I got a little further debugging. The 0xD error happens seemingly randomly and, while it is currently not mapped to a ResponseStatus in the SDK, it does have an ErrorMap entry:
I am not 100% sure what this means, but it seems to indicate that the node is not a KV node; however, the server config verifies that the node is indeed a KV node, and the SDK does check this before sending the operation.
Note that I confirmed that the GET_CID command (to fetch the collection Id) is being done on a KV node after SELECT_BUCKET is called.
I am going to try mapping the 0xD status so that it goes into the retry loop instead of failing fast as it does today. Perhaps it's just a matter of trying another node, but it still seems odd, in my opinion.
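As a rough illustration of that mapping idea, the Python sketch below translates a raw status code into an enum and flags whether it should be retried rather than failed fast. The enum, its member names, the `UNKNOWN` sentinel, and the `classify` helper are all hypothetical and do not reflect the SDK's actual ResponseStatus type.

```python
from enum import IntEnum


class ResponseStatus(IntEnum):
    """Hypothetical subset of status codes; not the SDK's real enum."""
    SUCCESS = 0x00
    KEY_NOT_FOUND = 0x01
    CONFIG_ONLY = 0x0D   # the previously unmapped 0xD status
    UNKNOWN = -1         # sentinel for codes this sketch does not know


# Statuses routed to the retry loop instead of failing fast (assumption).
RETRYABLE = frozenset({ResponseStatus.CONFIG_ONLY})


def classify(raw_status):
    """Map a raw status code to (ResponseStatus, should_retry)."""
    try:
        status = ResponseStatus(raw_status)
    except ValueError:
        status = ResponseStatus.UNKNOWN
    return status, status in RETRYABLE


config_only = classify(0x0D)
success = classify(0x00)
unmapped = classify(0x7777)
```

With a mapping like this in place, an unmapped code still fails fast as `UNKNOWN`, while 0xD becomes a candidate for retrying on another node.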

Jeffry Morris April 26, 2023 at 7:02 PM (Edited)
Using the connection string and credentials, and ignoring RemoteCertificateNameMismatch, the error can be reproduced on the Elixir database instance:
What's different between the server versions? What is 0xD? It doesn't map to any existing response status in the SDK.
A GET on a key intermittently fails on the first call with status 0xD, indicating that SELECT_BUCKET was not called on the server node before the GET_CID operation.