In the latest development server versions, 0xD is occasionally returned by the server

Description

A GET on a key inconsistently fails on the first call with status 0xD, which suggests that SELECT_BUCKET was not called on the server node before the GET_CID operation was sent.

Environment

None

Gerrit Reviews

None

Release Notes Description

None

Attachments

4

Activity


Jeffry Morris May 8, 2023 at 11:38 PM

Please let the customer know it's untested by our QE.

Jeffry Morris April 27, 2023 at 6:26 PM

Theory:

I think I see what's going on, and it's somewhat confusing. These are the servers in the KVbucketMap:

And then there are servers that have the Data Service enabled (at least, the config advertises them) but are not in this server list:

It looks like SELECT_BUCKET is being called on these nodes, and then GET_CID is called, which generates the EConfigOnly error.
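A minimal Python sketch of the check this theory implies: any node that advertises the KV service but is missing from the vBucket server list would be one of these nodes. It assumes the cluster config uses the usual `nodesExt[].services.kv` and `vBucketServerMap.serverList` fields. The helper name `config_only_kv_nodes` and the sample hosts are hypothetical and are not taken from the Elixir config. The sketch also assumes every `nodesExt` entry carries a `hostname`, which may not hold for the node that served the config.

```python
from typing import Dict, Set


def config_only_kv_nodes(config: Dict) -> Set[str]:
    """Hypothetical helper: KV endpoints advertised in nodesExt that are
    missing from vBucketServerMap.serverList (i.e. likely config-only)."""
    server_list = set(config.get("vBucketServerMap", {}).get("serverList", []))
    advertised = set()
    for node in config.get("nodesExt", []):
        kv_port = node.get("services", {}).get("kv")
        if kv_port is None:
            continue  # node does not run the Data Service
        # Assumption: every entry has a hostname (not always true in real configs).
        advertised.add(f"{node['hostname']}:{kv_port}")
    return advertised - server_list


# Made-up sample config: three nodes advertise KV, but only two are in the map.
SAMPLE_CONFIG = {
    "nodesExt": [
        {"hostname": "10.0.0.1", "services": {"kv": 11210, "mgmt": 8091}},
        {"hostname": "10.0.0.2", "services": {"kv": 11210, "mgmt": 8091}},
        {"hostname": "10.0.0.3", "services": {"kv": 11210, "mgmt": 8091}},
        {"hostname": "10.0.0.4", "services": {"n1ql": 8093, "mgmt": 8091}},
    ],
    "vBucketServerMap": {"serverList": ["10.0.0.1:11210", "10.0.0.2:11210"]},
}

print(sorted(config_only_kv_nodes(SAMPLE_CONFIG)))  # ['10.0.0.3:11210']
```

Under this theory, an SDK that routes data operations such as GET_CID only to endpoints in `serverList` would never send them to the extra node.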

Jeffry Morris April 27, 2023 at 2:26 AM

It looks like I am running into this and the commit.

I remember this issue: this should allow SDKs to fetch config from all nodes, but not to do other operations. So you should be able to SELECT_BUCKET and GET_CLUSTER_CONFIG, but not issue any other commands, which should be fine.

If you're getting that 0x0D, that means you're sending some other command, which indicates, I think, a bootstrap bug of some sort. There shouldn't be any other commands being sent to that node.
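To make that rule concrete, here is a toy Python model of the gating. It is an illustration only, not the server's implementation. The command names mirror the ones discussed in this ticket. `ToyKvNode` and the `Opcode` enum are hypothetical, and only the 0x0D status comes from this ticket. The model shows why a successful SELECT_BUCKET does not rule out a config-only node: the failure only surfaces on the next command, such as GET_CID. It does not model the separate "no bucket selected" error.

```python
from enum import Enum, auto
from typing import Optional

STATUS_SUCCESS = 0x00
STATUS_ECONFIG_ONLY = 0x0D  # the status seen in this ticket


class Opcode(Enum):
    SELECT_BUCKET = auto()
    GET_CLUSTER_CONFIG = auto()
    GET_CID = auto()
    GET = auto()


# Per the explanation above, a config-only node accepts only these commands.
CONFIG_ONLY_ALLOWED = {Opcode.SELECT_BUCKET, Opcode.GET_CLUSTER_CONFIG}


class ToyKvNode:
    """Toy model of a node's command gating (hypothetical, not server code)."""

    def __init__(self, config_only: bool) -> None:
        self.config_only = config_only
        self.selected_bucket: Optional[str] = None

    def handle(self, opcode: Opcode, bucket: Optional[str] = None) -> int:
        if opcode is Opcode.SELECT_BUCKET:
            self.selected_bucket = bucket  # succeeds even on a config-only node
            return STATUS_SUCCESS
        if self.config_only and opcode not in CONFIG_ONLY_ALLOWED:
            return STATUS_ECONFIG_ONLY
        return STATUS_SUCCESS


node = ToyKvNode(config_only=True)
print(hex(node.handle(Opcode.SELECT_BUCKET, "default")))  # 0x0
print(hex(node.handle(Opcode.GET_CID)))  # 0xd
```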

 

The question I have is: how do you know it's a config-only bucket? From what I have seen, I am getting 0xD on KV nodes. I tried a simple fix of retrying the op until it succeeds or times out, and this works, but it seems somewhat hacky.
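The retry workaround looks roughly like the following Python sketch. The real SDK is .NET, and the names `retry_until_deadline` and `OperationTimeout`, as well as the backoff values, are hypothetical. The sketch treats 0x0D as retryable, retries with capped exponential backoff, and gives up at a deadline. Any other status, success or not, is returned immediately. As the explanation above suggests, this can mask a routing bug: a retry only helps if it lands on a different node or follows a config refresh.

```python
import time
from typing import Callable

ECONFIG_ONLY = 0x0D
RETRYABLE = {ECONFIG_ONLY}  # hypothetical: the workaround treats 0x0D as retryable


class OperationTimeout(Exception):
    """Raised when a retryable status persists past the deadline."""


def retry_until_deadline(
    op: Callable[[], int],
    timeout_s: float,
    backoff_s: float = 0.01,
    max_backoff_s: float = 0.5,
) -> int:
    deadline = time.monotonic() + timeout_s
    delay = backoff_s
    while True:
        status = op()
        if status not in RETRYABLE:
            return status  # success or a non-retryable error: surface it as-is
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise OperationTimeout(f"status 0x{status:02X} persisted for {timeout_s}s")
        time.sleep(min(delay, remaining))  # never sleep past the deadline
        delay = min(delay * 2, max_backoff_s)  # capped exponential backoff


# Simulated op: fails twice with 0x0D, then succeeds.
responses = iter([0x0D, 0x0D, 0x00])
print(hex(retry_until_deadline(lambda: next(responses), timeout_s=1.0)))  # 0x0
```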

Jeffry Morris April 27, 2023 at 12:15 AM


I got a little further debugging. The 0xD error happens seemingly at random, and while the SDK does not currently map it to a ResponseStatus, it does have an ErrorMap entry:

I am not 100% sure what this means, but it seems to indicate that the node is not a KV node. However, the server config verifies that the node is indeed a KV node, and the SDK does check this before sending the operation.

Note that I confirmed that the GET_CID command (to fetch the collection ID) is being sent to a KV node after SELECT_BUCKET is called.

I am going to try mapping the 0xD status so that it goes into the retry loop instead of failing fast, as it does today. Perhaps it's just a matter of trying another node, but it still seems odd, IMO.
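One way to express that mapping, sketched in Python: check an explicit SDK-side override table first (the proposed 0x0D → retry mapping), then fall back to the attributes of the status's ErrorMap entry. The sketch assumes the KV error map stores entries under `errors`, keyed by lowercase hex without a prefix, and uses attribute names such as `retry-now` and `retry-later`. The `classify` helper, the `Action` enum, and the sample `d` entry are all hypothetical. The sample is a placeholder, not the server's actual entry for 0xD.

```python
from enum import Enum, auto
from typing import Dict, Optional


class Action(Enum):
    SUCCEED = auto()
    RETRY = auto()
    FAIL_FAST = auto()


RETRY_ATTRS = {"retry-now", "retry-later"}  # assumed error-map attribute names


def classify(status: int, error_map: Dict,
             overrides: Optional[Dict[int, Action]] = None) -> Action:
    if status == 0x00:
        return Action.SUCCEED
    if overrides and status in overrides:
        return overrides[status]  # an explicit SDK-side mapping wins
    # Assumed key format: lowercase hex without a prefix, e.g. 0x0D -> "d".
    entry = error_map.get("errors", {}).get(format(status, "x"))
    if entry is not None and RETRY_ATTRS & set(entry.get("attrs", [])):
        return Action.RETRY
    return Action.FAIL_FAST  # unknown or non-retryable: today's fail-fast path


EMPTY_MAP = {"errors": {}}
# Hypothetical placeholder entry; the real server entry for 0xd may differ.
SAMPLE_MAP = {"errors": {"d": {"name": "EXAMPLE", "desc": "placeholder", "attrs": ["retry-now"]}}}

print(classify(0x0D, EMPTY_MAP))  # Action.FAIL_FAST
print(classify(0x0D, EMPTY_MAP, {0x0D: Action.RETRY}))  # Action.RETRY
print(classify(0x0D, SAMPLE_MAP))  # Action.RETRY
```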

Jeffry Morris April 26, 2023 at 7:02 PM

Using the connection string and credentials, and ignoring RemoteCertificateNameMismatch, the error can be reproduced on the Elixir database instance:

What's different between the server versions? What is 0xD? It doesn't map to any existing response status in the SDK.

Fixed

Details

Created February 15, 2023 at 7:38 PM
Updated May 9, 2023 at 6:59 PM
Resolved May 9, 2023 at 6:59 PM