Details
Bug
Resolution: Not a Bug
Major
None
7.6.0, 7.2.4
None
Amazon Linux 2
Untriaged
Linux x86_64
0
Unknown
Description
Background:
- We are testing how a Couchbase upgrade on a live cluster would affect the live traffic.
- We began with all nodes in the cluster running v7.2.4 and started query, data-read, and data-write load test scripts. While the load was running, we planned to upgrade to v7.6.0 one node at a time.
- We are following this documentation (Using the spare node technique): https://docs.couchbase.com/server/current/install/upgrade-cluster-online-full-capacity.html
Issue:
- We chose to upgrade one data node and one index node, so we introduced two spare nodes (both on v7.6.0) to replace these existing nodes.
- After the rebalance, which took approximately 24 hours, everything appeared fine. However, queries executed via the Couchbase GUI fail when we connect to the GUI using a data node's hostname. We have 8 data nodes, and the query fails on all of them. The query HTTP request receives a 503 response (see the tcpdump request/response below).
-
POST /_p/query/query/service HTTP/1.1
Host: theta-cba-data-b.sightplan-ops.net:8091
Connection: keep-alive
Content-Length: 536
Pragma: no-cache
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36
CB-User-Agent: Couchbase Query Workbench
Content-Type: application/json
ignore-401: true
Accept: application/json, text/plain, */*
Cache-Control: no-cache
ns-server-ui: yes
invalid-auth-response: on
ns-server-proxy-timeout: 601000
Origin: http://theta-cba-data-b.sightplan-ops.net:8091
Referer: http://theta-cba-data-b.sightplan-ops.net:8091/ui/index.html
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Cookie: ui-auth-theta-cba-data-b.sightplan-ops.net%3A8091=<REDACT>
{"statement":"SELECT RAW META(sg).id\nFROM `sync_gateway_sw1` AS sg\nWHERE (NOT IFMISSINGORNULL(sg.`softDelete`, FALSE))\n ....... <REDACTED THE QUERY>)\nLIMIT 10","pretty":true,"timeout":"600s","client_context_id":"dcb046bb-d22e-44d0-b1bb-c5f289b929e7","profile":"timings","scan_consistency":"not_bounded","use_cbo":true,"txtimeout":"120s","controls":true,"tximplicit":false}
HTTP/1.1 503 Service Unavailable
Cache-Control: no-cache,no-store,must-revalidate
Connection: close
Content-Length: 75
Content-Type: text/plain
Date: Tue, 26 Mar 2024 19:10:23 GMT
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Pragma: no-cache
Server: Couchbase Server
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-Permitted-Cross-Domain-Policies: none
X-XSS-Protection: 1; mode=block
Service n1ql not running on this node, and compatible service is not found.
- I assume the flow when executing a query from the GUI is: GUI -(8091)-> DataNode (beam.smp) -(8093)-> QueryService -(9101)-> IndexService.
- We did not see the data node proxying the request to a query node, so we believe the data node itself is returning the 503 for some reason.
- NOTE: If we access the Couchbase GUI using a query node's hostname (e.g. http://theta-cba-query-b.sightplan-ops.net:8091/ui/index.html), the same query works. See the screenshots.
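To separate the proxy hop from the query service itself, the same statement could be sent two ways: through the cluster manager proxy on port 8091 (the /_p/query/query/service path seen in the tcpdump) and directly to the query service on port 8093 (/query/service). The following is a minimal sketch, not the GUI's actual code path. The function names, the host argument, and the credentials are placeholders, and it assumes HTTP basic auth is accepted on both ports (the GUI itself uses a cookie).

```python
import base64
import json
import urllib.error
import urllib.request


def build_query_request(host, statement, user, password, via_proxy):
    """Build (but do not send) a POST request carrying a query statement.

    via_proxy=True  -> cluster manager on 8091, path /_p/query/query/service
    via_proxy=False -> query service directly on 8093, path /query/service
    """
    if via_proxy:
        url = f"http://{host}:8091/_p/query/query/service"
    else:
        url = f"http://{host}:8093/query/service"
    body = json.dumps({"statement": statement, "timeout": "600s"}).encode("utf-8")
    # Assumption: basic auth works on both endpoints for this user.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


def send(request):
    """Send the request and return (status_code, body_text), including for HTTP errors."""
    try:
        with urllib.request.urlopen(request, timeout=30) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", "replace")
```

Calling send(build_query_request(data_host, "SELECT 1", user, password, via_proxy=True)) against a data node, and then the via_proxy=False variant against a query node, would show whether the 503 comes from the proxy on the data node or from the query service.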
Before the rebalance and upgrade, we were able to execute queries when connected to the Couchbase GUI using the data node's hostname. However, after the upgrade, we started experiencing this issue. Therefore, we would like to resolve it.
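Since the behaviour changed only after the cluster became mixed-version, it may help to list which nodes run the query service (n1ql) and which release each is on, next to the release of the node serving the GUI. The sketch below works on the JSON returned by GET /pools/default on port 8091 and relies on the per-node hostname, version, services, and thisNode fields. The function names are illustrative, and whether a version mismatch explains the 503 is an assumption to verify, not an established cause.

```python
def release_of(node):
    """Return the release part of a node's version string, e.g. '7.6.0'."""
    return node.get("version", "").split("-")[0]


def query_nodes_by_release(pools_default):
    """Group hostnames of nodes running the n1ql service by release."""
    grouped = {}
    for node in pools_default.get("nodes", []):
        if "n1ql" in node.get("services", []):
            grouped.setdefault(release_of(node), []).append(node.get("hostname"))
    return grouped


def serving_node_release(pools_default):
    """Return the release of the node that answered the request, or None."""
    for node in pools_default.get("nodes", []):
        if node.get("thisNode"):
            return release_of(node)
    return None
```

If serving_node_release(...) returns a release that does not appear as a key in query_nodes_by_release(...), the node serving the GUI has no query node on its own release to forward to, which would be worth raising when comparing against the error text "compatible service is not found".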