
Details

Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: Morpheus, Columnar 1.0.3
Affects Version/s: 7.2.1, Columnar 1.0.0
Component/s: analytics
Labels:
Environment:
7.2.1-5819

Triage:
Untriaged
Link to Log File, atop/blg, CBCollectInfo, Core dump:
requestcancelledexception → http://supportal.couchbase.com/snapshot/2c70f82fb828e75f62f55f69d9f26004::0
Story Points:
0
Is this a Regression?:
No

Description

Create a 3-node cluster with colocated services (k:q:i:a), a bucket, and 2 collections. Load some data into the collections. Build indexes, create datasets, etc.
Start a N1QL/CBAS query workload.
Scale up the cluster from 3 to 6 nodes, then from 6 to 9 nodes. Everything went fine.
Scale down the cluster from 9 to 6 nodes, removing nodes from the cluster one at a time.
As each node removal nears completion, some CBAS/N1QL queries fail with RequestCanceledException, depending on which service is being rebalanced and where traffic is being switched.
This looks like a race: by the time the updated cluster map reaches the client, the client has already dispatched a few requests to the outgoing node.
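The suspected race can be modelled in a few lines. This is a hypothetical sketch, not the Couchbase Java SDK: it assumes the client routes each attempt with whatever service map it currently holds (always picking the first listed node, for simplicity) and that the outgoing node's service has already closed its connections. The names RebalanceRaceSketch, RequestCanceled and execute are illustrative only. The exception contexts below report "idempotent":false and "retried":0, which is consistent with a client that declines to re-send a request the server may already have executed.

```java
import java.util.List;
import java.util.Set;

// Hypothetical model of the race; NOT the Couchbase Java SDK API.
class RebalanceRaceSketch {

    // Stand-in for the cancellation the application sees (hypothetical type).
    static class RequestCanceled extends RuntimeException {
        RequestCanceled(String message) {
            super(message);
        }
    }

    /**
     * Routes a request using the service map the client holds on each attempt.
     * mapSeenPerAttempt.get(i) is the map used for attempt i (the last one is reused);
     * ejected holds nodes whose service has already closed its connections.
     * Returns the node that served the request.
     */
    static String execute(List<List<String>> mapSeenPerAttempt, Set<String> ejected,
                          boolean idempotent, int maxRetries) {
        for (int attempt = 0; ; attempt++) {
            List<String> map = mapSeenPerAttempt.get(Math.min(attempt, mapSeenPerAttempt.size() - 1));
            // Simplification: always route to the first node listed for the service.
            String target = map.get(0);
            if (!ejected.contains(target)) {
                return target;
            }
            // The channel to the outgoing node closed while the request was in flight.
            // A non-idempotent request may already have run on the server, so it is
            // cancelled rather than re-sent.
            if (!idempotent || attempt >= maxRetries) {
                throw new RequestCanceled("NO_MORE_RETRIES (CHANNEL_CLOSED_WHILE_IN_FLIGHT), retried="
                        + attempt + ", lastDispatchedTo=" + target);
            }
        }
    }

    public static void main(String[] args) {
        // The stale map still lists the outgoing node; the updated map no longer does.
        List<List<String>> maps = List.of(List.of("node-007", "node-001"), List.of("node-001"));
        Set<String> ejected = Set.of("node-007");
        try {
            execute(maps, ejected, false, 3);
        } catch (RequestCanceled e) {
            System.out.println("non-idempotent: " + e.getMessage());
        }
        System.out.println("idempotent: served by " + execute(maps, ejected, true, 3));
    }
}
```

The model keeps idempotency as the only retry gate because that is the field the exception contexts expose; an idempotent request simply picks up the updated map on its next attempt.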

First instance of the exception for CBAS:

At 2023-06-29 22:36:57,984 PDT (UTC-7), i.e. 05:36:57 AM 30 Jun 2023 GMT:

com.couchbase.client.core.error.RequestCanceledException: AnalyticsRequest, Reason: NO_MORE_RETRIES (CHANNEL_CLOSED_WHILE_IN_FLIGHT) {"cancelled":true,"completed":true,"coreId":"0x987046a500000002","idempotent":false,"lastDispatchedFrom":"172.23.107.120:40634","lastDispatchedTo":"svc-dqisa-node-007.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com:18095","reason":"NO_MORE_RETRIES (CHANNEL_CLOSED_WHILE_IN_FLIGHT)","requestId":538504,"requestType":"AnalyticsRequest","retried":0,"service":{"httpMethod":"POST","operationId":"query_thread_default04","priority":0,"statement":"select v.name, animal from default0_VolumeCollection0_ds1 as v unnest v.animals as animal where v.attributes.hair = \"Burgundy\" limit 10;","type":"analytics","uri":"/analytics/service"},"timeoutMs":75000,"timings":{"totalMicros":5037850}}

Rebalance out start time 5:34:31 AM 30 Jun, 2023
Starting rebalance, KeepNodes = ['ns_1@svc-dqisa-node-001.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com',
'ns_1@svc-dqisa-node-002.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com',
'ns_1@svc-dqisa-node-003.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com',
'ns_1@svc-dqisa-node-004.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com',
'ns_1@svc-dqisa-node-005.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com',
'ns_1@svc-dqisa-node-006.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com',
'ns_1@svc-dqisa-node-008.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com',
'ns_1@svc-dqisa-node-009.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com'], EjectNodes = ['ns_1@svc-dqisa-node-007.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com'],

Rebalance out complete time 5:37:18 AM 30 Jun, 2023
Rebalance completed successfully.
Rebalance Operation Id = 018a2576213a6310d43a42ff45d5a3d2
ns_orchestrator 000
ns_1@svc-dqisa-node-001.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com
5:37:18 AM 30 Jun, 2023

N1QL exception at 2023-06-29 22:42:41,141 PDT (UTC-7), i.e. 5:42:41 AM 30 Jun 2023 GMT:

2023-06-29 22:42:41,141 | infra | CRITICAL | query_thread_default09 | [hostedN1QL:_run_query:447] com.couchbase.client.core.error.RequestCanceledException: QueryRequest, Reason: NO_MORE_RETRIES (CHANNEL_CLOSED_WHILE_IN_FLIGHT) {"cancelled":true,"completed":true,"coreId":"0x987046a500000001","idempotent":false,"lastDispatchedFrom":"172.23.107.120:51262","lastDispatchedTo":"svc-dqisa-node-009.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com:18093","reason":"NO_MORE_RETRIES (CHANNEL_CLOSED_WHILE_IN_FLIGHT)","requestId":655084,"requestType":"QueryRequest","retried":0,"service":{"bucket":"default0","operationId":"query_thread_default09","scope":"_default","statement":"select name from VolumeCollection0 where age between 30 and 50 limit 100;","type":"query"},"timeoutMs":75000,"timings":{"totalMicros":20851}}

This is seen during:

Rebalance started at 5:40:13 AM 30 Jun, 2023
Starting rebalance, KeepNodes = ['ns_1@svc-dqisa-node-001.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com',
'ns_1@svc-dqisa-node-002.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com',
'ns_1@svc-dqisa-node-003.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com',
'ns_1@svc-dqisa-node-004.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com',
'ns_1@svc-dqisa-node-005.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com',
'ns_1@svc-dqisa-node-006.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com',
'ns_1@svc-dqisa-node-008.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com'], EjectNodes = ['ns_1@svc-dqisa-node-009.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com'], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 0337aebb09f83d6978dd19373021a094

Rebalance completed at 5:42:59 AM 30 Jun, 2023
Rebalance completed successfully.
Rebalance Operation Id = 0337aebb09f83d6978dd19373021a094
ns_orchestrator 000
ns_1@svc-dqisa-node-001.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com
5:42:59 AM 30 Jun, 2023
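The timestamps above can be cross-checked with a short sketch. It assumes that the rebalance times from the UI are in GMT, as the report's own conversion implies, and that the client log timestamps are US Pacific time (UTC-7 in late June); milliseconds are dropped. The class and method names are illustrative only.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Illustrative helper for lining up client exceptions with rebalance windows.
class RebalanceWindowCheck {

    // Converts a US Pacific wall-clock timestamp from the client log to GMT (UTC).
    static LocalDateTime pacificToGmt(LocalDateTime pacific) {
        return pacific.atZone(ZoneId.of("America/Los_Angeles"))
                .withZoneSameInstant(ZoneOffset.UTC)
                .toLocalDateTime();
    }

    // True if t lies within [start, end], inclusive.
    static boolean insideWindow(LocalDateTime start, LocalDateTime end, LocalDateTime t) {
        return !t.isBefore(start) && !t.isAfter(end);
    }

    // Whole seconds from the exception until the rebalance finished.
    static long secondsBeforeCompletion(LocalDateTime t, LocalDateTime end) {
        return Duration.between(t, end).getSeconds();
    }

    static void report(String label, LocalDateTime clientPacific, LocalDateTime start, LocalDateTime end) {
        LocalDateTime gmt = pacificToGmt(clientPacific);
        System.out.println(label + ": " + gmt + " GMT, inside window=" + insideWindow(start, end, gmt)
                + ", " + secondsBeforeCompletion(gmt, end) + "s before completion");
    }

    public static void main(String[] args) {
        // CBAS exception vs. the rebalance that ejected node-007.
        report("CBAS", LocalDateTime.of(2023, 6, 29, 22, 36, 57),
                LocalDateTime.of(2023, 6, 30, 5, 34, 31), LocalDateTime.of(2023, 6, 30, 5, 37, 18));
        // N1QL exception vs. the rebalance that ejected node-009.
        report("N1QL", LocalDateTime.of(2023, 6, 29, 22, 42, 41),
                LocalDateTime.of(2023, 6, 30, 5, 40, 13), LocalDateTime.of(2023, 6, 30, 5, 42, 59));
    }
}
```

With these inputs, each exception falls inside the rebalance that ejected the node it was dispatched to (node-007 for CBAS, node-009 for N1QL), 21 s and 18 s respectively before that rebalance completed, which matches the "nearing completion" observation.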

Order in which the services on a node are rebalanced: KV -> FTS -> Index -> Analytics -> N1QL

cc: Ritam Sharma

Attachments


image-2023-06-30-00-03-05-165.png (396 kB, 30/Jun/23 12:03 AM)
JavaSDK.log.zip (4.73 MB, 07/Jul/23 4:19 PM)
testLogs_7.2.0.txt (668 kB, 06/Jul/23 11:01 PM)
testLogs.txt (404 kB, 07/Jul/23 4:16 PM)
TestLogs.txt (530 kB, 30/Jun/23 12:15 AM)

Issue Links

depends on
MB-62997 Delay finalizing rebalance out of a node completion until its http server is idle (Open)

is blocked by
MB-62232 Topology-aware services need ability to control when service is added / removed from service map (Open)

relates to
JVMCBC-1449 Defer closing active Analytics query connections (Closed)
NCBC-3451 Defer closing active query connections (Resolved)
JVMCBC-1334 Defer closing active query connections (Closed)
JVMCBC-1333 Retry SQL++ queries that fail with error code 1181 (Service shut down) (Closed)


RequestCancelledExceptions are seen when cbas node rebalance out nearing completion.