Details
Bug
Resolution: Fixed
Critical
7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.1.0
Untriaged
1
Unknown
Description
Build : 7.1.0-2416
Test : -test tests/integration/neo/test_neo_couchstore_milestone4.yml -scope tests/integration/neo/scope_couchstore.yml
Scale : 3
Iteration : 1st
There are currently 144 queries stuck in the "timeout" or "submitted" state, and they are blocking a rebalance operation to add a new query node (172.23.104.137) to the cluster. The rebalance has been in progress for 5.5+ hrs, more than 5 hrs of which have been spent in the query service rebalance phase because of this issue.
select state,count(*) from system:active_requests where state!="running" group by state
[
  {
    "$1": 128,
    "state": "timeout"
  },
  {
    "$1": 16,
    "state": "submitted"
  }
]
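The same per-state tally can be reproduced offline from the attached active_requests dumps. The following is a minimal sketch, assuming each dump is a JSON array of flat objects that carry a "state" key; the function names and that layout are assumptions for illustration, not something the report confirms.

```python
import json
from collections import Counter


def count_stuck_states(requests):
    """Count requests per state, skipping those still running.

    Assumes `requests` is a list of dicts with a "state" key
    (hypothetical dump layout); entries without one are counted
    under "unknown".
    """
    return Counter(
        r.get("state", "unknown")
        for r in requests
        if r.get("state") != "running"
    )


def merge_dumps(dumps):
    """Sum the per-state counts across several per-node dumps."""
    total = Counter()
    for dump in dumps:
        total += count_stuck_states(dump)
    return total


def load_dump(path):
    """Load one dump file; the file is assumed to hold a JSON array."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


if __name__ == "__main__":
    # Made-up sample data standing in for two node dumps.
    node_a = [{"state": "timeout"}, {"state": "running"}]
    node_b = [{"state": "submitted"}, {"state": "timeout"}]
    for state, count in sorted(merge_dumps([node_a, node_b]).items()):
        print(state, count)
```

Merging the dumps from all three query nodes this way should give totals comparable to the system:active_requests query, provided the dumps were taken at about the same time.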
1. Do we really need a graceful shutdown when adding a new query node to the cluster?
2. What is causing these queries to time out or remain in the submitted state?
Query nodes : 172.23.104.137, 172.23.104.155, 172.23.104.157
Attached :
1. cbcollect
2. active_requests dumps from all 3 nodes
Not sure whether this is a regression or is related to a recent change in the longevity test to run N1QL statements in JS UDFs. The previous run of the same test with 7.1.0-2400 did not show this issue.
UPDATE: rebalance completed successfully after I manually cancelled all the above queries.
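The report does not say how the queries were cancelled. One possible approach, sketched here under assumptions, is to issue a DELETE against the Query Admin REST endpoint /admin/active_requests/{requestId} on the query node that owns each request. The default port 8093, the plain-HTTP scheme, the credentials, and the helper names below are assumptions for illustration.

```python
import base64
import urllib.request
from urllib.parse import quote


def build_cancel_requests(node, request_ids, user, password, port=8093):
    """Build one DELETE request per request ID (nothing is sent here).

    `node` is a query node address; `user` and `password` are
    placeholder credentials supplied by the caller.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    reqs = []
    for rid in request_ids:
        url = f"http://{node}:{port}/admin/active_requests/{quote(rid, safe='')}"
        req = urllib.request.Request(url, method="DELETE")
        req.add_header("Authorization", f"Basic {token}")
        reqs.append(req)
    return reqs


def cancel_all(reqs, timeout=10):
    """Send each request; map URL to HTTP status or error text."""
    results = {}
    for req in reqs:
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                results[req.full_url] = resp.status
        except OSError as exc:  # URLError and HTTPError derive from OSError
            results[req.full_url] = str(exc)
    return results
```

Keeping request construction separate from sending makes it possible to review the list of targets before anything is cancelled.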
Issue Links
- backports to
  MB-51826 [BP of MB-51289 to 7.0.4] - [System Test] Queries stuck in timeout/submitted stage since 9+ hrs blocking rebalance (Closed)