Details
Bug
Resolution: Fixed
Critical
CBAS DP2
Untriaged
No
CX Sprint 44, CX Sprint 45
Description
CBAS build: 514
Issue: When about 2000 queries are waiting in the queue to be processed, further incoming requests are not accepted. The CBAS logs contain no error that helps pinpoint the cause, but on the testrunner side we see many errors like the following:
2017-02-24 12:24:01 | ERROR | MainProcess | query thread 1297 | [rest_client._http_request] socket error while connecting to http://10.111.151.102:8095/analytics/service error [Errno 36] Operation now in progress
Also, after the test fails, the CBAS cluster is unusable: it does not respond to any new requests, and the server has to be restarted to bring it back to a normal state.
The logs do not help determine what went wrong in this test, for example whether a given request was accepted or not. It would also help if the service periodically printed statistics showing the overall health of the cluster, including statistics for the queue.
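A minimal sketch of the kind of periodic reporting requested above, written in Python only to illustrate the idea (it is not CBAS code). `get_stats` and `emit` are hypothetical callables standing in for whatever collects the queue/health numbers and writes them to the log; the interval is an assumption.

```python
import threading


def start_stats_reporter(get_stats, emit, interval_s=30.0):
    """Call get_stats() every interval_s seconds and hand the result to emit().

    get_stats and emit are hypothetical hooks (e.g. a function returning queue
    depth and running-query counts, and a logger call). Returns a
    threading.Event; set it to stop the reporter.
    """
    stop = threading.Event()

    def loop():
        # Event.wait returns False on timeout (report again) and True once
        # stop has been set (exit the loop).
        while not stop.wait(interval_s):
            emit(get_stats())

    threading.Thread(target=loop, name="stats-reporter", daemon=True).start()
    return stop
```

Running the reporter on a daemon thread keeps it from blocking shutdown; a stats dict such as `{"queued": ..., "running": ...}` (hypothetical field names) would have made it possible to see how close the queue was to the ~2000 mark when requests started failing.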
Test:
1. Set up the dataset (10000 items in this test case): create a bucket, create a dataset on CBAS, and connect it to that bucket. Allow ingestion to complete.
2. Fire queries at this dataset concurrently. The test aims to send 6000 queries in batches of 50, sleeping 5 s between batches, which takes 6000/50 * 5 s = 600 s = 10 minutes.
3. The query used is "select sleep(count,500) from
where mutated=0;". With the 500 ms sleep, about 10 queries can be serviced in 5 seconds.
Every 5 s, 50 queries arrive and about 10 complete, so the queue grows by about 40 requests every 5 s.
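The load pattern in steps 2 and 3 can be sketched as follows. This is a minimal illustration, not the actual testrunner code: `send(query_id)` is a hypothetical callable that posts one query to the analytics service and records its own outcome, and the `sleep` parameter exists only so the pacing can be replaced in a dry run. It starts one thread per query, mirroring the `query thread N` names in the testrunner log, so a slow server lets the number of in-flight requests keep growing.

```python
import threading
import time


def fire_in_batches(send, total=6000, batch_size=50, pause_s=5.0, sleep=time.sleep):
    """Start `total` queries in batches of `batch_size`, pausing `pause_s`
    seconds between batches. Returns the number of queries started.

    `send` is a hypothetical callable taking a query id; it should catch and
    record its own errors (e.g. socket errors) so every outcome is logged.
    """
    threads = []
    for start in range(0, total, batch_size):
        for query_id in range(start, min(start + batch_size, total)):
            # One thread per query: the next batch does not wait for this one.
            t = threading.Thread(target=send, args=(query_id,),
                                 name=f"query thread {query_id}")
            t.start()
            threads.append(t)
        if start + batch_size < total:
            sleep(pause_s)  # fixed pacing between batches
    for t in threads:
        t.join()
    return len(threads)
```

With the defaults this issues 120 batches over about 10 minutes, matching step 2; the pause is fixed rather than tied to completions, which is what lets the server-side queue build up.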
The test ends abruptly after 2450 queries have been sent.
The test succeeded when a total of 2000 queries was sent in batches of 50, so the problem appears to occur once the queue size exceeds roughly 2000.
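Using the simplification from step 3 (each 5 s interval adds 50 queries and completes about 10), the backlog at the two observed points can be estimated. This is a rough model, not a measurement; `backlog_after` is a hypothetical helper.

```python
def backlog_after(queries_sent, batch_size=50, serviced_per_interval=10):
    """Approximate number of queued queries once `queries_sent` queries have
    gone out: each batch interval adds batch_size and drains
    serviced_per_interval (numbers taken from the test description)."""
    batches = queries_sent // batch_size
    return batches * (batch_size - serviced_per_interval)


for sent in (2000, 2450, 6000):
    print(sent, backlog_after(sent))
# 2000 1600
# 2450 1960
# 6000 4800
```

Under this model, the passing 2000-query run peaked at about 1600 queued queries, while the failing run had about 1960 queued when it stopped, which puts the limit somewhere between roughly 1600 and 1960 queued queries. The full 6000-query run would have needed a backlog of about 4800.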
Attaching the CBAS logs and the testrunner console output.