Details
- Bug
- Resolution: Fixed
- Critical
- 5.5.0
- Untriaged
- Unknown
- CX Sprint 94
Description
With 4 CB buckets at 200 ops/sec each, the analytics cluster goes into a bad state and becomes too slow to access. Data ingestion also stops.
We ran the system test for 24 hours. There are 4 KV buckets; each has one corresponding CBAS bucket and one non-filtered dataset.
This is a simple test where we keep ingesting data and run CBAS queries on all 4 datasets at 12 queries/sec for a set duration. The test doesn't involve any rebalance, bucket disconnection, or failover.
I started the run yesterday morning, and by yesterday evening the cluster was in a bad state. I was not able to see the bucket insights in the analytics workbench.
The cluster had recovered by this morning, and data ingestion restarted at some point in between, but for one bucket the dataset's document count does not match the CB bucket.
Docs in CB bucket: 823,410
Docs in cbas dataset:
[
]
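The count comparison above can be scripted so that an empty query result is flagged as a mismatch instead of being read as zero. The following is a minimal sketch, not the check the test harness actually runs: the helper names are hypothetical, and the assumed result shape (a parsed JSON list holding either a bare number or a single-field object) is an assumption, since the result pasted in this report is empty.

```python
def parse_count(text):
    """Turn a UI-formatted count such as '823,410' into an int."""
    return int(text.replace(",", "").strip())


def dataset_count(rows):
    """Extract a count from a parsed query result.

    Returns None when the result is empty, so callers can tell
    'no answer' apart from a genuine count of zero.
    """
    if not rows:
        return None
    first = rows[0]
    if isinstance(first, dict):
        # Assumed shape: one object with a single count field.
        first = next(iter(first.values()))
    return int(first)


def counts_match(kv_docs, rows):
    """True only if the dataset returned a count equal to the KV count."""
    actual = dataset_count(rows)
    return actual is not None and actual == kv_docs


kv_docs = parse_count("823,410")
print(counts_match(kv_docs, []))
```

With the values in this report, the empty result is reported as a mismatch against the 823,410 documents in the CB bucket.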
Attaching the cb-collect logs from yesterday and today.
The cluster is live if you want to debug:
172.23.108.231 (kv)
172.23.108.232 (cbas)
172.23.108.233 (cbas)
172.23.108.234 (kv)
There are a lot of exceptions in the logs:
analytics.log.2.gz:org.apache.asterix.common.exceptions.RuntimeDataException: ASX0023: 60.0s passed before getting back the responses from NCs
analytics.log.1.gz:java.net.ConnectException: Connection refused
analytics.log.1.gz:org.apache.hyracks.ipc.exceptions.IPCException: java.io.IOException: Connection failed to /172.23.108.233:9112
analytics.log.1.gz:Caused by: java.io.IOException: Connection failed to /172.23.108.233:9112
analytics.log.1.gz:org.apache.hyracks.api.exceptions.NetException: Socket Closed
analytics.log.1.gz:org.apache.asterix.common.exceptions.ReplicationException: java.io.EOFException
analytics.log.1.gz:Caused by: java.io.EOFException
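To see which exceptions dominate, grep output in the `file:message` form shown above can be tallied by exception class. This is a rough sketch under stated assumptions: the function name and regular expression are illustrative, only the first fully qualified exception class on each line is counted, and the input is assumed to be plain text lines already extracted from the compressed logs.

```python
import re
from collections import Counter

# Matches a fully qualified Java class name ending in Exception or Error,
# e.g. org.apache.hyracks.api.exceptions.NetException.
# Package segments must start with a lowercase letter or underscore, which
# keeps file names such as "analytics.log.1.gz" from being matched.
EXCEPTION_RE = re.compile(
    r"\b((?:[a-z_]\w*\.)+[A-Z]\w*(?:Exception|Error))\b"
)


def tally_exceptions(lines):
    """Count the first exception class that appears on each line."""
    counts = Counter()
    for line in lines:
        match = EXCEPTION_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "analytics.log.1.gz:java.net.ConnectException: Connection refused",
    "analytics.log.1.gz:org.apache.hyracks.api.exceptions.NetException: Socket Closed",
    "analytics.log.1.gz:Caused by: java.io.EOFException",
]
for name, count in tally_exceptions(sample).most_common():
    print(count, name)
```

Run against the full logs, the same function gives a per-class frequency table, which makes it easier to tell whether the connection failures to 172.23.108.233:9112 or the 60-second NC response timeouts came first and in what volume.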
Yesterday's logs:
s3://cb-customers-secure/system-test-analytics/collectinfo-2018-03-08t180243-ns_1@172.23.108.232.zip
s3://cb-customers-secure/system-test-analytics/collectinfo-2018-03-08t180243-ns_1@172.23.108.233.zip
s3://cb-customers-secure/system-test-analytics/collectinfo-2018-03-08t180243-ns_1@172.23.108.234.zip
s3://cb-customers-secure/system-test-analytics/collectinfo-2018-03-08t180243-ns_1@172.23.108.231.zip
Today's logs:
s3://cb-customers-secure/system-test-analytics_1/collectinfo-2018-03-09t032529-ns_1@172.23.108.232.zip
s3://cb-customers-secure/system-test-analytics_1/collectinfo-2018-03-09t032529-ns_1@172.23.108.233.zip
s3://cb-customers-secure/system-test-analytics_1/collectinfo-2018-03-09t032529-ns_1@172.23.108.234.zip
s3://cb-customers-secure/system-test-analytics_1/collectinfo-2018-03-09t032529-ns_1@172.23.108.231.zip