Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: CBAS DP4
Affects Version/s: CBAS DP4
Component/s: analytics
Labels:
None

Triage:
Untriaged
Link to Log File, atop/blg, CBCollectInfo, Core dump:

https://s3.amazonaws.com/bugdb/jira/cbas_rebal_fail/collectinfo-2017-10-15T132233-ns_1%40172.23.109.48.zip
https://s3.amazonaws.com/bugdb/jira/cbas_rebal_fail/collectinfo-2017-10-15T124518-ns_1%40172.23.98.151.zip

Epic Link:
Cluster Management
Is this a Regression?:
Yes
Sprint:
CX Sprint 74

Description

Build : 5.0.0-783 (also seen in build 5.0.0-766)

We have a test that runs 4096 queries in async mode, in batches of 200. The query is: select sleep(count,500) from default_ds.
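The batching described above can be sketched roughly as follows. This is only an illustration, not the actual test code: the class name `BatchedQueryRunner` and the `submit` callback are hypothetical, and the sketch assumes each batch is awaited before the next one starts, which the ticket does not state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical helper; not part of the test framework referenced in this ticket.
public class BatchedQueryRunner {

    // Submits `total` queries in batches of at most `batchSize`.
    // `submit` is a stand-in for whatever sends one async query (assumption).
    // Returns the size of each batch that was run, in order.
    public static List<Integer> runInBatches(int total, int batchSize,
            Function<Integer, CompletableFuture<Void>> submit) {
        List<Integer> batchSizes = new ArrayList<>();
        for (int start = 0; start < total; start += batchSize) {
            int end = Math.min(start + batchSize, total);
            List<CompletableFuture<Void>> inFlight = new ArrayList<>();
            for (int i = start; i < end; i++) {
                inFlight.add(submit.apply(i));
            }
            // Assumption: wait for the whole batch before starting the next one.
            CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join();
            batchSizes.add(end - start);
        }
        return batchSizes;
    }

    public static void main(String[] args) {
        // Placeholder submission that completes immediately instead of running a query.
        List<Integer> sizes = runInBatches(4096, 200,
                i -> CompletableFuture.completedFuture(null));
        System.out.println(sizes.size() + " batches, last batch size " + sizes.get(sizes.size() - 1));
    }
}
```

With 4096 queries and a batch size of 200, the final batch is smaller than the others because 4096 is not a multiple of 200.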

All queries run fine, but rebalancing out the analytics node (the only analytics node in the cluster) fails.

The UI diag log says:
Rebalance exited with reason {service_rebalance_failed,cbas,
  {rebalance_failed,
    {service_error, <<"Rebalance fafe847c0319354c5834b8bc61723348 failed, see analytics log for details">>}}}

The attached analytics.log.1.gz contains many errors and warnings like the following, logged while the test was running:

2017-10-15T06:08:54.796-07:00 WARN CBAS.work.NotifyTaskFailureWork [Worker:d1e24228352dbf84f8b2bc277f72f35c] d1e24228352dbf84f8b2bc277f72f35c is sending a notification to cc that task TAID:TID:ANID:ODID:3:0:0:0 has failed
org.apache.hyracks.api.exceptions.HyracksDataException: Index resource couldn't be found. Has it been created yet? Was it deleted?
        at org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:134) ~[hyracks-api-1.0.0-cbas-dp3.jar:1.0.0-cbas-dp3]
        at org.apache.hyracks.control.common.utils.ExceptionUtils.setNodeIds(ExceptionUtils.java:63) ~[hyracks-control-common-1.0.0-cbas-dp3.jar:1.0.0-cbas-dp3]
        at org.apache.hyracks.control.nc.Task.run(Task.java:367) ~[hyracks-control-nc-1.0.0-cbas-dp3.jar:1.0.0-cbas-dp3]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_131]
        at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_131]

Attachments


analytics.log.1.gz
1.79 MB
15/Oct/17 6:34 AM

Gerrit Reviews


For Gerrit Dashboard: MB-26400
#	Subject	Branch	Project	Status	CR	V
84405,2	MB-26400: guard access of NodeControllerState behind WorkQueue	master	asterix-opt	Status: MERGED	+2	+1
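The review subject mentions guarding access to NodeControllerState behind a WorkQueue. The ticket does not show the change itself, so the following is only a generic sketch of that kind of pattern: mutable state that is read and written exclusively by tasks on a single-threaded queue. The class `QueueGuardedState` and its methods are hypothetical and are not the Hyracks `NodeControllerState` or `WorkQueue` classes; a single-thread `ExecutorService` stands in for the queue.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration; not the code from the Gerrit change.
public class QueueGuardedState {

    // Plain, non-thread-safe map. It is only ever touched by the worker thread.
    private final Map<String, Integer> activeJobsPerNode = new HashMap<>();

    // A single worker thread runs queued work items one at a time, in order.
    private final ExecutorService workQueue = Executors.newSingleThreadExecutor();

    // Callers on any thread enqueue the mutation instead of touching the map directly.
    public void jobStarted(String nodeId) {
        workQueue.execute(() -> activeJobsPerNode.merge(nodeId, 1, Integer::sum));
    }

    // Reads also go through the queue, so they observe all previously queued writes.
    public int activeJobs(String nodeId) {
        Callable<Integer> read = () -> activeJobsPerNode.getOrDefault(nodeId, 0);
        try {
            return workQueue.submit(read).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public void shutdown() {
        workQueue.shutdown();
    }
}
```

Because every access is serialized on one thread, concurrent callers cannot interleave reads and writes of the shared map, at the cost of routing each access through the queue.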


People

Assignee: Mihir Kamdar (Inactive)

Reporter: Mihir Kamdar (Inactive)

Votes: 0

Watchers: 2

Dates

Created: 15/Oct/17 6:31 AM

Updated: 02/Nov/18 5:19 PM

Resolved: 15/Oct/17 1:47 PM


Rebalance out of an analytics node after running lots of queries fails
