Details
- Type: Bug
- Resolution: Won't Fix
- Priority: Major
- Affects Version: 6.0.0
- Environment: centos1 cluster
- Triage: Untriaged
- Is this a Regression?: Unknown
- Sprint: CX Sprint 120
Description
Build: 6.0.0-1643
Test: -test tests/integration/test_allFeatures_alice_timers.yml -scope tests/integration/scope_Xattrs_Alice.yml
Scale: 2
Iteration: 2nd
In the longevity system test, there is a step that kills the cbas process on one of the analytics nodes. In the 2nd iteration of the test, while this step was performed on 172.23.96.168, the following fatal error was seen on another node, 172.23.108.104.
2018-09-16T01:28:44.775-07:00 FATA CBAS.runtime.DcpUpdateCallback [org.apache.hyracks.api.rewriter.runtime.SuperActivity:JID:4.68800:TAID:TID:ANID:ODID:2:0:1:0:0] Restarting process to ensure data integrity
org.apache.hyracks.api.exceptions.HyracksDataException: java.lang.InterruptedException
    at org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:51) ~[hyracks-api.jar:6.0.0-1643]
    at org.apache.hyracks.control.nc.io.IoRequest.queue(IoRequest.java:105) ~[hyracks-control-nc.jar:6.0.0-1643]
    at org.apache.hyracks.control.nc.io.IoRequest.read(IoRequest.java:76) ~[hyracks-control-nc.jar:6.0.0-1643]
    at org.apache.hyracks.control.nc.io.IOManager.asyncRead(IOManager.java:318) ~[hyracks-control-nc.jar:6.0.0-1643]
    at org.apache.hyracks.control.nc.io.IOManager.syncRead(IOManager.java:249) ~[hyracks-control-nc.jar:6.0.0-1643]
    at org.apache.hyracks.storage.common.buffercache.BufferCache.read(BufferCache.java:553) ~[hyracks-storage-common.jar:6.0.0-1643]
    at org.apache.hyracks.storage.common.buffercache.BufferCache.tryRead(BufferCache.java:524) ~[hyracks-storage-common.jar:6.0.0-1643]
    at org.apache.hyracks.storage.common.buffercache.BufferCache.pin(BufferCache.java:194) ~[hyracks-storage-common.jar:6.0.0-1643]
    at org.apache.hyracks.storage.am.btree.impls.DiskBTree.searchDown(DiskBTree.java:170) ~[hyracks-storage-am-btree.jar:6.0.0-1643]
    at org.apache.hyracks.storage.am.btree.impls.DiskBTree.search(DiskBTree.java:95) ~[hyracks-storage-am-btree.jar:6.0.0-1643]
    at org.apache.hyracks.storage.am.btree.impls.DiskBTree.access$000(DiskBTree.java:44) ~[hyracks-storage-am-btree.jar:6.0.0-1643]
    at org.apache.hyracks.storage.am.btree.impls.DiskBTree$DiskBTreeAccessor.search(DiskBTree.java:243) ~[hyracks-storage-am-btree.jar:6.0.0-1643]
    at org.apache.hyracks.storage.am.lsm.btree.impls.LSMBTreePointSearchCursor.doHasNext(LSMBTreePointSearchCursor.java:82) ~[hyracks-storage-am-lsm-btree.jar:6.0.0-1643]
    at org.apache.hyracks.storage.common.EnforcedIndexCursor.hasNext(EnforcedIndexCursor.java:69) ~[hyracks-storage-common.jar:6.0.0-1643]
    at org.apache.hyracks.storage.am.lsm.btree.impls.LSMBTreeSearchCursor.doHasNext(LSMBTreeSearchCursor.java:60) ~[hyracks-storage-am-lsm-btree.jar:6.0.0-1643]
    at org.apache.hyracks.storage.common.EnforcedIndexCursor.hasNext(EnforcedIndexCursor.java:69) ~[hyracks-storage-common.jar:6.0.0-1643]
    at org.apache.asterix.runtime.operators.LSMPrimaryUpsertOperatorNodePushable$1.process(LSMPrimaryUpsertOperatorNodePushable.java:159) [asterix-runtime.jar:6.0.0-1643]
    at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.processFrame(LSMHarness.java:850) ~[hyracks-storage-am-lsm-common.jar:6.0.0-1643]
    at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.batchOperate(LSMHarness.java:701) [hyracks-storage-am-lsm-common.jar:6.0.0-1643]
    at org.apache.hyracks.storage.am.lsm.common.impls.LSMTreeIndexAccessor.batchOperate(LSMTreeIndexAccessor.java:214) [hyracks-storage-am-lsm-common.jar:6.0.0-1643]
    at org.apache.asterix.runtime.operators.LSMPrimaryUpsertOperatorNodePushable.nextFrame(LSMPrimaryUpsertOperatorNodePushable.java:323) [asterix-runtime.jar:6.0.0-1643]
    at org.apache.asterix.external.feed.dataflow.SyncFeedRuntimeInputHandler.nextFrame(SyncFeedRuntimeInputHandler.java:46) [asterix-external-data.jar:6.0.0-1643]
    at org.apache.asterix.external.operators.FeedMetaStoreNodePushable.nextFrame(FeedMetaStoreNodePushable.java:151) [asterix-external-data.jar:6.0.0-1643]
    at org.apache.hyracks.control.nc.Task.pushFrames(Task.java:396) [hyracks-control-nc.jar:6.0.0-1643]
    at org.apache.hyracks.control.nc.Task.run(Task.java:330) [hyracks-control-nc.jar:6.0.0-1643]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220) ~[?:1.8.0_181]
    at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335) ~[?:1.8.0_181]
    at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:350) ~[?:1.8.0_181]
    at org.apache.hyracks.control.nc.io.IoRequest.queue(IoRequest.java:103) ~[hyracks-control-nc.jar:6.0.0-1643]
    ... 26 more
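The root-cause chain above shows a worker thread blocked in ArrayBlockingQueue.put (via IoRequest.queue) being interrupted, and the interrupt being wrapped and treated as fatal. A minimal, self-contained Java sketch of that failure mode follows; all class and variable names are invented for illustration, and this is not the actual IoRequest code:
{code:java}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class InterruptedQueuePutDemo {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> requestQueue = new ArrayBlockingQueue<>(1);
        requestQueue.put("pending-io");           // fill the queue so the next put() blocks

        Thread ioThread = new Thread(() -> {
            try {
                requestQueue.put("blocked-io");   // blocks: queue is full
            } catch (InterruptedException e) {
                // Analogous to IoRequest.queue in the trace: the interrupt is
                // wrapped and rethrown, surfacing as a fatal wrapped exception.
                throw new IllegalStateException("I/O request queue interrupted", e);
            }
        });
        ioThread.start();
        Thread.sleep(200);      // give the put() time to block
        ioThread.interrupt();   // simulate the shutdown/kill interrupting the thread
        ioThread.join();
    }
}
{code}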
2018-09-16T01:28:44.793-07:00 ERRO CBAS.executor.JobExecutor [Worker:ClusterController] Unexpected failure. Aborting job JID:4.68800
org.apache.hyracks.api.exceptions.HyracksException: HYR0010: Node 1ee1017c4f09f59b4fd236d801c2d0ab does not exist
    at org.apache.hyracks.api.exceptions.HyracksException.create(HyracksException.java:57) ~[hyracks-api.jar:6.0.0-1643]
    at org.apache.hyracks.control.cc.executor.JobExecutor.assignLocation(JobExecutor.java:473) ~[hyracks-control-cc.jar:6.0.0-1643]
    at org.apache.hyracks.control.cc.executor.JobExecutor.assignTaskLocations(JobExecutor.java:365) ~[hyracks-control-cc.jar:6.0.0-1643]
    at org.apache.hyracks.control.cc.executor.JobExecutor.startRunnableTaskClusters(JobExecutor.java:245) ~[hyracks-control-cc.jar:6.0.0-1643]
    at org.apache.hyracks.control.cc.executor.JobExecutor.startRunnableActivityClusters(JobExecutor.java:209) ~[hyracks-control-cc.jar:6.0.0-1643]
    at org.apache.hyracks.control.cc.executor.JobExecutor.notifyNodeFailures(JobExecutor.java:732) [hyracks-control-cc.jar:6.0.0-1643]
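This second error appears to be a consequence of the first: once the killed node drops out of the cluster controller's view, assignLocation cannot place tasks on it and the job is aborted. A rough sketch of that lookup-and-abort pattern, with invented names (not the actual JobExecutor code):
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class NodeLookupDemo {
    // Stand-in for the cluster controller's map of live node controllers.
    private static final Map<String, String> liveNodes = new ConcurrentHashMap<>();

    static String assignLocation(String nodeId) {
        String node = liveNodes.get(nodeId);
        if (node == null) {
            // Mirrors HYR0010: "Node <id> does not exist".
            throw new IllegalStateException("Node " + nodeId + " does not exist");
        }
        return node;
    }

    public static void main(String[] args) {
        liveNodes.put("nc-1", "172.23.108.104");
        liveNodes.remove("nc-1");   // node failure detected; node dropped from the map
        try {
            assignLocation("nc-1"); // scheduling a task on the vanished node fails
        } catch (IllegalStateException e) {
            System.out.println("Aborting job: " + e.getMessage());
        }
    }
}
{code}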
Neither failure has a visible functional impact, but the logs are worth analyzing. Also, since this error was seen only once in 5 iterations, the bug is marked as Major.