Couchbase Server / MB-31356

[System Test] Seeing Fatal error "Restarting process to ensure data integrity" on one analytics node while cbas process on another was killed

Details

    • Bug
    • Resolution: Won't Fix
    • Major
    • 6.0.0
    • 6.0.0
    • analytics
    • centos1 cluster

    Description

      Build : 6.0.0-1643
      Test: -test tests/integration/test_allFeatures_alice_timers.yml -scope tests/integration/scope_Xattrs_Alice.yml
      Scale: 2
      Iteration: 2nd

      The longevity system test includes a step that kills the cbas process on one of the analytics nodes. In the 2nd iteration of the test, while this step was being performed on 172.23.96.168, the following fatal error was seen on another node, 172.23.108.104.

      2018-09-16T01:28:44.775-07:00 FATA CBAS.runtime.DcpUpdateCallback [org.apache.hyracks.api.rewriter.runtime.SuperActivity:JID:4.68800:TAID:TID:ANID:ODID:2:0:1:0:0] Restarting process to ensure data integrity
      org.apache.hyracks.api.exceptions.HyracksDataException: java.lang.InterruptedException
              at org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:51) ~[hyracks-api.jar:6.0.0-1643]
              at org.apache.hyracks.control.nc.io.IoRequest.queue(IoRequest.java:105) ~[hyracks-control-nc.jar:6.0.0-1643]
              at org.apache.hyracks.control.nc.io.IoRequest.read(IoRequest.java:76) ~[hyracks-control-nc.jar:6.0.0-1643]
              at org.apache.hyracks.control.nc.io.IOManager.asyncRead(IOManager.java:318) ~[hyracks-control-nc.jar:6.0.0-1643]
              at org.apache.hyracks.control.nc.io.IOManager.syncRead(IOManager.java:249) ~[hyracks-control-nc.jar:6.0.0-1643]
              at org.apache.hyracks.storage.common.buffercache.BufferCache.read(BufferCache.java:553) ~[hyracks-storage-common.jar:6.0.0-1643]
              at org.apache.hyracks.storage.common.buffercache.BufferCache.tryRead(BufferCache.java:524) ~[hyracks-storage-common.jar:6.0.0-1643]
              at org.apache.hyracks.storage.common.buffercache.BufferCache.pin(BufferCache.java:194) ~[hyracks-storage-common.jar:6.0.0-1643]
              at org.apache.hyracks.storage.am.btree.impls.DiskBTree.searchDown(DiskBTree.java:170) ~[hyracks-storage-am-btree.jar:6.0.0-1643]
              at org.apache.hyracks.storage.am.btree.impls.DiskBTree.search(DiskBTree.java:95) ~[hyracks-storage-am-btree.jar:6.0.0-1643]
              at org.apache.hyracks.storage.am.btree.impls.DiskBTree.access$000(DiskBTree.java:44) ~[hyracks-storage-am-btree.jar:6.0.0-1643]
              at org.apache.hyracks.storage.am.btree.impls.DiskBTree$DiskBTreeAccessor.search(DiskBTree.java:243) ~[hyracks-storage-am-btree.jar:6.0.0-1643]
              at org.apache.hyracks.storage.am.lsm.btree.impls.LSMBTreePointSearchCursor.doHasNext(LSMBTreePointSearchCursor.java:82) ~[hyracks-storage-am-lsm-btree.jar:6.0.0-1643]
              at org.apache.hyracks.storage.common.EnforcedIndexCursor.hasNext(EnforcedIndexCursor.java:69) ~[hyracks-storage-common.jar:6.0.0-1643]
              at org.apache.hyracks.storage.am.lsm.btree.impls.LSMBTreeSearchCursor.doHasNext(LSMBTreeSearchCursor.java:60) ~[hyracks-storage-am-lsm-btree.jar:6.0.0-1643]
              at org.apache.hyracks.storage.common.EnforcedIndexCursor.hasNext(EnforcedIndexCursor.java:69) ~[hyracks-storage-common.jar:6.0.0-1643]
              at org.apache.asterix.runtime.operators.LSMPrimaryUpsertOperatorNodePushable$1.process(LSMPrimaryUpsertOperatorNodePushable.java:159) [asterix-runtime.jar:6.0.0-1643]
              at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.processFrame(LSMHarness.java:850) ~[hyracks-storage-am-lsm-common.jar:6.0.0-1643]
              at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.batchOperate(LSMHarness.java:701) [hyracks-storage-am-lsm-common.jar:6.0.0-1643]
              at org.apache.hyracks.storage.am.lsm.common.impls.LSMTreeIndexAccessor.batchOperate(LSMTreeIndexAccessor.java:214) [hyracks-storage-am-lsm-common.jar:6.0.0-1643]
              at org.apache.asterix.runtime.operators.LSMPrimaryUpsertOperatorNodePushable.nextFrame(LSMPrimaryUpsertOperatorNodePushable.java:323) [asterix-runtime.jar:6.0.0-1643]
              at org.apache.asterix.external.feed.dataflow.SyncFeedRuntimeInputHandler.nextFrame(SyncFeedRuntimeInputHandler.java:46) [asterix-external-data.jar:6.0.0-1643]
              at org.apache.asterix.external.operators.FeedMetaStoreNodePushable.nextFrame(FeedMetaStoreNodePushable.java:151) [asterix-external-data.jar:6.0.0-1643]
              at org.apache.hyracks.control.nc.Task.pushFrames(Task.java:396) [hyracks-control-nc.jar:6.0.0-1643]
              at org.apache.hyracks.control.nc.Task.run(Task.java:330) [hyracks-control-nc.jar:6.0.0-1643]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
              at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
      Caused by: java.lang.InterruptedException
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220) ~[?:1.8.0_181]
              at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335) ~[?:1.8.0_181]
              at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:350) ~[?:1.8.0_181]
              at org.apache.hyracks.control.nc.io.IoRequest.queue(IoRequest.java:103) ~[hyracks-control-nc.jar:6.0.0-1643]
              ... 26 more
      2018-09-16T01:28:44.793-07:00 ERRO CBAS.executor.JobExecutor [Worker:ClusterController] Unexpected failure. Aborting job JID:4.68800
      org.apache.hyracks.api.exceptions.HyracksException: HYR0010: Node 1ee1017c4f09f59b4fd236d801c2d0ab does not exist
              at org.apache.hyracks.api.exceptions.HyracksException.create(HyracksException.java:57) ~[hyracks-api.jar:6.0.0-1643]
              at org.apache.hyracks.control.cc.executor.JobExecutor.assignLocation(JobExecutor.java:473) ~[hyracks-control-cc.jar:6.0.0-1643]
              at org.apache.hyracks.control.cc.executor.JobExecutor.assignTaskLocations(JobExecutor.java:365) ~[hyracks-control-cc.jar:6.0.0-1643]
              at org.apache.hyracks.control.cc.executor.JobExecutor.startRunnableTaskClusters(JobExecutor.java:245) ~[hyracks-control-cc.jar:6.0.0-1643]
              at org.apache.hyracks.control.cc.executor.JobExecutor.startRunnableActivityClusters(JobExecutor.java:209) ~[hyracks-control-cc.jar:6.0.0-1643]
              at org.apache.hyracks.control.cc.executor.JobExecutor.notifyNodeFailures(JobExecutor.java:732) [hyracks-control-cc.jar:6.0.0-1643]
      

      These failures have no visible functional impact, but the logs should still be analyzed. Because this error was seen only once in 5 iterations, the bug is marked as Major.
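      For readers unfamiliar with the "Caused by" section of the trace: it shows the task thread being interrupted inside ArrayBlockingQueue.put(), with the resulting InterruptedException wrapped in a HyracksDataException. The sketch below reproduces only that generic Java pattern. It is not the Hyracks code; the class InterruptedPutDemo, the exception WrappedIoException, and the method queueRequest are hypothetical names chosen for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

public class InterruptedPutDemo {

    // Hypothetical stand-in for a wrapping exception such as HyracksDataException.
    static class WrappedIoException extends Exception {
        WrappedIoException(Throwable cause) {
            super(cause);
        }
    }

    // Hypothetical analogue of queuing an I/O request: put() waits while the
    // queue is full, and an interrupt surfaces as an InterruptedException.
    static void queueRequest(BlockingQueue<String> queue, String request)
            throws WrappedIoException {
        try {
            queue.put(request);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            throw new WrappedIoException(e);
        }
    }

    // Starts a worker that tries to put into a full queue, interrupts it,
    // and returns the exception the worker observed.
    static Throwable runDemo() throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
        queue.put("occupies the only slot"); // queue is now full

        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            try {
                queueRequest(queue, "read page");
            } catch (WrappedIoException e) {
                failure.set(e);
            }
        });
        worker.start();
        worker.interrupt(); // analogous to the task thread being cancelled
        worker.join();
        return failure.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Throwable t = runDemo();
        System.out.println(t + " caused by " + t.getCause());
    }
}
```

      In the sketch, the interrupt is delivered by the test harness itself. The trace above does not show what interrupted the task thread on 172.23.108.104, so that part remains a question for log analysis.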

          People

            mihir.kamdar Mihir Kamdar (Inactive)