Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-29942

Analytics System Test: All queries started timing out suddenly. Exception in logs: org.apache.hyracks.api.exceptions.HyracksDataException: java.lang.InterruptedException

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 6.0.0
    • 5.5.0
    • analytics
    • 5.5.0-2814

    Description

      1. We have 2 analytics nodes and 2 KV nodes in the cluster.
      2. Loading 600 items per second in 4 CB buckets with expiry set as 3600s. With these settings there won't by more that 1.5M items in any bucket at any given point of time once doc count reach to that lever gradually.
      3. 240 queries are shot on the analytics nodes and ensured that there will always be 240 queries running/queued on analytics.
      4. After couple of days of this run all the queries started timing out after 300s which were passing earlier.
      Lots of HYracks exception can be seen in the logs.
      5. Flush operation also performed suspecting that queries might be scanning a lot of data and hence timing out but after flushing also none os the query were succeeding.

       
      2018-06-03T09:16:49.008-07:00 WARN CBAS.buffercache.BufferCache [Executor-82328:20b52be6564d5ba1fdc5a30027a20449] Failure while trying to read a page from disk
      org.apache.hyracks.api.exceptions.HyracksDataException: java.lang.InterruptedException
              at org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:51) ~[hyracks-api.jar:5.5.0-2814]
              at org.apache.hyracks.control.nc.io.IoRequest.queue(IoRequest.java:105) ~[hyracks-control-nc.jar:5.5.0-2814]
              at org.apache.hyracks.control.nc.io.IoRequest.read(IoRequest.java:76) ~[hyracks-control-nc.jar:5.5.0-2814]
              at org.apache.hyracks.control.nc.io.IOManager.asyncRead(IOManager.java:317) ~[hyracks-control-nc.jar:5.5.0-2814]
              at org.apache.hyracks.control.nc.io.IOManager.syncRead(IOManager.java:248) ~[hyracks-control-nc.jar:5.5.0-2814]
              at org.apache.hyracks.storage.common.buffercache.BufferCache.read(BufferCache.java:548) ~[hyracks-storage-common.jar:5.5.0-2814]
              at org.apache.hyracks.storage.common.buffercache.BufferCache.tryRead(BufferCache.java:519) ~[hyracks-storage-common.jar:5.5.0-2814]
              at org.apache.hyracks.storage.common.buffercache.BufferCache.pin(BufferCache.java:190) ~[hyracks-storage-common.jar:5.5.0-2814]
              at org.apache.hyracks.storage.am.btree.impls.DiskBTreeRangeSearchCursor.acquirePage(DiskBTreeRangeSearchCursor.java:98) ~[hyracks-storage-am-btree.jar:5.5.0-2814]
              at org.apache.hyracks.storage.am.btree.impls.BTreeRangeSearchCursor.fetchNextLeafPage(BTreeRangeSearchCursor.java:104) ~[hyracks-storage-am-btree.jar:5.5.0-2814]
              at org.apache.hyracks.storage.am.btree.impls.DiskBTreeRangeSearchCursor.doHasNext(DiskBTreeRangeSearchCursor.java:45) ~[hyracks-storage-am-btree.jar:5.5.0-2814]
              at org.apache.hyracks.storage.common.EnforcedIndexCursor.hasNext(EnforcedIndexCursor.java:69) ~[hyracks-storage-common.jar:5.5.0-2814]
              at org.apache.hyracks.storage.am.lsm.common.impls.LSMIndexSearchCursor.pushIntoQueueFromCursorAndReplaceThisElement(LSMIndexSearchCursor.java:189) ~[hyracks-storage-am-lsm-common.jar:5.5.0-2814]
              at org.apache.hyracks.storage.am.lsm.btree.impls.LSMBTreeRangeSearchCursor.pushOutputElementIntoQueueIfNeeded(LSMBTreeRangeSearchCursor.java:198) ~[hyracks-storage-am-lsm-btree.jar:5.5.0-2814]
              at org.apache.hyracks.storage.am.lsm.btree.impls.LSMBTreeRangeSearchCursor.checkPriorityQueue(LSMBTreeRangeSearchCursor.java:180) ~[hyracks-storage-am-lsm-btree.jar:5.5.0-2814]
              at org.apache.hyracks.storage.am.lsm.common.impls.LSMIndexSearchCursor.doHasNext(LSMIndexSearchCursor.java:139) ~[hyracks-storage-am-lsm-common.jar:5.5.0-2814]
              at org.apache.hyracks.storage.common.EnforcedIndexCursor.hasNext(EnforcedIndexCursor.java:69) ~[hyracks-storage-common.jar:5.5.0-2814]
              at org.apache.hyracks.storage.am.lsm.btree.impls.LSMBTreeSearchCursor.doHasNext(LSMBTreeSearchCursor.java:60) ~[hyracks-storage-am-lsm-btree.jar:5.5.0-2814]
              at org.apache.hyracks.storage.common.EnforcedIndexCursor.hasNext(EnforcedIndexCursor.java:69) ~[hyracks-storage-common.jar:5.5.0-2814]
              at org.apache.hyracks.storage.am.common.dataflow.IndexSearchOperatorNodePushable.writeSearchResults(IndexSearchOperatorNodePushable.java:203) ~[hyracks-storage-am-common.jar:5.5.0-2814]
              at org.apache.hyracks.storage.am.common.dataflow.IndexSearchOperatorNodePushable.nextFrame(IndexSearchOperatorNodePushable.java:244) ~[hyracks-storage-am-common.jar:5.5.0-2814]
              at org.apache.hyracks.storage.am.common.dataflow.IndexSearchOperatorNodePushable.nextFrame(IndexSearchOperatorNodePushable.java:244) ~[hyracks-storage-am-common.jar:5.5.0-2814]
              at org.apache.hyracks.dataflow.common.comm.io.AbstractFrameAppender.write(AbstractFrameAppender.java:93) ~[hyracks-dataflow-common.jar:5.5.0-2814]
              at org.apache.hyracks.algebricks.runtime.operators.std.EmptyTupleSourceRuntimeFactory$1.open(EmptyTupleSourceRuntimeFactory.java:55) ~[algebricks-runtime.jar:5.5.0-2814]
              at org.apache.hyracks.algebricks.runtime.operators.meta.AlgebricksMetaOperatorDescriptor$SourcePushRuntime.initialize(AlgebricksMetaOperatorDescriptor.java:105) ~[algebricks-runtime.jar:5.5.0-2814]
              at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.lambda$runInParallel$0(SuperActivityOperatorNodePushable.java:208) ~[hyracks-api.jar:5.5.0-2814]
              at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_162]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_162]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_162]
              at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]
      Caused by: java.lang.InterruptedException
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220) ~[?:1.8.0_162]
              at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335) ~[?:1.8.0_162]
              at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:350) ~[?:1.8.0_162]
              at org.apache.hyracks.control.nc.io.IoRequest.queue(IoRequest.java:103) ~[hyracks-control-nc.jar:5.5.0-2814]
              ... 27 more
      2018-06-03T09:16:49.639-07:00 INFO CBAS.dataset.DatasetPartitionWriter [Executor-82328:20b52be6564d5ba1fdc5a30027a20449] close(0)
      2018-06-03T09:16:49.639-07:00 WARN CBAS.nc.Task [org.apache.hyracks.api.rewriter.runtime.SuperActivity:TAID:TID:ANID:ODID:3:0:0:0:0] Task TAID:TID:ANID:ODID:3:0:0:0 failed with exception
      

      6. Again the doc count reach to 1.5M in all buckets and during that phase queries again started failing.
      7. All buckets flushed again and queries are now passing.

      Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          People

            ritesh.agarwal Ritesh Agarwal
            ritesh.agarwal Ritesh Agarwal
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty