Couchbase Server / MB-62299

Merge operation failures and TCP I/O errors are seen during network issues between S3 and Columnar.

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Columnar 1.0.0
    • Columnar 1.0.0
    • analytics

    Description

      analytics_error.log:2024-06-12T11:18:24.023+00:00 ERRO CBAS.impls.LSMHarness [Executor-3065:4512b6284f0da58f6b891df87fa782cf] MERGE operation failed on {"class" : "LSMColumnBTree", "dir" : "/var/cb-cache/@analytics/v_iodevice_3/storage/partition_19/Default/Default/remote_BTKMm_volCollection_0_ufvdw/0/remote_BTKMm_volCollection_0_ufvdw", "memory" : [{"class":"LSMBTreeMemoryComponent", "state":"READABLE_WRITABLE", "writers":0, "readers":0, "pendingFlushes":0, "id":"[877,877]", "index":{"class":"BTree","file":"storage/partition_19/Default/Default/remote_BTKMm_volCollection_0_ufvdw/0/remote_BTKMm_volCollection_0_ufvdw_virtual_0"}}, {"class":"LSMBTreeMemoryComponent", "state":"INACTIVE", "writers":0, "readers":0, "pendingFlushes":0, "id":"[876,876]", "index":{"class":"BTree","file":"storage/partition_19/Default/Default/remote_BTKMm_volCollection_0_ufvdw/0/remote_BTKMm_volCollection_0_ufvdw_virtual_1"}}], "disk" : 8, "num-scheduled-flushes":0, "current-memory-component":0}
      analytics_error.log:2024-06-12T11:18:24.126+00:00 ERRO CBAS.nc.HaltCallback [Executor-3065:4512b6284f0da58f6b891df87fa782cf] Operation org.apache.hyracks.storage.am.lsm.btree.impls.LSMBTreeMergeOperation@4fa60f3 has failed
      analytics_error.log:2024-06-12T11:23:45.625+00:00 ERRO CBAS.tcp.TCPEndpoint [TCPEndpoint IO Thread [/0.0.0.0:9116]] Unexpected tcp io error in connection TCPConnection[Remote Address: svc-da-node-002.ooooxzltycppnjs4.sandbox.nonprod-project-avengers.com/10.0.3.198:9116 Local Address: /0.0.0.0:9116]
      analytics_error.log:2024-06-12T11:23:45.628+00:00 ERRO CBAS.tcp.TCPEndpoint [TCPEndpoint IO Thread [/0.0.0.0:9116]] Unexpected tcp io error in connection TCPConnection[Remote Address: /10.0.3.221:50798 Local Address: /0.0.0.0:9116]
      analytics_error.log:2024-06-12T11:23:45.828+00:00 ERRO CBAS.executor.JobExecutor [Worker:ClusterController] java.io.IOException: Connection failed to svc-da-node-002.ooooxzltycppnjs4.sandbox.nonprod-project-avengers.com/10.0.3.198:9115
      analytics_error.log:2024-06-12T11:23:45.937+00:00 ERRO CBAS.job.JobManager [Worker:ClusterController] Exception cleaning up joblet JID:0.15 on node 53dad37f77ec0b149de47aba7b301084
      analytics_error.log:2024-06-12T11:23:50.550+00:00 ERRO CBAS.executor.JobExecutor [Worker:ClusterController] Unexpected failure. Aborting job JID:0.15
      analytics_error.log:2024-06-12T11:26:17.034+00:00 ERRO CBAS.active.ActiveEntityEventsListener [ActiveNotificationHandler] ingestion job JID:0.15 failed
      analytics_error.log:2024-06-12T11:41:35.766+00:00 ERRO CBAS.tcp.TCPEndpoint [TCPEndpoint IO Thread [/0.0.0.0:9116]] Unexpected tcp io error in connection TCPConnection[Remote Address: svc-da-node-002.ooooxzltycppnjs4.sandbox.nonprod-project-avengers.com/10.0.3.198:9116 Local Address: /0.0.0.0:9116]
      

      2024-06-12T11:18:24.023+00:00 ERRO CBAS.impls.LSMHarness [Executor-3065:4512b6284f0da58f6b891df87fa782cf] MERGE operation failed on {"class" : "LSMColumnBTree", "dir" : "/var/cb-cache/@analytics/v_iodevice_3/storage/partition_19/Default/Default/remote_BTKMm_volCollection_0_ufvdw/0/remote_BTKMm_volCollection_0_ufvdw", "memory" : [{"class":"LSMBTreeMemoryComponent", "state":"READABLE_WRITABLE", "writers":0, "readers":0, "pendingFlushes":0, "id":"[877,877]", "index":{"class":"BTree","file":"storage/partition_19/Default/Default/remote_BTKMm_volCollection_0_ufvdw/0/remote_BTKMm_volCollection_0_ufvdw_virtual_0"}}, {"class":"LSMBTreeMemoryComponent", "state":"INACTIVE", "writers":0, "readers":0, "pendingFlushes":0, "id":"[876,876]", "index":{"class":"BTree","file":"storage/partition_19/Default/Default/remote_BTKMm_volCollection_0_ufvdw/0/remote_BTKMm_volCollection_0_ufvdw_virtual_1"}}], "disk" : 8, "num-scheduled-flushes":0, "current-memory-component":0}
      org.apache.hyracks.api.exceptions.HyracksDataException: java.net.SocketException: Connection reset
              at org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:49) ~[hyracks-api.jar:1.0.0-2134]
              at org.apache.asterix.cloud.clients.aws.s3.S3CloudClient.read(S3CloudClient.java:138) ~[asterix-cloud.jar:1.0.0-2134]
              at org.apache.asterix.cloud.AbstractCloudIOManager.cloudRead(AbstractCloudIOManager.java:175) ~[asterix-cloud.jar:1.0.0-2134]
              at org.apache.hyracks.cloud.buffercache.context.DefaultCloudReadContext.readAndPersistIfEmpty(DefaultCloudReadContext.java:110) ~[hyracks-cloud.jar:1.0.0-2134]
              at org.apache.hyracks.cloud.buffercache.context.DefaultCloudReadContext.readAndPersistPage(DefaultCloudReadContext.java:82) ~[hyracks-cloud.jar:1.0.0-2134]
              at org.apache.hyracks.cloud.buffercache.context.DefaultCloudReadContext.processHeader(DefaultCloudReadContext.java:77) ~[hyracks-cloud.jar:1.0.0-2134]
              at org.apache.hyracks.storage.common.file.CompressedBufferedFileHandle.read(CompressedBufferedFileHandle.java:62) ~[hyracks-storage-common.jar:1.0.0-2134]
              at org.apache.hyracks.storage.common.buffercache.BufferCache.read(BufferCache.java:571) ~[hyracks-storage-common.jar:1.0.0-2134]
              at org.apache.hyracks.storage.common.buffercache.BufferCache.tryRead(BufferCache.java:544) ~[hyracks-storage-common.jar:1.0.0-2134]
              at org.apache.hyracks.storage.common.buffercache.BufferCache.pin(BufferCache.java:214) ~[hyracks-storage-common.jar:1.0.0-2134]
              at org.apache.hyracks.storage.common.buffercache.BufferCache.pin(BufferCache.java:176) ~[hyracks-storage-common.jar:1.0.0-2134]
              at org.apache.hyracks.storage.am.lsm.btree.column.impls.btree.ColumnBTreeRangeSearchCursor.pin(ColumnBTreeRangeSearchCursor.java:293) ~[hyracks-storage-am-lsm-btree-column.jar:1.0.0-2134]
              at org.apache.hyracks.storage.am.lsm.btree.column.impls.lsm.tuples.ColumnMultiBufferProvider.readNext(ColumnMultiBufferProvider.java:119) ~[hyracks-storage-am-lsm-btree-column.jar:1.0.0-2134]
              at org.apache.hyracks.storage.am.lsm.btree.column.impls.lsm.tuples.ColumnMultiBufferProvider.readAll(ColumnMultiBufferProvider.java:88) ~[hyracks-storage-am-lsm-btree-column.jar:1.0.0-2134]
              at org.apache.asterix.column.bytes.stream.in.MultiByteBufferInputStream.reset(MultiByteBufferInputStream.java:73) ~[asterix-column.jar:1.0.0-2134]
              at org.apache.asterix.column.tuple.MergeColumnTupleReference.startColumn(MergeColumnTupleReference.java:81) ~[asterix-column.jar:1.0.0-2134]
              at org.apache.hyracks.storage.am.lsm.btree.column.impls.lsm.tuples.AbstractColumnTupleReference.reset(AbstractColumnTupleReference.java:147) ~[hyracks-storage-am-lsm-btree-column.jar:1.0.0-2134]
              at org.apache.hyracks.storage.am.lsm.btree.column.impls.btree.ColumnBTreeRangeSearchCursor.setCursorPosition(ColumnBTreeRangeSearchCursor.java:159) ~[hyracks-storage-am-lsm-btree-column.jar:1.0.0-2134]
              at org.apache.hyracks.storage.am.lsm.btree.column.impls.btree.ColumnBTreeRangeSearchCursor.fetchNextLeafPage(ColumnBTreeRangeSearchCursor.java:97) ~[hyracks-storage-am-lsm-btree-column.jar:1.0.0-2134]
              at org.apache.hyracks.storage.am.lsm.btree.column.impls.btree.ColumnBTreeRangeSearchCursor.doHasNext(ColumnBTreeRangeSearchCursor.java:109) ~[hyracks-storage-am-lsm-btree-column.jar:1.0.0-2134]
              at org.apache.hyracks.storage.common.EnforcedIndexCursor.hasNext(EnforcedIndexCursor.java:69) ~[hyracks-storage-common.jar:1.0.0-2134]
              at org.apache.hyracks.storage.am.lsm.common.impls.LSMIndexSearchCursor.pushIntoQueueFromCursorAndReplaceThisElement(LSMIndexSearchCursor.java:194) ~[hyracks-storage-am-lsm-common.jar:1.0.0-2134]
              at org.apache.hyracks.storage.am.lsm.btree.impls.LSMBTreeRangeSearchCursor.pushOutputElementIntoQueueIfNeeded(LSMBTreeRangeSearchCursor.java:215) ~[hyracks-storage-am-lsm-btree.jar:1.0.0-2134]
              at org.apache.hyracks.storage.am.lsm.btree.impls.LSMBTreeRangeSearchCursor.checkPriorityQueue(LSMBTreeRangeSearchCursor.java:189) ~[hyracks-storage-am-lsm-btree.jar:1.0.0-2134]
              at org.apache.hyracks.storage.am.lsm.common.impls.LSMIndexSearchCursor.doHasNext(LSMIndexSearchCursor.java:144) ~[hyracks-storage-am-lsm-common.jar:1.0.0-2134]
              at org.apache.hyracks.storage.common.EnforcedIndexCursor.hasNext(EnforcedIndexCursor.java:69) ~[hyracks-storage-common.jar:1.0.0-2134]
              at org.apache.hyracks.storage.am.lsm.btree.impls.LSMBTree.doMerge(LSMBTree.java:330) ~[hyracks-storage-am-lsm-btree.jar:1.0.0-2134]
              at org.apache.hyracks.storage.am.lsm.common.impls.AbstractLSMIndex.merge(AbstractLSMIndex.java:917) ~[hyracks-storage-am-lsm-common.jar:1.0.0-2134]
              at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.doIo(LSMHarness.java:566) [hyracks-storage-am-lsm-common.jar:1.0.0-2134]
              at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.merge(LSMHarness.java:608) [hyracks-storage-am-lsm-common.jar:1.0.0-2134]
              at org.apache.hyracks.storage.am.lsm.common.impls.LSMTreeIndexAccessor.merge(LSMTreeIndexAccessor.java:128) [hyracks-storage-am-lsm-common.jar:1.0.0-2134]
              at org.apache.hyracks.storage.am.lsm.common.impls.MergeOperation.call(MergeOperation.java:52) [hyracks-storage-am-lsm-common.jar:1.0.0-2134]
              at org.apache.hyracks.storage.am.lsm.common.impls.MergeOperation.call(MergeOperation.java:33) [hyracks-storage-am-lsm-common.jar:1.0.0-2134]
              at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
              at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
              at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
              at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
      Caused by: java.net.SocketException: Connection reset
      

            People

              ritesh.agarwal Ritesh Agarwal
              ritesh.agarwal Ritesh Agarwal