Couchbase Server / MB-62765

[System Test] MERGE operation.afterFinalize failed error seen


Details

    • Bug
    • Resolution: Not a Bug
    • Critical
    • Columnar 1.0.0
    • Columnar 1.0.0
    • analytics
    • 1.0.0-2209
    • Untriaged
    • 0
    • Unknown
    • Analytics Sprint 46

    Description

      Workload:

      Type         Collections   Items per collection (millions)   Total items (millions)
      Remote       240           75                                12000**
      Standalone   50            8                                 4000*
      Kafka        5             33.5                              165

      * Some standalone collections hold 8 million items and others hold a multiple of 8 million. The total document count is 4,000 million (4 billion) items.
      Number of links: 6 (2 remote + 2 external + 2 Kafka). One remote link and one Kafka link are active.

      ** Around 80 of the remote collections are empty: the test incorrectly picked an empty keyspace as the source for those remote collections.
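As a quick check of the footnote above, the Remote total is consistent with only the non-empty collections contributing: (240 - 80) x 75 million = 160 x 75 million = 12,000 million. The minimal Java sketch below does that arithmetic; the class WorkloadTotals and its method are hypothetical, written only for this illustration, and the count of 80 empty collections is the approximate figure from the footnote.

```java
// Hypothetical helper, written only to illustrate the workload arithmetic in this report.
public final class WorkloadTotals {

    /** Total items (millions) = non-empty collections x items per collection (millions). */
    static long totalMillions(int collections, int emptyCollections, long itemsPerCollectionMillions) {
        return (long) (collections - emptyCollections) * itemsPerCollectionMillions;
    }

    public static void main(String[] args) {
        // Remote: 240 collections, roughly 80 of them empty, 75 million items in each non-empty one.
        System.out.println("Remote total (millions): " + totalMillions(240, 80, 75)); // prints 12000
    }
}
```

The same multiplication for the Kafka row gives 5 x 33.5 = 167.5 million rather than the 165 million listed, so one of those two Kafka figures is presumably rounded. The Standalone row cannot be checked this way because its collections differ in size.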

      Additionally, because of https://issues.couchbase.com/browse/MB-62683, I applied the workaround of setting remoteLinkRefreshAuthSeconds to 0 to speed up ingestion during the second cycle.

      Seen on node 018:

      2024-07-16T15:53:19.637+00:00 INFO CBAS.messaging.CCMessageBroker [Executor-473:ClusterController] Received message: (linkZcZKRcJl/default1)[12]:BO:ActivePartitionMessage-RUNTIME_DEREGISTERED()
      2024-07-16T15:53:56.461+00:00 ERRO CBAS.impls.LSMHarness [Executor-50328:b465a35d7e24c35c5018a2ed1b8d2ba6] MERGE operation.afterFinalize failed on {"dir" : "/var/cb-cache/@analytics/v_iodevice_12/storage/partition_60/Database1qMFXijmY/scope1FJjSomvO/remotedatasetoIDxWolm/0/remotedatasetoIDxWolm", "memory" : [{"state":"INACTIVE", "writers":0, "readers":0, "pendingFlushes":0, "id":"[261,261]", "index":{"class": "BTree", "file": "storage/partition_60/Database1qMFXijmY/scope1FJjSomvO/remotedatasetoIDxWolm/0/remotedatasetoIDxWolm_virtual_0"}}, {"state":"INACTIVE", "writers":0, "readers":0, "pendingFlushes":0, "id":"[262,262]", "index":{"class": "BTree", "file": "storage/partition_60/Database1qMFXijmY/scope1FJjSomvO/remotedatasetoIDxWolm/0/remotedatasetoIDxWolm_virtual_1"}}], "disk" : 6, "num-scheduled-flushes":0, "current-memory-component":0}
      software.amazon.awssdk.services.s3.model.S3Exception: null (Service: S3, Status Code: 503, Request ID: 2DE10D6FC777F797, Extended Request ID: lVWgsiEes7yCnLFIpwUPpLsUcOF7zJJfP3d7VkqeRpMa2LJpHgELo+BF9GI/D8AXpQXcpGyzMmomYYjmdQYZ7PpMHRD9IuXwGQirTdMwVFSdO0+AjEZrCrbYIdCMigdn)
      	at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedResponseHandler.java:156) ~[aws-xml-protocol-2.24.9.jar:?]
      	at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleResponse(AwsXmlPredicatedResponseHandler.java:108) ~[aws-xml-protocol-2.24.9.jar:?]
      	at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:85) ~[aws-xml-protocol-2.24.9.jar:?]
      	at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:43) ~[aws-xml-protocol-2.24.9.jar:?]
      	at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler$Crc32ValidationResponseHandler.handle(AwsSyncClientHandler.java:93) ~[aws-core-2.24.9.jar:?]
      	at software.amazon.awssdk.core.internal.handler.BaseClientHandler.lambda$successTransformationResponseHandler$7(BaseClientHandler.java:279) ~[sdk-core-2.24.9.jar:?]
      	at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:50) ~[sdk-core-2.24.9.jar:?]
      	at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:38) ~[sdk-core-2.24.9.jar:?]
      	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[sdk-core-2.24.9.jar:?]
      	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:72) ~[sdk-core-2.24.9.jar:?]
      	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42) ~[sdk-core-2.24.9.jar:?]
      	at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78) ~[sdk-core-2.24.9.jar:?]
      	at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40) ~[sdk-core-2.24.9.jar:?]
      	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:55) ~[sdk-core-2.24.9.jar:?]
      	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:39) ~[sdk-core-2.24.9.jar:?]
      	at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:81) ~[sdk-core-2.24.9.jar:?]
      	at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36) ~[sdk-core-2.24.9.jar:?]
      	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[sdk-core-2.24.9.jar:?]
      	at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56) ~[sdk-core-2.24.9.jar:?]
      	at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36) ~[sdk-core-2.24.9.jar:?]
      	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80) ~[sdk-core-2.24.9.jar:?]
      	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60) ~[sdk-core-2.24.9.jar:?]
      	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42) ~[sdk-core-2.24.9.jar:?]
      	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:50) ~[sdk-core-2.24.9.jar:?]
      	at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:32) ~[sdk-core-2.24.9.jar:?]
      	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[sdk-core-2.24.9.jar:?]
      	at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[sdk-core-2.24.9.jar:?]
      	at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37) ~[sdk-core-2.24.9.jar:?]
      	at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26) ~[sdk-core-2.24.9.jar:?]
      	at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:224) ~[sdk-core-2.24.9.jar:?]
      	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103) ~[sdk-core-2.24.9.jar:?]
      	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:173) ~[sdk-core-2.24.9.jar:?]
      	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:80) ~[sdk-core-2.24.9.jar:?]
      	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:182) ~[sdk-core-2.24.9.jar:?]
      	at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:74) ~[sdk-core-2.24.9.jar:?]
      	at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45) ~[sdk-core-2.24.9.jar:?]
      	at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:53) ~[aws-core-2.24.9.jar:?]
      	at software.amazon.awssdk.services.s3.DefaultS3Client.deleteObjects(DefaultS3Client.java:3127) ~[s3-2.24.9.jar:?]
      	at org.apache.asterix.cloud.clients.aws.s3.S3CloudClient.deleteObjects(S3CloudClient.java:231) ~[asterix-cloud.jar:1.0.0-2209]
      	at org.apache.asterix.cloud.lazy.accessor.AbstractLazyAccessor.doCloudDelete(AbstractLazyAccessor.java:61) ~[asterix-cloud.jar:1.0.0-2209]
      	at org.apache.asterix.cloud.lazy.accessor.ReplaceableCloudAccessor.doDelete(ReplaceableCloudAccessor.java:121) ~[asterix-cloud.jar:1.0.0-2209]
      	at org.apache.asterix.cloud.LazyCloudIOManager.lambda$delete$6(LazyCloudIOManager.java:198) ~[asterix-cloud.jar:1.0.0-2209]
      	at org.apache.hyracks.cloud.util.CloudRetryableRequestUtil.doRun(CloudRetryableRequestUtil.java:184) ~[hyracks-cloud.jar:1.0.0-2209]
      	at org.apache.hyracks.cloud.util.CloudRetryableRequestUtil.run(CloudRetryableRequestUtil.java:74) ~[hyracks-cloud.jar:1.0.0-2209]
      	at org.apache.hyracks.cloud.util.CloudRetryableRequestUtil.run(CloudRetryableRequestUtil.java:59) ~[hyracks-cloud.jar:1.0.0-2209]
      	at org.apache.asterix.cloud.LazyCloudIOManager.lambda$delete$7(LazyCloudIOManager.java:198) ~[asterix-cloud.jar:1.0.0-2209]
      	at org.apache.hyracks.cloud.util.CloudRetryableRequestUtil.doRun(CloudRetryableRequestUtil.java:184) ~[hyracks-cloud.jar:1.0.0-2209]
      	at org.apache.hyracks.cloud.util.CloudRetryableRequestUtil.run(CloudRetryableRequestUtil.java:74) ~[hyracks-cloud.jar:1.0.0-2209]
      	at org.apache.hyracks.cloud.util.CloudRetryableRequestUtil.run(CloudRetryableRequestUtil.java:59) ~[hyracks-cloud.jar:1.0.0-2209]
      	at org.apache.asterix.cloud.LazyCloudIOManager.delete(LazyCloudIOManager.java:198) ~[asterix-cloud.jar:1.0.0-2209]
      	at org.apache.asterix.common.ioopcallbacks.LSMIOOperationCallback.afterFinalize(LSMIOOperationCallback.java:140) ~[asterix-common.jar:1.0.0-2209]
      	at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.doIo(LSMHarness.java:564) [hyracks-storage-am-lsm-common.jar:1.0.0-2209]
      	at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.merge(LSMHarness.java:590) [hyracks-storage-am-lsm-common.jar:1.0.0-2209]
      	at org.apache.hyracks.storage.am.lsm.common.impls.LSMTreeIndexAccessor.merge(LSMTreeIndexAccessor.java:128) [hyracks-storage-am-lsm-common.jar:1.0.0-2209]
      	at org.apache.hyracks.storage.am.lsm.common.impls.MergeOperation.call(MergeOperation.java:52) [hyracks-storage-am-lsm-common.jar:1.0.0-2209]
      	at org.apache.hyracks.storage.am.lsm.common.impls.MergeOperation.call(MergeOperation.java:33) [hyracks-storage-am-lsm-common.jar:1.0.0-2209]
      	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
      	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
      	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
      	at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
      	Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 1 failure: null (Service: S3, Status Code: 503, Request ID: A84D1445B1A85075, Extended Request ID: iO8GA5OMFiZsjAMoyugSQb8RthzOgHGjjEWtL3J8rJheCVSum6qUp8/OxP8ImQARscQmwvhZTitsSughCK/nHQl7Q1nCU5Dk6ajfOTEFboryb0CyQCvUBWF5q2gRWLiV)
      	Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 2 failure: null (Service: S3, Status Code: 503, Request ID: 8D4AC1F3347A2CA, Extended Request ID: +s9nHxr39iq0DydNhyQeJIgoshAG6qfrl+f/JUoZPRdz6I/qeWvluMl1q/aybRcYN9bWf4MCxfc4ENlo4pDrzLrMqNEiGxxg+c9YY+RRhcASGYzOq89VKhcRJjB26N/J)
      	Suppressed: software.amazon.awssdk.core.exception.SdkClientException: Request attempt 3 failure: null (Service: S3, Status Code: 503, Request ID: 93377E164FFADAE9, Extended Request ID: 7kRFaEtQbPL4OZV1ZW39y3SNUbOq9xJX7kAvrtLQpCs0Gqhnvcmygn3VTXkDNWK2FmGu1Tsn5joYrUzcWAlNTs1Z9MaEOaw28t94ZZTVihW4uMtSKfqJixmda6Ob68go)
      2024-07-16T15:53:56.641+00:00 ERRO CBAS.nc.HaltCallback [Executor-50328:b465a35d7e24c35c5018a2ed1b8d2ba6] Operation {"fileName": "254_261_b", "ioOpID": 2142788851} has failed
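The trace shows the failure path: in the MERGE operation's afterFinalize step, LSMIOOperationCallback.afterFinalize calls LazyCloudIOManager.delete, which issues an S3 DeleteObjects request (S3CloudClient.deleteObjects) wrapped in CloudRetryableRequestUtil.run. The AWS SDK also retried internally: attempts 1 to 3 appear as suppressed SdkClientExceptions, and the final S3Exception is again a 503. An HTTP 503 from S3 is generally treated as transient (S3's SlowDown throttling response uses this status), which is why such calls are usually retried with backoff. The sketch below illustrates that generic pattern, capped exponential backoff with full jitter, and attaches earlier attempts as suppressed exceptions in the same way the log does. It is not the CloudRetryableRequestUtil or AWS SDK retry implementation; RetrySketch, TransientException, and all parameter values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of retrying a cloud request on transient failures (e.g. HTTP 503).
// Not the CloudRetryableRequestUtil or AWS SDK implementation.
public final class RetrySketch {

    /** Hypothetical marker for a retryable failure such as an HTTP 503 response. */
    static final class TransientException extends RuntimeException {
        TransientException(String message) {
            super(message);
        }
    }

    /** Upper bound of the backoff before retry number `attempt` (1-based): base * 2^(attempt-1), capped. */
    static long backoffCeilingMillis(int attempt, long baseMillis, long capMillis) {
        long exponential = baseMillis << Math.min(attempt - 1, 30); // bounded shift avoids overflow
        return Math.min(capMillis, exponential);
    }

    /** Runs `call` up to maxAttempts times, sleeping with full jitter between transient failures. */
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long baseMillis, long capMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        List<TransientException> earlierFailures = new ArrayList<>();
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (TransientException e) {
                if (attempt >= maxAttempts) {
                    // Out of attempts: surface the last failure, with earlier ones attached as suppressed.
                    for (TransientException earlier : earlierFailures) {
                        e.addSuppressed(earlier);
                    }
                    throw e;
                }
                earlierFailures.add(e);
                // Full jitter: sleep a random duration in [0, ceiling].
                long ceiling = backoffCeilingMillis(attempt, baseMillis, capMillis);
                Thread.sleep(ThreadLocalRandom.current().nextLong(ceiling + 1));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated request: fails with a "503" three times, then succeeds.
        String result = callWithRetry(() -> {
            if (++calls[0] <= 3) {
                throw new TransientException("503 on attempt " + calls[0]);
            }
            return "deleted";
        }, 4, 10, 200);
        System.out.println(result + " after " + calls[0] + " attempts"); // prints "deleted after 4 attempts"
    }
}
```

With the simulated request in main, three failures are absorbed and the fourth attempt succeeds; if every attempt fails, the caller sees the last exception with the earlier ones suppressed, as in the log above. Full jitter is a common choice because it spreads retries from many concurrent callers over time instead of having them retry in lockstep.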
      

      cbcollect logs:

      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-16T161613-ns_1%40svc-da-node-018.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-16T161613-ns_1%40svc-da-node-023.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-16T161613-ns_1%40svc-da-node-025.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-16T161613-ns_1%40svc-da-node-030.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip


          People

            pavan.pb Pavan PB
            pavan.pb Pavan PB