Couchbase Server / MB-62964

Internal error seen after turning on the cluster and restoring the backup on 60TB of data


Details

    Description

      1. Create a 32-node columnar cluster. Ingest 1B items per remote collection in 20 collections.
      2. 20TB in columnar. Disconnect the previous link, then create a new link and 20 more collections.
      3. Start scaling operations: 32 -> 16 -> 8 -> 4 -> 2 -> 4 -> 8 -> 16 -> 32.
      4. 40TB in columnar. Disconnect the previous link, then create a new link and 20 more collections.
      5. Start scaling operations: 32 -> 16 -> 8 -> 4 -> 2 -> 4 -> 8 -> 16 -> 32.
      6. 60TB in columnar. Trigger the backup. Backup completed.
      7. Scale down the cluster to 2 nodes and turn off the cluster.
      8. After 5 days, turn on the cluster. By now the S3 bucket has been cleaned up because DeleteAllObjects is set to True. All the metadata is cleared, but we have the backup.
      9. Trigger restore. All the links, collections, etc. came back and the cluster looks fine.
      10. On running queries on some of the collections on disconnected links, an Internal Error is observed:

       
      2024-07-31T04:55:56.237+00:00 INFO CBAS.server.QueryServiceServlet [HttpExecutor(port:18095)-11] handleRequest: uuid=94cc4ebe-929f-42cf-a87c-210ac5839894, clientContextID=40c8b218-f563-438d-b210-efcc4df918e6, {"host":"18.207.123.87:18091","path":"/query/service","statement":"<ud>select * from remote_RNr9p_volCollection_0_ftbsa limit 1;</ud>","pretty":false,"mode":"immediate","clientContextID":"40c8b218-f563-438d-b210-efcc4df918e6","clientType":"ASTERIX","dataverse":null,"format":"CLEAN_JSON","timeout":9223372036854775807,"maxResultReads":1,"planFormat":"JSON","expressionTree":false,"rewrittenExpressionTree":false,"logicalPlan":false,"optimizedLogicalPlan":true,"job":false,"profile":"counts","signature":true,"multiStatement":true,"parseOnly":false,"readOnly":false,"maxWarnings":10,"sqlCompat":false,"source":"query_editor","scanConsistency":"not_bounded","scanWait":null}
      2024-07-31T04:55:56.262+00:00 INFO CBAS.work.AbortTasksWork [Worker:b54d6517ca617b084aa3d3d1f9fa6ad2] Aborting Tasks: JID:47.8:[TAID:TID:ANID:ODID:4:0:7:0, TAID:TID:ANID:ODID:4:0:8:0, TAID:TID:ANID:ODID:4:0:9:0, TAID:TID:ANID:ODID:4:0:10:0, TAID:TID:ANID:ODID:4:0:12:0, TAID:TID:ANID:ODID:4:0:13:0, TAID:TID:ANID:ODID:4:0:14:0, TAID:TID:ANID:ODID:4:0:15:0]
      2024-07-31T04:55:56.452+00:00 WARN CBAS.util.CloudRetryableRequestUtil [SAO:JID:47.8:TAID:TID:ANID:ODID:4:0:13:0] Ignored interrupting ICloudReturnableRequest
      software.amazon.awssdk.core.exception.AbortedException: Thread was interrupted
              at software.amazon.awssdk.core.exception.AbortedException$BuilderImpl.build(AbortedException.java:93) ~[sdk-core-2.24.9.jar:?]
              at software.amazon.awssdk.core.exception.AbortedException.create(AbortedException.java:38) ~[sdk-core-2.24.9.jar:?]
              at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.handleInterruptedException(ApiCallAttemptTimeoutTrackingStage.java:141) ~[sdk-core-2.24.9.jar:?]
              at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.translatePipelineException(ApiCallAttemptTimeoutTrackingStage.java:105) ~[sdk-core-2.24.9.jar:?]
              at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:89) ~[sdk-core-2.24.9.jar:?]
              at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42) ~[sdk-core-2.24.9.jar:?]
              at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78) ~[sdk-core-2.24.9.jar:?]
              at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40) ~[sdk-core-2.24.9.jar:?]
              at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:55) ~[sdk-core-2.24.9.jar:?]
              at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:39) ~[sdk-core-2.24.9.jar:?]
              at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:81) ~[sdk-core-2.24.9.jar:?]
              at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36) ~[sdk-core-2.24.9.jar:?]
              at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[sdk-core-2.24.9.jar:?]
              at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56) ~[sdk-core-2.24.9.jar:?]
              at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36) ~[sdk-core-2.24.9.jar:?]
              at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80) ~[sdk-core-2.24.9.jar:?]
              at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60) ~[sdk-core-2.24.9.jar:?]
              at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42) ~[sdk-core-2.24.9.jar:?]
              at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:50) ~[sdk-core-2.24.9.jar:?]
              at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:32) ~[sdk-core-2.24.9.jar:?]
              at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[sdk-core-2.24.9.jar:?]
              at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[sdk-core-2.24.9.jar:?]
              at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37) ~[sdk-core-2.24.9.jar:?]
              at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26) ~[sdk-core-2.24.9.jar:?]
              at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:224) ~[sdk-core-2.24.9.jar:?]
              at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103) ~[sdk-core-2.24.9.jar:?]
              at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:173) ~[sdk-core-2.24.9.jar:?]
              at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$0(BaseSyncClientHandler.java:66) ~[sdk-core-2.24.9.jar:?]
              at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:182) ~[sdk-core-2.24.9.jar:?]
              at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:60) ~[sdk-core-2.24.9.jar:?]
              at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:52) ~[sdk-core-2.24.9.jar:?]
              at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:60) ~[aws-core-2.24.9.jar:?]
              at software.amazon.awssdk.services.s3.DefaultS3Client.getObject(DefaultS3Client.java:5174) ~[s3-2.24.9.jar:?]
              at software.amazon.awssdk.services.s3.S3Client.getObject(S3Client.java:9005) ~[s3-2.24.9.jar:?]
              at org.apache.asterix.cloud.clients.aws.s3.S3CloudClient.read(S3CloudClient.java:140) ~[asterix-cloud.jar:1.0.0-2237]
              at org.apache.asterix.cloud.AbstractCloudIOManager.lambda$cloudRead$0(AbstractCloudIOManager.java:187) ~[asterix-cloud.jar:1.0.0-2237]
              at org.apache.hyracks.cloud.util.CloudRetryableRequestUtil.doRun(CloudRetryableRequestUtil.java:184) ~[hyracks-cloud.jar:1.0.0-2237]
              at org.apache.hyracks.cloud.util.CloudRetryableRequestUtil.run(CloudRetryableRequestUtil.java:74) ~[hyracks-cloud.jar:1.0.0-2237]
              at org.apache.asterix.cloud.AbstractCloudIOManager.cloudRead(AbstractCloudIOManager.java:189) ~[asterix-cloud.jar:1.0.0-2237]
              at org.apache.hyracks.cloud.buffercache.context.DefaultCloudReadContext.readAndPersistIfEmpty(DefaultCloudReadContext.java:114) ~[hyracks-cloud.jar:1.0.0-2237]
              at org.apache.hyracks.cloud.buffercache.context.DefaultCloudReadContext.readAndPersistPage(DefaultCloudReadContext.java:85) ~[hyracks-cloud.jar:1.0.0-2237]
              at org.apache.hyracks.cloud.buffercache.context.DefaultCloudReadContext.processHeader(DefaultCloudReadContext.java:78) ~[hyracks-cloud.jar:1.0.0-2237]
              at org.apache.hyracks.storage.common.file.CompressedBufferedFileHandle.read(CompressedBufferedFileHandle.java:64) ~[hyracks-storage-common.jar:1.0.0-2237]
              at org.apache.hyracks.storage.common.buffercache.BufferCache.read(BufferCache.java:571) ~[hyracks-storage-common.jar:1.0.0-2237]
              at org.apache.hyracks.storage.common.buffercache.BufferCache.tryRead(BufferCache.java:541) ~[hyracks-storage-common.jar:1.0.0-2237]
              at org.apache.hyracks.storage.common.buffercache.BufferCache.pin(BufferCache.java:215) ~[hyracks-storage-common.jar:1.0.0-2237]
              at org.apache.hyracks.storage.common.buffercache.BufferCache.pin(BufferCache.java:177) ~[hyracks-storage-common.jar:1.0.0-2237]
              at org.apache.hyracks.storage.am.common.freepage.AppendOnlyLinkedMetadataPageManager.getRootPageId(AppendOnlyLinkedMetadataPageManager.java:287) ~[hyracks-storage-am-common.jar:1.0.0-2237]
              at org.apache.hyracks.storage.am.common.impls.AbstractTreeIndex.setRootPage(AbstractTreeIndex.java:104) ~[hyracks-storage-am-common.jar:1.0.0-2237]
              at org.apache.hyracks.storage.am.common.impls.AbstractTreeIndex.activate(AbstractTreeIndex.java:120) ~[hyracks-storage-am-common.jar:1.0.0-2237]
              at org.apache.hyracks.storage.am.lsm.common.impls.AbstractLSMIndexFileManager.isValidTreeIndex(AbstractLSMIndexFileManager.java:111) ~[hyracks-storage-am-lsm-common.jar:1.0.0-2237]
              at org.apache.hyracks.storage.am.lsm.common.impls.AbstractLSMIndexFileManager.cleanupAndGetValidFilesInternal(AbstractLSMIndexFileManager.java:148) ~[hyracks-storage-am-lsm-common.jar:1.0.0-2237]
              at org.apache.hyracks.storage.am.lsm.btree.impls.LSMBTreeFileManager.cleanupAndGetValidFiles(LSMBTreeFileManager.java:88) ~[hyracks-storage-am-lsm-btree.jar:1.0.0-2237]
              at org.apache.hyracks.storage.am.lsm.common.impls.AbstractLSMIndex.loadDiskComponents(AbstractLSMIndex.java:208) ~[hyracks-storage-am-lsm-common.jar:1.0.0-2237]
              at org.apache.hyracks.storage.am.lsm.common.impls.AbstractLSMIndex.activate(AbstractLSMIndex.java:202) ~[hyracks-storage-am-lsm-common.jar:1.0.0-2237]
              at org.apache.hyracks.storage.am.lsm.btree.column.impls.lsm.LSMColumnBTree.activate(LSMColumnBTree.java:89) ~[hyracks-storage-am-lsm-btree-column.jar:1.0.0-2237]
              at org.apache.asterix.common.context.DatasetLifecycleManager.open(DatasetLifecycleManager.java:221) ~[asterix-common.jar:1.0.0-2237]
              at org.apache.hyracks.storage.am.common.dataflow.IndexDataflowHelper.open(IndexDataflowHelper.java:68) ~[hyracks-storage-am-common.jar:1.0.0-2237]
              at org.apache.hyracks.storage.am.common.dataflow.IndexSearchOperatorNodePushable.open(IndexSearchOperatorNodePushable.java:205) ~[hyracks-storage-am-common.jar:1.0.0-2237]
              at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputPushRuntime.open(AbstractOneInputPushRuntime.java:41) ~[algebricks-runtime.jar:1.0.0-2237]
              at org.apache.hyracks.algebricks.runtime.operators.std.EmptyTupleSourceRuntimeFactory$1.open(EmptyTupleSourceRuntimeFactory.java:51) ~[algebricks-runtime.jar:1.0.0-2237]
              at org.apache.hyracks.algebricks.runtime.operators.meta.AlgebricksMetaOperatorDescriptor$SourcePushRuntime.initialize(AlgebricksMetaOperatorDescriptor.java:175) ~[algebricks-runtime.jar:1.0.0-2237]
              at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.lambda$runInParallel$0(SuperActivityOperatorNodePushable.java:233) ~[hyracks-api.jar:1.0.0-2237]
              at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
              at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
              at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
              at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
      Caused by: software.amazon.awssdk.core.exception.SdkInterruptedException
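
A possible triage aid, not part of the original report: the thread name on the WARN line appears to carry the same job and task identifiers that the preceding AbortTasksWork line lists. The sketch below checks that correlation for the two lines above. The regular expressions and the function names (`aborted_tasks`, `warning_task`, `interrupted_by_abort`) are assumptions inferred from these two log lines only, and a match would not by itself explain why the job was aborted.

```python
import re

# The two log lines from this report, copied verbatim.
ABORT_LINE = (
    "2024-07-31T04:55:56.262+00:00 INFO CBAS.work.AbortTasksWork "
    "[Worker:b54d6517ca617b084aa3d3d1f9fa6ad2] Aborting Tasks: JID:47.8:"
    "[TAID:TID:ANID:ODID:4:0:7:0, TAID:TID:ANID:ODID:4:0:8:0, "
    "TAID:TID:ANID:ODID:4:0:9:0, TAID:TID:ANID:ODID:4:0:10:0, "
    "TAID:TID:ANID:ODID:4:0:12:0, TAID:TID:ANID:ODID:4:0:13:0, "
    "TAID:TID:ANID:ODID:4:0:14:0, TAID:TID:ANID:ODID:4:0:15:0]"
)
WARN_LINE = (
    "2024-07-31T04:55:56.452+00:00 WARN CBAS.util.CloudRetryableRequestUtil "
    "[SAO:JID:47.8:TAID:TID:ANID:ODID:4:0:13:0] "
    "Ignored interrupting ICloudReturnableRequest"
)

# Assumed formats, inferred from the two lines above.
ABORT_RE = re.compile(r"Aborting Tasks: (JID:[\d.]+):\[(.*?)\]")
THREAD_RE = re.compile(r"\[SAO:(JID:[\d.]+):(TAID:[\w:]+)\]")


def aborted_tasks(line):
    """Return (job_id, set_of_task_ids) from an AbortTasksWork line, or None."""
    match = ABORT_RE.search(line)
    if match is None:
        return None
    job_id, task_list = match.groups()
    return job_id, {task.strip() for task in task_list.split(",")}


def warning_task(line):
    """Return (job_id, task_id) from the thread name of a WARN line, or None."""
    match = THREAD_RE.search(line)
    return match.groups() if match else None


def interrupted_by_abort(abort_line, warn_line):
    """True if the WARN line's thread belongs to a task listed as aborted."""
    aborted = aborted_tasks(abort_line)
    warned = warning_task(warn_line)
    if aborted is None or warned is None:
        return False
    return aborted[0] == warned[0] and warned[1] in aborted[1]
```

Calling `interrupted_by_abort(ABORT_LINE, WARN_LINE)` on the two lines above returns `True`, which would suggest the `AbortedException` is a consequence of the job abort rather than an independent S3 failure.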
      


            People

              ritesh.agarwal Ritesh Agarwal