  1. Couchbase Server
  2. MB-62707

[Mini Volume - BYOK] SELECT query against a Kafka collection fails with an internal error due to an S3 exception

Details

    • Untriaged
    • Linux x86_64
    • 0
    • Unknown
    • Analytics Sprint 46

    Description

       Test Steps

      1. Load 25M documents into a Mongo collection.
      2. Create a Confluent Kafka topic against the Mongo collection created in the previous step.
      3. Deploy a 4-node Columnar cluster with 8 vCPUs and 64 GB of memory per node.
      4. Create a pair of Kafka links and 5 collections on each of the 2 Kafka links.
      5. Connect 1 of the 2 Kafka links.
      6. While data is being ingested from the Kafka source, perform multiple scale-up and scale-down operations on the cluster.
      7. Start a continuous query workload on this cluster while data ingestion is ongoing.

      Observation

      All queries against kafka_collection1 fail with an internal error.

      SELECT count(*) from kafka_collection1
      

      [
        {
          "code": 25000,
          "msg": "Internal error"
        }
      ]
      

      Queries are working fine for all other collections present in this cluster.
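
A continuous query workload could flag this failure automatically by checking each response for error code 25000. The following is a minimal Python sketch, assuming the error payload has the shape shown in the observation; the function name `internal_errors` is hypothetical and not part of any Couchbase SDK.

```python
import json

def internal_errors(payload, code=25000):
    """Return the entries of a JSON error array that carry the given code."""
    entries = json.loads(payload)
    return [e for e in entries if isinstance(e, dict) and e.get("code") == code]

# Error payload as reported for kafka_collection1.
sample = '[{"code": 25000, "msg": "Internal error"}]'
print(internal_errors(sample))
```

An empty result means the response carried no internal error, so the same check can run against every collection in the workload.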

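      analytics_info.log on node-005
      The query failures appear to be caused by an S3 exception: the Topology Monitor fails storage cleanup with a NoSuchKeyException (404) while downloading metadata files.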

      2024-07-12T09:03:27.364+00:00 ERRO CBAS.rebalance.TopologyMonitor [Topology Monitor] failed to perform storage cleanup; keep partitions [0, -1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]
      org.apache.hyracks.api.exceptions.HyracksDataException: java.util.concurrent.ExecutionException: software.amazon.awssdk.services.s3.model.NoSuchKeyException: The specified key does not exist. (Service: S3, Status Code: 404, Request ID: QB80ZTXS8NPZE403, Extended Request ID: eeQo0IyGkdm6paHMt6RP1yivzdZKOpiTjzfVmZnoboZjNTh4+RJeGBzCJP/GSe9+oeUl2kV8rnwlwZGbUiwcug==)
      	at org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:49) ~[hyracks-api.jar:1.0.0-2203]
      	at org.apache.asterix.cloud.clients.aws.s3.S3ParallelDownloader.downloadFiles(S3ParallelDownloader.java:78) ~[asterix-cloud.jar:1.0.0-2203]
      	at org.apache.asterix.cloud.LazyCloudIOManager.downloadMetadataFiles(LazyCloudIOManager.java:243) ~[asterix-cloud.jar:1.0.0-2203]
      	at org.apache.asterix.cloud.LazyCloudIOManager.downloadPartitions(LazyCloudIOManager.java:132) ~[asterix-cloud.jar:1.0.0-2203]
      	at org.apache.asterix.cloud.AbstractCloudIOManager.bootstrap(AbstractCloudIOManager.java:138) ~[asterix-cloud.jar:1.0.0-2203]
      	at com.couchbase.analytics.control.rebalance.TopologyMonitor$TopologyMonitorThread.keepPartitions(TopologyMonitor.java:214) ~[columnar-server.jar:1.0.0-2203]
      	at com.couchbase.analytics.control.rebalance.TopologyMonitor$TopologyMonitorThread.cleanupStorage(TopologyMonitor.java:195) [columnar-server.jar:1.0.0-2203]
      	at com.couchbase.analytics.control.rebalance.TopologyMonitor$TopologyMonitorThread.ensureTopology(TopologyMonitor.java:154) [columnar-server.jar:1.0.0-2203]
      	at com.couchbase.analytics.control.rebalance.TopologyMonitor$TopologyMonitorThread.run(TopologyMonitor.java:115) [columnar-server.jar:1.0.0-2203]
      	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
      	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
      	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
      	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
      	at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
      Caused by: java.util.concurrent.ExecutionException: software.amazon.awssdk.services.s3.model.NoSuchKeyException: The specified key does not exist. (Service: S3, Status Code: 404, Request ID: QB80ZTXS8NPZE403, Extended Request ID: eeQo0IyGkdm6paHMt6RP1yivzdZKOpiTjzfVmZnoboZjNTh4+RJeGBzCJP/GSe9+oeUl2kV8rnwlwZGbUiwcug==)
      	at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396) ~[?:?]
      	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073) ~[?:?]
      	at org.apache.asterix.cloud.clients.aws.s3.S3ParallelDownloader.waitForFileDownloads(S3ParallelDownloader.java:132) ~[asterix-cloud.jar:1.0.0-2203]
      	at org.apache.asterix.cloud.clients.aws.s3.S3ParallelDownloader.downloadFiles(S3ParallelDownloader.java:76) ~[asterix-cloud.jar:1.0.0-2203]
      	... 12 more
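
When comparing this failure across nodes or runs, it can help to pull the S3 error details (exception class, status code, request ID) out of the log line. The following is a rough Python sketch, assuming the message layout seen in the trace above; the regular expression and the helper name `parse_s3_error` are illustrative and not part of the AWS SDK or Couchbase tooling.

```python
import re

# Matches "<ExceptionClass>: <message> (Service: X, Status Code: N, Request ID: ID".
# The message may not contain ':' so that the innermost exception is selected.
S3_ERROR_RE = re.compile(
    r"(?P<exception>[\w.]+Exception): (?P<message>[^:()]*?) "
    r"\(Service: (?P<service>\w+), Status Code: (?P<status>\d+), "
    r"Request ID: (?P<request_id>\w+)"
)

def parse_s3_error(line):
    """Extract S3 error fields from one log line, or return None if absent."""
    m = S3_ERROR_RE.search(line)
    if m is None:
        return None
    return {
        "exception": m.group("exception").rsplit(".", 1)[-1],
        "message": m.group("message"),
        "service": m.group("service"),
        "status": int(m.group("status")),
        "request_id": m.group("request_id"),
    }

# "Caused by" line from the trace, with the extended request ID abbreviated.
sample = (
    "Caused by: java.util.concurrent.ExecutionException: "
    "software.amazon.awssdk.services.s3.model.NoSuchKeyException: "
    "The specified key does not exist. (Service: S3, Status Code: 404, "
    "Request ID: QB80ZTXS8NPZE403, Extended Request ID: ...)"
)
print(parse_s3_error(sample))
```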
      

      analytics_storage_debug.log on node-005
      The log reports that the topic state was not found.

      2024-07-12T09:03:32.826+00:00 DEBU CBAS.cloud.LazyCloudIOManager [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] DELETE storage/partition_24/Default/Default/kafka_collection1/0/kafka_collection1/0_33_b.dic
      2024-07-12T09:03:32.855+00:00 DEBU CBAS.cloud.LazyCloudIOManager [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] DELETE storage/partition_24/Default/Default/kafka_collection1/0/kafka_collection1/34_34_b.dic
      2024-07-12T09:03:32.880+00:00 DEBU CBAS.cloud.LazyCloudIOManager [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] DELETE storage/partition_24/Default/Default/kafka_collection1/0/kafka_collection1/35_35_b.dic
      2024-07-12T09:03:32.905+00:00 DEBU CBAS.cloud.LazyCloudIOManager [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] DELETE storage/partition_24/Default/Default/kafka_collection1/0/kafka_collection1/34_34_f
      2024-07-12T09:03:32.928+00:00 DEBU CBAS.cloud.LazyCloudIOManager [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] DELETE storage/partition_24/Default/Default/kafka_collection1/0/kafka_collection1/0_33_f
      2024-07-12T09:03:32.956+00:00 DEBU CBAS.cloud.LazyCloudIOManager [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] DELETE storage/partition_24/Default/Default/kafka_collection1/0/kafka_collection1/35_35_f
      2024-07-12T09:03:32.956+00:00 DEBU CBAS.flush.FlushColumnMetadata [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] Schema for record has changed: <ud>2a03da0d6587efc7ee3155c3e7e09756d0810161</ud>
      2024-07-12T09:03:32.956+00:00 DEBU CBAS.flush.FlushColumnMetadata [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] Schema for meta-record has changed: <ud>9cc7f0ce48c17ffa62e0b22317f9b1206d237162</ud>
      2024-07-12T09:03:32.956+00:00 DEBU CBAS.util.ComponentUtils [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] TopicState was NOT found
      2024-07-12T09:03:32.986+00:00 DEBU CBAS.cloud.LazyCloudIOManager [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] DELETE storage/partition_25/Default/Default/kafka_collection1/0/kafka_collection1/35_35_b.dic
      2024-07-12T09:03:33.011+00:00 DEBU CBAS.cloud.LazyCloudIOManager [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] DELETE storage/partition_25/Default/Default/kafka_collection1/0/kafka_collection1/34_34_b.dic
      2024-07-12T09:03:33.034+00:00 DEBU CBAS.cloud.LazyCloudIOManager [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] DELETE storage/partition_25/Default/Default/kafka_collection1/0/kafka_collection1/0_33_b.dic
      2024-07-12T09:03:33.092+00:00 DEBU CBAS.cloud.LazyCloudIOManager [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] DELETE storage/partition_25/Default/Default/kafka_collection1/0/kafka_collection1/34_34_f
      2024-07-12T09:03:33.115+00:00 DEBU CBAS.cloud.LazyCloudIOManager [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] DELETE storage/partition_25/Default/Default/kafka_collection1/0/kafka_collection1/35_35_f
      2024-07-12T09:03:33.142+00:00 DEBU CBAS.cloud.LazyCloudIOManager [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] DELETE storage/partition_25/Default/Default/kafka_collection1/0/kafka_collection1/0_33_f
      2024-07-12T09:03:33.142+00:00 DEBU CBAS.flush.FlushColumnMetadata [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] Schema for record has changed: <ud>2a03da0d6587efc7ee3155c3e7e09756d0810161</ud>
      2024-07-12T09:03:33.142+00:00 DEBU CBAS.flush.FlushColumnMetadata [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] Schema for meta-record has changed: <ud>9cc7f0ce48c17ffa62e0b22317f9b1206d237162</ud>
      2024-07-12T09:03:33.142+00:00 DEBU CBAS.util.ComponentUtils [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] TopicState was NOT found
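
To see how the DELETE operations and the "TopicState was NOT found" messages are distributed, the debug log can be summarized per partition. The following is a small Python sketch, assuming the line format shown in the excerpt above; the helper name `summarize` is hypothetical, and the sample lines are abbreviated from the excerpt (the thread name is replaced by `[...]`).

```python
import re
from collections import Counter

# Captures the partition number from a cloud DELETE entry.
DELETE_RE = re.compile(r"\bDELETE storage/partition_(\d+)/")

def summarize(lines):
    """Count DELETE entries per partition and 'TopicState was NOT found' lines."""
    deletes = Counter()
    missing = 0
    for line in lines:
        m = DELETE_RE.search(line)
        if m:
            deletes[int(m.group(1))] += 1
        elif "TopicState was NOT found" in line:
            missing += 1
    return deletes, missing

# Abbreviated lines from analytics_storage_debug.log on node-005.
sample = [
    "2024-07-12T09:03:32.826+00:00 DEBU CBAS.cloud.LazyCloudIOManager [...] "
    "DELETE storage/partition_24/Default/Default/kafka_collection1/0/kafka_collection1/0_33_b.dic",
    "2024-07-12T09:03:32.905+00:00 DEBU CBAS.cloud.LazyCloudIOManager [...] "
    "DELETE storage/partition_24/Default/Default/kafka_collection1/0/kafka_collection1/34_34_f",
    "2024-07-12T09:03:32.956+00:00 DEBU CBAS.util.ComponentUtils [...] TopicState was NOT found",
]
deletes, missing = summarize(sample)
print(dict(deletes), missing)
```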
      

          People

            sujay.gad Sujay Gad