Details
Bug
Resolution: Fixed
Critical
Columnar 1.0.0
1.0.0-2203-columnar
Untriaged
Linux x86_64
0
Unknown
Analytics Sprint 46
Description
Test Steps
- Load 25M documents into a MongoDB collection.
- Create a Confluent Kafka topic against the MongoDB collection created in the previous step.
- Deploy a 4-node Columnar cluster with 8 vCPUs and 64 GB of memory per node.
- Create a pair of Kafka links, with 5 collections on each of the 2 Kafka links.
- Connect 1 of the 2 Kafka links.
- While data ingestion from the Kafka source is in progress, perform multiple scale-up and scale-down operations on the cluster.
- Start a continuous query workload on this cluster while data ingestion is ongoing.
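The steps above do not say how the query workload is driven. As a rough sketch only, a workload that repeatedly queries every collection and tallies failures per collection could look like the Java below. `QueryWorkload`, the `QueryRunner` interface, and `kafka_collection2` are hypothetical names introduced here; a real runner would wrap whatever query client the test harness actually uses, and a fixed number of rounds stands in for "continuous".

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class QueryWorkload {
    // Hypothetical hook for executing one SQL++ statement.
    // Returns normally on success and throws on any query error.
    interface QueryRunner {
        void run(String statement) throws Exception;
    }

    // Issues `rounds` passes of SELECT count(*) over every collection
    // and counts the failed queries per collection.
    static Map<String, Integer> run(QueryRunner runner, List<String> collections, int rounds) {
        Map<String, Integer> failures = new LinkedHashMap<>();
        for (String c : collections) {
            failures.put(c, 0);
        }
        for (int i = 0; i < rounds; i++) {
            for (String c : collections) {
                try {
                    runner.run("SELECT count(*) FROM " + c);
                } catch (Exception e) {
                    failures.merge(c, 1, Integer::sum);
                }
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Fake runner that fails only for kafka_collection1, mimicking the reported symptom.
        QueryRunner fake = stmt -> {
            if (stmt.endsWith(" kafka_collection1")) {
                throw new IllegalStateException("Internal error (code 25000)");
            }
        };
        System.out.println(run(fake, List.of("kafka_collection1", "kafka_collection2"), 3));
    }
}
```

Tallying failures per collection, rather than overall, makes a pattern like the one reported here (one collection failing while the others succeed) visible at a glance.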
Observation
All queries against kafka_collection1 fail with an internal error:
SELECT count(*) from kafka_collection1

[
    {
        "code": 25000,
        "msg": "Internal error"
    }
]
Queries are working fine for all other collections present in this cluster.
analytics_info.log on node-005
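The query failures appear to be caused by an S3 exception: the Topology Monitor failed to perform storage cleanup because an S3 download of metadata files returned NoSuchKeyException (Status Code: 404).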
2024-07-12T09:03:27.364+00:00 ERRO CBAS.rebalance.TopologyMonitor [Topology Monitor] failed to perform storage cleanup; keep partitions [0, -1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]
org.apache.hyracks.api.exceptions.HyracksDataException: java.util.concurrent.ExecutionException: software.amazon.awssdk.services.s3.model.NoSuchKeyException: The specified key does not exist. (Service: S3, Status Code: 404, Request ID: QB80ZTXS8NPZE403, Extended Request ID: eeQo0IyGkdm6paHMt6RP1yivzdZKOpiTjzfVmZnoboZjNTh4+RJeGBzCJP/GSe9+oeUl2kV8rnwlwZGbUiwcug==)
    at org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:49) ~[hyracks-api.jar:1.0.0-2203]
    at org.apache.asterix.cloud.clients.aws.s3.S3ParallelDownloader.downloadFiles(S3ParallelDownloader.java:78) ~[asterix-cloud.jar:1.0.0-2203]
    at org.apache.asterix.cloud.LazyCloudIOManager.downloadMetadataFiles(LazyCloudIOManager.java:243) ~[asterix-cloud.jar:1.0.0-2203]
    at org.apache.asterix.cloud.LazyCloudIOManager.downloadPartitions(LazyCloudIOManager.java:132) ~[asterix-cloud.jar:1.0.0-2203]
    at org.apache.asterix.cloud.AbstractCloudIOManager.bootstrap(AbstractCloudIOManager.java:138) ~[asterix-cloud.jar:1.0.0-2203]
    at com.couchbase.analytics.control.rebalance.TopologyMonitor$TopologyMonitorThread.keepPartitions(TopologyMonitor.java:214) ~[columnar-server.jar:1.0.0-2203]
    at com.couchbase.analytics.control.rebalance.TopologyMonitor$TopologyMonitorThread.cleanupStorage(TopologyMonitor.java:195) [columnar-server.jar:1.0.0-2203]
    at com.couchbase.analytics.control.rebalance.TopologyMonitor$TopologyMonitorThread.ensureTopology(TopologyMonitor.java:154) [columnar-server.jar:1.0.0-2203]
    at com.couchbase.analytics.control.rebalance.TopologyMonitor$TopologyMonitorThread.run(TopologyMonitor.java:115) [columnar-server.jar:1.0.0-2203]
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
Caused by: java.util.concurrent.ExecutionException: software.amazon.awssdk.services.s3.model.NoSuchKeyException: The specified key does not exist. (Service: S3, Status Code: 404, Request ID: QB80ZTXS8NPZE403, Extended Request ID: eeQo0IyGkdm6paHMt6RP1yivzdZKOpiTjzfVmZnoboZjNTh4+RJeGBzCJP/GSe9+oeUl2kV8rnwlwZGbUiwcug==)
    at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396) ~[?:?]
    at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073) ~[?:?]
    at org.apache.asterix.cloud.clients.aws.s3.S3ParallelDownloader.waitForFileDownloads(S3ParallelDownloader.java:132) ~[asterix-cloud.jar:1.0.0-2203]
    at org.apache.asterix.cloud.clients.aws.s3.S3ParallelDownloader.downloadFiles(S3ParallelDownloader.java:76) ~[asterix-cloud.jar:1.0.0-2203]
    ... 12 more
analytics_storage_debug.log on node-005
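We are seeing messages reporting that the topic state was NOT found, each logged right after the kafka_collection1 component files of a partition (partitions 24 and 25 below) are deleted.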
2024-07-12T09:03:32.826+00:00 DEBU CBAS.cloud.LazyCloudIOManager [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] DELETE storage/partition_24/Default/Default/kafka_collection1/0/kafka_collection1/0_33_b.dic
2024-07-12T09:03:32.855+00:00 DEBU CBAS.cloud.LazyCloudIOManager [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] DELETE storage/partition_24/Default/Default/kafka_collection1/0/kafka_collection1/34_34_b.dic
2024-07-12T09:03:32.880+00:00 DEBU CBAS.cloud.LazyCloudIOManager [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] DELETE storage/partition_24/Default/Default/kafka_collection1/0/kafka_collection1/35_35_b.dic
2024-07-12T09:03:32.905+00:00 DEBU CBAS.cloud.LazyCloudIOManager [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] DELETE storage/partition_24/Default/Default/kafka_collection1/0/kafka_collection1/34_34_f
2024-07-12T09:03:32.928+00:00 DEBU CBAS.cloud.LazyCloudIOManager [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] DELETE storage/partition_24/Default/Default/kafka_collection1/0/kafka_collection1/0_33_f
2024-07-12T09:03:32.956+00:00 DEBU CBAS.cloud.LazyCloudIOManager [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] DELETE storage/partition_24/Default/Default/kafka_collection1/0/kafka_collection1/35_35_f
2024-07-12T09:03:32.956+00:00 DEBU CBAS.flush.FlushColumnMetadata [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] Schema for record has changed: <ud>2a03da0d6587efc7ee3155c3e7e09756d0810161</ud>
2024-07-12T09:03:32.956+00:00 DEBU CBAS.flush.FlushColumnMetadata [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] Schema for meta-record has changed: <ud>9cc7f0ce48c17ffa62e0b22317f9b1206d237162</ud>
2024-07-12T09:03:32.956+00:00 DEBU CBAS.util.ComponentUtils [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] TopicState was NOT found
2024-07-12T09:03:32.986+00:00 DEBU CBAS.cloud.LazyCloudIOManager [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] DELETE storage/partition_25/Default/Default/kafka_collection1/0/kafka_collection1/35_35_b.dic
2024-07-12T09:03:33.011+00:00 DEBU CBAS.cloud.LazyCloudIOManager [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] DELETE storage/partition_25/Default/Default/kafka_collection1/0/kafka_collection1/34_34_b.dic
2024-07-12T09:03:33.034+00:00 DEBU CBAS.cloud.LazyCloudIOManager [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] DELETE storage/partition_25/Default/Default/kafka_collection1/0/kafka_collection1/0_33_b.dic
2024-07-12T09:03:33.092+00:00 DEBU CBAS.cloud.LazyCloudIOManager [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] DELETE storage/partition_25/Default/Default/kafka_collection1/0/kafka_collection1/34_34_f
2024-07-12T09:03:33.115+00:00 DEBU CBAS.cloud.LazyCloudIOManager [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] DELETE storage/partition_25/Default/Default/kafka_collection1/0/kafka_collection1/35_35_f
2024-07-12T09:03:33.142+00:00 DEBU CBAS.cloud.LazyCloudIOManager [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] DELETE storage/partition_25/Default/Default/kafka_collection1/0/kafka_collection1/0_33_f
2024-07-12T09:03:33.142+00:00 DEBU CBAS.flush.FlushColumnMetadata [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] Schema for record has changed: <ud>2a03da0d6587efc7ee3155c3e7e09756d0810161</ud>
2024-07-12T09:03:33.142+00:00 DEBU CBAS.flush.FlushColumnMetadata [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] Schema for meta-record has changed: <ud>9cc7f0ce48c17ffa62e0b22317f9b1206d237162</ud>
2024-07-12T09:03:33.142+00:00 DEBU CBAS.util.ComponentUtils [TopicStatesRequest-(Default.kafka_link1.pkc-p11xm.us-east-1.aws.confluent.cloud:9092(CB))[-1]:TopicStateRequest-mini_volume.mini_volume.mini_volume_collection] TopicState was NOT found
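To correlate the DELETE operations with the "TopicState was NOT found" messages across a larger log, the debug lines can be grouped by partition. The sketch below is a hypothetical triage helper, not part of Columnar. It assumes the line format shown above, i.e. `DELETE storage/partition_<n>/...` for deletions and a message ending in `TopicState was NOT found`, and it ignores every other line (such as the FlushColumnMetadata schema messages).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TopicStateLogScan {
    // Captures the partition number from lines such as "DELETE storage/partition_24/Default/...".
    private static final Pattern DELETE = Pattern.compile("DELETE storage/partition_(\\d+)/");

    // For each partition, counts the DELETE lines logged immediately before a
    // "TopicState was NOT found" message. Assumes the ordering seen in analytics_storage_debug.log.
    static Map<Integer, Integer> deletesBeforeMissingTopicState(List<String> lines) {
        Map<Integer, Integer> result = new LinkedHashMap<>();
        Integer partition = null;
        int deletes = 0;
        for (String line : lines) {
            Matcher m = DELETE.matcher(line);
            if (m.find()) {
                int p = Integer.parseInt(m.group(1));
                if (partition == null || p != partition) {
                    // A new run of deletions for a different partition starts here.
                    partition = p;
                    deletes = 0;
                }
                deletes++;
            } else if (line.contains("TopicState was NOT found") && partition != null) {
                result.merge(partition, deletes, Integer::sum);
                partition = null;
                deletes = 0;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened sample lines; thread names are replaced with "[...]" for brevity.
        List<String> sample = List.of(
            "DEBU CBAS.cloud.LazyCloudIOManager [...] DELETE storage/partition_24/Default/Default/kafka_collection1/0/kafka_collection1/0_33_b.dic",
            "DEBU CBAS.cloud.LazyCloudIOManager [...] DELETE storage/partition_24/Default/Default/kafka_collection1/0/kafka_collection1/34_34_f",
            "DEBU CBAS.util.ComponentUtils [...] TopicState was NOT found",
            "DEBU CBAS.cloud.LazyCloudIOManager [...] DELETE storage/partition_25/Default/Default/kafka_collection1/0/kafka_collection1/35_35_b.dic",
            "DEBU CBAS.util.ComponentUtils [...] TopicState was NOT found");
        System.out.println(deletesBeforeMissingTopicState(sample)); // {24=2, 25=1}
    }
}
```

Applied to the excerpt above, this pattern would report six deletions each for partitions 24 and 25 before the corresponding missing-topic-state message.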