Details
- Bug
- Resolution: Fixed
- Critical
- Columnar 1.0.0
- 1.0.0-2203-columnar
- Untriaged
- Linux x86_64
- 0
- Unknown
- Analytics Sprint 46
Description
Test Steps
- Deploy a 4-node Columnar instance with a 16 vCPU, 64 GB node configuration.
- Create a pair of Kafka links and 50 collections on each of the 2 links, against a Kafka topic that has 1 million items.
- Connect the 1st Kafka link.
- Scale up the cluster from 4 to 8 nodes.
- Scale down the cluster from 8 to 4 nodes.
- Connect the 2nd Kafka link.
- Scale up the cluster from 4 to 8 nodes.
- Scale down the cluster from 8 to 4 nodes.
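The ordering of the steps above can be sketched as a small plan builder. This is only an illustration of the sequence (connect a link, scale up, scale down, repeated per link); the action names and the link names `kafka_link_1` and `kafka_link_2` are hypothetical and do not correspond to any real Columnar API.

```python
from typing import List, Tuple

# A step is an (action, argument) pair. The action names are hypothetical
# labels for this sketch, not real Columnar API calls.
Step = Tuple[str, object]


def build_test_plan(links: List[str], small: int = 4, large: int = 8) -> List[Step]:
    """Return the ordered steps: for each link, connect it, then scale up and back down."""
    plan: List[Step] = []
    for link in links:
        plan.append(("connect_link", link))
        plan.append(("scale", (small, large)))  # scale up, e.g. 4 -> 8 nodes
        plan.append(("scale", (large, small)))  # scale down, e.g. 8 -> 4 nodes
    return plan


if __name__ == "__main__":
    # Hypothetical link names, used only for illustration.
    for action, arg in build_test_plan(["kafka_link_1", "kafka_link_2"]):
        print(action, arg)
```

Keeping the plan as data makes it easy to see that each link is connected before its own scale-up/scale-down cycle, which is the ordering this report depends on.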
Observation
All Analytics queries fail with either an "Internal error" or an "Analytics Service is temporarily unavailable" error.
SELECT count(*) from confluent_kafka_collection3;
[
  {
    "code": 25000,
    "msg": "Internal error"
  }
]
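A test harness could flag this failure by checking the returned error list for code 25000. The following is a minimal sketch that assumes the error payload is a JSON array of objects with `code` and `msg` fields, as in the output above; the function name `find_internal_errors` is made up for this example.

```python
import json
from typing import Any, Dict, List

# Error code taken from the query output in this report.
INTERNAL_ERROR_CODE = 25000


def find_internal_errors(response_text: str) -> List[Dict[str, Any]]:
    """Return the error entries whose code matches the internal error code."""
    errors = json.loads(response_text)
    return [e for e in errors if e.get("code") == INTERNAL_ERROR_CODE]


if __name__ == "__main__":
    sample = '[{"code": 25000, "msg": "Internal error"}]'
    for err in find_internal_errors(sample):
        print(err["code"], err["msg"])
```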
On node-002, analytics_info.log shows that the Analytics driver fails to start due to the following error:
failed to complete startup. HYR0087: Unequal number of trees and filters found in /var/cb-cache/@analytics/v_iodevice_10/storage/partition_42/Default/Default/confluent_kafka_collection78/0/confluent_kafka_collection78
2024-07-10T03:42:23.003+00:00 WARN CBAS.util.ExitUtil [JVM exit thread] JVM exiting with status 2; bye!
java.lang.Throwable: exit callstack
    at org.apache.hyracks.util.ExitUtil.exit(ExitUtil.java:92) ~[hyracks-util.jar:1.0.0-2203]
    at org.apache.asterix.app.replication.message.RegistrationTasksResponseMessage.handle(RegistrationTasksResponseMessage.java:90) ~[asterix-app.jar:1.0.0-2203]
    at org.apache.asterix.messaging.NCMessageBroker.lambda$receivedMessage$0(NCMessageBroker.java:108) ~[asterix-app.jar:1.0.0-2203]
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
    at java.base/java.lang.Thread.run(Thread.java:840) ~[?:?]
2024-07-10T03:42:23.004+00:00 ERRO CBAS.replication.NcLifecycleCoordinator [Executor-11:ClusterController] Node 9569880d5aa20b2ab1a79563974b607b failed to complete startup
org.apache.hyracks.api.exceptions.HyracksDataException: HYR0087: Unequal number of trees and filters found in /var/cb-cache/@analytics/v_iodevice_10/storage/partition_42/Default/Default/confluent_kafka_collection78/0/confluent_kafka_collection78
    at org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:57) ~[hyracks-api.jar:1.0.0-2203]
    at org.apache.hyracks.storage.am.lsm.btree.impls.LSMBTreeFileManager.cleanupAndGetValidFiles(LSMBTreeFileManager.java:104) ~[hyracks-storage-am-lsm-btree.jar:1.0.0-2203]
    at org.apache.hyracks.storage.am.lsm.common.impls.AbstractLSMIndex.loadDiskComponents(AbstractLSMIndex.java:208) ~[hyracks-storage-am-lsm-common.jar:1.0.0-2203]
    at org.apache.hyracks.storage.am.lsm.common.impls.AbstractLSMIndex.activate(AbstractLSMIndex.java:202) ~[hyracks-storage-am-lsm-common.jar:1.0.0-2203]
    at org.apache.hyracks.storage.am.lsm.btree.column.impls.lsm.LSMColumnBTree.activate(LSMColumnBTree.java:89) ~[hyracks-storage-am-lsm-btree-column.jar:1.0.0-2203]
    at org.apache.asterix.common.context.DatasetLifecycleManager.open(DatasetLifecycleManager.java:221) ~[asterix-common.jar:1.0.0-2203]
    at com.couchbase.analytics.bootstrap.AnalyticsLocalRecoveryManager.cleanUp(AnalyticsLocalRecoveryManager.java:95) ~[columnar-server.jar:1.0.0-2203]
    at com.couchbase.analytics.bootstrap.AnalyticsLocalRecoveryManager.startLocalRecovery(AnalyticsLocalRecoveryManager.java:58) ~[columnar-server.jar:1.0.0-2203]
    at org.apache.asterix.app.nc.task.LocalRecoveryTask.perform(LocalRecoveryTask.java:45) ~[asterix-app.jar:1.0.0-2203]
    at org.apache.asterix.app.replication.message.RegistrationTasksResponseMessage.handle(RegistrationTasksResponseMessage.java:63) ~[asterix-app.jar:1.0.0-2203]
    at org.apache.asterix.messaging.NCMessageBroker.lambda$receivedMessage$0(NCMessageBroker.java:108) ~[asterix-app.jar:1.0.0-2203]
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
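When triaging several nodes, it can help to pull the affected storage path out of each HYR0087 line. The sketch below is a hypothetical helper, not part of any Couchbase tooling; it assumes the message wording and the `/storage/partition_<n>/...` path layout seen in the log above, and simply reports the partition number and the last path component.

```python
import re
from typing import Dict, Optional, Union

# Message wording copied from the log in this report.
HYR0087_PATTERN = re.compile(
    r"HYR0087: Unequal number of trees and filters found in (?P<path>\S+)"
)
# Assumed layout: .../storage/partition_<n>/.../<last component>
PATH_PATTERN = re.compile(r"/storage/partition_(?P<partition>\d+)/.*/(?P<last>[^/]+)$")


def parse_hyr0087(line: str) -> Optional[Dict[str, Union[str, int]]]:
    """Extract the path, partition number, and last path component from a HYR0087 line."""
    match = HYR0087_PATTERN.search(line)
    if match is None:
        return None
    path = match.group("path")
    info: Dict[str, Union[str, int]] = {"path": path}
    path_match = PATH_PATTERN.search(path)
    if path_match is not None:
        info["partition"] = int(path_match.group("partition"))
        info["last_component"] = path_match.group("last")
    return info


if __name__ == "__main__":
    line = (
        "failed to complete startup. HYR0087: Unequal number of trees and filters found in "
        "/var/cb-cache/@analytics/v_iodevice_10/storage/partition_42/Default/Default/"
        "confluent_kafka_collection78/0/confluent_kafka_collection78"
    )
    print(parse_hyr0087(line))
```

Lines that do not contain the HYR0087 message return `None`, so the helper can be applied to every line of analytics_info.log and the non-matching results filtered out.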
Attachments
Issue Links
- is caused by MB-62635 Revise node lifecycle order of operations (Closed)