Couchbase Server / MB-62670

[Mini Volume - BYOK] All queries fail with "Internal error" / "Analytics Service is temporarily unavailable" because the Analytics Service fails to start with HYR0087: Unequal number of trees and filters found


Details

    • Untriaged
    • Linux x86_64
    • 0
    • Unknown
    • Analytics Sprint 46

    Description

      Test Steps

      1. Deploy a 4-node Columnar instance with a 16 vCPU / 64 GB node configuration.
      2. Create a pair of Kafka links, with 50 collections on each of the 2 links, against a Kafka topic that has 1 million items.
      3. Connect the 1st Kafka link.
      4. Scale up the cluster from 4 to 8 nodes.
      5. Scale down the cluster from 8 to 4 nodes.
      6. Connect the 2nd Kafka link.
      7. Scale up the cluster from 4 to 8 nodes.
      8. Scale down the cluster from 8 to 4 nodes.

      Observation

      All Analytics queries fail with "Internal error" / "Analytics Service is temporarily unavailable" errors. For example:

      SELECT count(*) from confluent_kafka_collection3;
      

      [
        {
          "code": 25000,
          "msg": "Internal error"
        }
      ]
      

      On node-002, analytics_info.log shows that the Analytics driver fails to start with the following error:
      failed to complete startup. HYR0087: Unequal number of trees and filters found in /var/cb-cache/@analytics/v_iodevice_10/storage/partition_42/Default/Default/confluent_kafka_collection78/0/confluent_kafka_collection78

      2024-07-10T03:42:23.003+00:00 WARN CBAS.util.ExitUtil [JVM exit thread] JVM exiting with status 2; bye!
      java.lang.Throwable: exit callstack
      	at org.apache.hyracks.util.ExitUtil.exit(ExitUtil.java:92) ~[hyracks-util.jar:1.0.0-2203]
      	at org.apache.asterix.app.replication.message.RegistrationTasksResponseMessage.handle(RegistrationTasksResponseMessage.java:90) ~[asterix-app.jar:1.0.0-2203]
      	at org.apache.asterix.messaging.NCMessageBroker.lambda$receivedMessage$0(NCMessageBroker.java:108) ~[asterix-app.jar:1.0.0-2203]
      	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
      	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
      	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
      	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
      	at java.base/java.lang.Thread.run(Thread.java:840) ~[?:?]

       

      2024-07-10T03:42:23.004+00:00 ERRO CBAS.replication.NcLifecycleCoordinator [Executor-11:ClusterController] Node 9569880d5aa20b2ab1a79563974b607b failed to complete startup
      org.apache.hyracks.api.exceptions.HyracksDataException: HYR0087: Unequal number of trees and filters found in /var/cb-cache/@analytics/v_iodevice_10/storage/partition_42/Default/Default/confluent_kafka_collection78/0/confluent_kafka_collection78
      	at org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:57) ~[hyracks-api.jar:1.0.0-2203]
      	at org.apache.hyracks.storage.am.lsm.btree.impls.LSMBTreeFileManager.cleanupAndGetValidFiles(LSMBTreeFileManager.java:104) ~[hyracks-storage-am-lsm-btree.jar:1.0.0-2203]
      	at org.apache.hyracks.storage.am.lsm.common.impls.AbstractLSMIndex.loadDiskComponents(AbstractLSMIndex.java:208) ~[hyracks-storage-am-lsm-common.jar:1.0.0-2203]
      	at org.apache.hyracks.storage.am.lsm.common.impls.AbstractLSMIndex.activate(AbstractLSMIndex.java:202) ~[hyracks-storage-am-lsm-common.jar:1.0.0-2203]
      	at org.apache.hyracks.storage.am.lsm.btree.column.impls.lsm.LSMColumnBTree.activate(LSMColumnBTree.java:89) ~[hyracks-storage-am-lsm-btree-column.jar:1.0.0-2203]
      	at org.apache.asterix.common.context.DatasetLifecycleManager.open(DatasetLifecycleManager.java:221) ~[asterix-common.jar:1.0.0-2203]
      	at com.couchbase.analytics.bootstrap.AnalyticsLocalRecoveryManager.cleanUp(AnalyticsLocalRecoveryManager.java:95) ~[columnar-server.jar:1.0.0-2203]
      	at com.couchbase.analytics.bootstrap.AnalyticsLocalRecoveryManager.startLocalRecovery(AnalyticsLocalRecoveryManager.java:58) ~[columnar-server.jar:1.0.0-2203]
      	at org.apache.asterix.app.nc.task.LocalRecoveryTask.perform(LocalRecoveryTask.java:45) ~[asterix-app.jar:1.0.0-2203]
      	at org.apache.asterix.app.replication.message.RegistrationTasksResponseMessage.handle(RegistrationTasksResponseMessage.java:63) ~[asterix-app.jar:1.0.0-2203]
      	at org.apache.asterix.messaging.NCMessageBroker.lambda$receivedMessage$0(NCMessageBroker.java:108) ~[asterix-app.jar:1.0.0-2203]
      	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
      	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
      	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
      	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
      	at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
      
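      Per the stack trace above, HYR0087 is thrown from LSMBTreeFileManager.cleanupAndGetValidFiles while AbstractLSMIndex.loadDiskComponents scans the index directory during local recovery, i.e. the number of tree files on disk does not match the number of filter files (presumably Bloom filter files). The following is a hypothetical offline diagnostic, not Couchbase or Hyracks code, that lists which disk components in one index directory are missing their partner file. It assumes that tree files end in "_b", that filter files end in "_f", and that both share the component name as a prefix; that naming convention is an assumption and should be checked against the actual files under the path reported in the error.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical diagnostic: reports LSM disk components whose tree file has no
 * matching filter file, or vice versa.
 * ASSUMPTION: tree files end in "_b", filter files end in "_f", and both share
 * the component name as a prefix. Verify against the real on-disk layout.
 */
public class ComponentFileCheck {
    static final String TREE_SUFFIX = "_b";   // assumed tree-file suffix
    static final String FILTER_SUFFIX = "_f"; // assumed filter-file suffix

    /** Returns the sorted component prefixes that have a tree or a filter file, but not both. */
    static List<String> findUnpaired(List<String> fileNames) {
        Set<String> trees = new TreeSet<>();
        Set<String> filters = new TreeSet<>();
        for (String name : fileNames) {
            if (name.endsWith(TREE_SUFFIX)) {
                trees.add(name.substring(0, name.length() - TREE_SUFFIX.length()));
            } else if (name.endsWith(FILTER_SUFFIX)) {
                filters.add(name.substring(0, name.length() - FILTER_SUFFIX.length()));
            }
            // Any other file (metadata, etc.) is ignored by this sketch.
        }
        // Symmetric difference: prefixes present in exactly one of the two sets.
        Set<String> unpaired = new TreeSet<>(trees);
        unpaired.addAll(filters);
        Set<String> both = new TreeSet<>(trees);
        both.retainAll(filters);
        unpaired.removeAll(both);
        return new ArrayList<>(unpaired);
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        if (args.length > 0) {
            // List the index directory given on the command line, e.g. the path from HYR0087.
            String[] listed = new File(args[0]).list();
            if (listed == null) {
                System.err.println("Not a readable directory: " + args[0]);
                return;
            }
            names.addAll(List.of(listed));
        } else {
            // Illustrative file names only; not taken from the affected node.
            names.addAll(List.of("0_0_b", "0_0_f", "1_1_b"));
        }
        List<String> unpaired = findUnpaired(names);
        System.out.println(unpaired.isEmpty()
                ? "tree/filter counts match"
                : "unpaired components: " + unpaired);
    }
}
```

      Comparing component prefixes rather than only the two counts points at the specific orphaned component, which is more useful when correlating with the scale-up/scale-down steps that preceded the failure.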


            People

              Michael Blow (michael.blow)
              Sujay Gad (sujay.gad)