Details
- Bug
- Resolution: Not a Bug
- Critical
- Columnar 1.0.0
- 1.0.0-2239
- Untriaged
- 0
- Unknown
- Analytics Sprint 47
Description
During the system test run, the remote cluster was torn down automatically while the Columnar cluster was still alive. After that, we kept seeing errors like the following, and many queries failed with internal errors.
2024-07-28T18:46:57.308+00:00 WARN CBAS.bootstrap.ClusterMonitor [RecoveryTask (linkuTqJNHAv/default1)] Remote bucket map failed to bootstrap in 60s; restarting monitor
2024-07-28T18:46:57.308+00:00 INFO CBAS.bootstrap.ClusterMonitor [linkuTqJNHAv cluster monitor] interrupted during pool stream; will retry
java.lang.InterruptedException: sleep interrupted
    at java.base/java.lang.Thread.sleep(Native Method) ~[?:?]
    at java.base/java.lang.Thread.sleep(Thread.java:344) ~[?:?]
    at java.base/java.util.concurrent.TimeUnit.sleep(TimeUnit.java:446) ~[?:?]
    at com.couchbase.analytics.bootstrap.ClusterMonitor.startMonitor(ClusterMonitor.java:93) ~[columnar-server.jar:1.0.0-2239]
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
2024-07-28T18:46:57.308+00:00 INFO CBAS.active.RecoveryTask [RecoveryTask (linkuTqJNHAv/default1)] Attempt to revive linkuTqJNHAv/default1 failed
com.couchbase.analytics.common.exceptions.AnalyticsHyracksException: CBAS0079: Failed to connect link 'linkuTqJNHAv': HYR0091: Operation timed out
    at com.couchbase.analytics.util.BucketValidationUtils.ensureKVBucket(BucketValidationUtils.java:44) ~[columnar-connector.jar:1.0.0-2239]
    at com.couchbase.analytics.lang.ConnectLinkStatement.doConnect(ConnectLinkStatement.java:1122) ~[columnar-connector.jar:1.0.0-2239]
    at com.couchbase.analytics.metadata.BucketEventsListener.doConnect(BucketEventsListener.java:484) ~[columnar-connector.jar:1.0.0-2239]
    at com.couchbase.analytics.metadata.BucketEventsListener.compileAndStartJob(BucketEventsListener.java:470) ~[columnar-connector.jar:1.0.0-2239]
    at org.apache.asterix.app.active.ActiveEntityEventsListener.doStart(ActiveEntityEventsListener.java:403) ~[asterix-app.jar:1.0.0-2239]
    at org.apache.asterix.app.active.ActiveEntityEventsListener.doRecover(ActiveEntityEventsListener.java:430) ~[asterix-app.jar:1.0.0-2239]
    at org.apache.asterix.app.active.RecoveryTask.doRecover(RecoveryTask.java:142) ~[asterix-app.jar:1.0.0-2239]
    at org.apache.asterix.app.active.RecoveryTask.lambda$recover$1(RecoveryTask.java:70) ~[asterix-app.jar:1.0.0-2239]
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
2024-07-28T18:46:57.308+00:00 INFO CBAS.metadata.RecoveryRetryPolicy [RecoveryTask (linkuTqJNHAv/default1)] will retry recovery (attempt 352) in 60s
2024-07-28T18:47:07.279+00:00 INFO CBAS.bootstrap.ClusterMonitor [linkuTqJNHAv cluster monitor] interrupted during pool stream; will retry
java.lang.InterruptedException: sleep interrupted
    at java.base/java.lang.Thread.sleep(Native Method) ~[?:?]
    at java.base/java.lang.Thread.sleep(Thread.java:344) ~[?:?]
    at java.base/java.util.concurrent.TimeUnit.sleep(TimeUnit.java:446) ~[?:?]
    at com.couchbase.analytics.bootstrap.ClusterMonitor.startMonitor(ClusterMonitor.java:93) ~[columnar-server.jar:1.0.0-2239]
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
2024-07-28T18:47:22.279+00:00 INFO CBAS.bootstrap.ClusterMonitor [linkuTqJNHAv cluster monitor] interrupted during pool stream; will retry
java.lang.InterruptedException: sleep interrupted
    at java.base/java.lang.Thread.sleep(Native Method) ~[?:?]
    at java.base/java.lang.Thread.sleep(Thread.java:344) ~[?:?]
    at java.base/java.util.concurrent.TimeUnit.sleep(TimeUnit.java:446) ~[?:?]
    at com.couchbase.analytics.bootstrap.ClusterMonitor.startMonitor(ClusterMonitor.java:93) ~[columnar-server.jar:1.0.0-2239]
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
2024-07-28T18:47:37.279+00:00 INFO CBAS.bootstrap.ClusterMonitor [linkuTqJNHAv cluster monitor] interrupted during pool stream; will retry
java.lang.InterruptedException: sleep interrupted
    at java.base/java.lang.Thread.sleep(Native Method) ~[?:?]
    at java.base/java.lang.Thread.sleep(Thread.java:344) ~[?:?]
    at java.base/java.util.concurrent.TimeUnit.sleep(TimeUnit.java:446) ~[?:?]
    at com.couchbase.analytics.bootstrap.ClusterMonitor.startMonitor(ClusterMonitor.java:93) ~[columnar-server.jar:1.0.0-2239]
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
2024-07-28T18:47:52.279+00:00 INFO CBAS.bootstrap.ClusterMonitor [linkuTqJNHAv cluster monitor] interrupted during pool stream; will retry
java.lang.InterruptedException: sleep interrupted
    at java.base/java.lang.Thread.sleep(Native Method) ~[?:?]
    at java.base/java.lang.Thread.sleep(Thread.java:344) ~[?:?]
    at java.base/java.util.concurrent.TimeUnit.sleep(TimeUnit.java:446) ~[?:?]
    at com.couchbase.analytics.bootstrap.ClusterMonitor.startMonitor(ClusterMonitor.java:93) ~[columnar-server.jar:1.0.0-2239]
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
I disconnected the remote link to see whether the test could be resumed. The service then appears to have crashed, with the JVM exiting with status 2:
2024-07-29T02:51:38.931+00:00 INFO CBAS.messaging.NCMessageBroker [Executor-12:7a5b827b88d5f6077bc1a32e4c548fc1] Received message: TxnIdBlockResponse
2024-07-29T02:51:38.940+00:00 ERRO CBAS.message.RegistrationTasksResponseMessage [Executor-8:7a5b827b88d5f6077bc1a32e4c548fc1] Failed during startup task
org.apache.hyracks.api.exceptions.HyracksDataException: org.apache.asterix.common.exceptions.MetadataException: java.lang.IllegalStateException: attempt to create metadata index Database. Index should already exist
    at org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:49) ~[hyracks-api.jar:1.0.0-2239]
    at org.apache.asterix.app.nc.task.MetadataBootstrapTask.perform(MetadataBootstrapTask.java:55) ~[asterix-app.jar:1.0.0-2239]
    at org.apache.asterix.app.replication.message.RegistrationTasksResponseMessage.handle(RegistrationTasksResponseMessage.java:63) ~[asterix-app.jar:1.0.0-2239]
    at org.apache.asterix.messaging.NCMessageBroker.lambda$receivedMessage$0(NCMessageBroker.java:108) ~[asterix-app.jar:1.0.0-2239]
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
Caused by: org.apache.asterix.common.exceptions.MetadataException: java.lang.IllegalStateException: attempt to create metadata index Database. Index should already exist
    at org.apache.asterix.metadata.bootstrap.MetadataBootstrap.startUniverse(MetadataBootstrap.java:193) ~[asterix-metadata.jar:1.0.0-2239]
    at org.apache.asterix.app.nc.NCAppRuntimeContext.initializeMetadata(NCAppRuntimeContext.java:539) ~[asterix-app.jar:1.0.0-2239]
    at org.apache.asterix.app.nc.task.MetadataBootstrapTask.perform(MetadataBootstrapTask.java:50) ~[asterix-app.jar:1.0.0-2239]
    ... 7 more
Caused by: java.lang.IllegalStateException: attempt to create metadata index Database. Index should already exist
    at org.apache.asterix.metadata.bootstrap.MetadataBootstrap.ensureCatalogUpgradability(MetadataBootstrap.java:610) ~[asterix-metadata.jar:1.0.0-2239]
    at org.apache.asterix.metadata.bootstrap.MetadataBootstrap.enlistMetadataDataset(MetadataBootstrap.java:443) ~[asterix-metadata.jar:1.0.0-2239]
    at org.apache.asterix.metadata.bootstrap.MetadataBootstrap.startUniverse(MetadataBootstrap.java:155) ~[asterix-metadata.jar:1.0.0-2239]
    at org.apache.asterix.app.nc.NCAppRuntimeContext.initializeMetadata(NCAppRuntimeContext.java:539) ~[asterix-app.jar:1.0.0-2239]
    at org.apache.asterix.app.nc.task.MetadataBootstrapTask.perform(MetadataBootstrapTask.java:50) ~[asterix-app.jar:1.0.0-2239]
    ... 7 more
2024-07-29T02:51:38.948+00:00 INFO CBAS.util.ExitUtil [ShutdownWatchdog] starting shutdown watchdog- system will halt if shutdown is not completed within 600 seconds
2024-07-29T02:51:38.948+00:00 WARN CBAS.util.ExitUtil [JVM exit thread] JVM exiting with status 2; bye!
java.lang.Throwable: exit callstack
    at org.apache.hyracks.util.ExitUtil.exit(ExitUtil.java:92) ~[hyracks-util.jar:1.0.0-2239]
    at org.apache.asterix.app.replication.message.RegistrationTasksResponseMessage.handle(RegistrationTasksResponseMessage.java:90) ~[asterix-app.jar:1.0.0-2239]
    at org.apache.asterix.messaging.NCMessageBroker.lambda$receivedMessage$0(NCMessageBroker.java:108) ~[asterix-app.jar:1.0.0-2239]
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
    at java.base/java.lang.Thread.run(Thread.java:840) ~[?:?]
2024-07-29T02:51:38.949+00:00 INFO CBAS.messaging.CCMessageBroker [Executor-10:ClusterController] Received message:
cbcollect ->
https://cb-engineering.s3.amazonaws.com/SysTestColumnarRC3Half/collectinfo-2024-07-29T033732-ns_1%40svc-da-node-014.oinjtxrxhmvsl0g5.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/SysTestColumnarRC3Half/collectinfo-2024-07-29T033732-ns_1%40svc-da-node-020.oinjtxrxhmvsl0g5.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/SysTestColumnarRC3Half/collectinfo-2024-07-29T033732-ns_1%40svc-da-node-022.oinjtxrxhmvsl0g5.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/SysTestColumnarRC3Half/collectinfo-2024-07-29T033732-ns_1%40svc-da-node-028.oinjtxrxhmvsl0g5.sandbox.nonprod-project-avengers.com.zip
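When going through the collected logs, it may help to tally how often each logger reports at each level, for example to see how much of the noise comes from the cluster monitor retries. The following is only a rough sketch: it assumes every log record starts with `<timestamp> <LEVEL> <logger> [<thread>] <message>`, as in the excerpts in this ticket, and the `tally` helper and `sample` list are hypothetical names for illustration, not part of any Couchbase tooling.

```python
import re
from collections import Counter

# Assumed record layout, inferred from the excerpts in this ticket:
#   <ISO timestamp> <LEVEL> <logger> [<thread>] <message>
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\S+)\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<logger>\S+)\s+"
    r"\[(?P<thread>[^\]]*)\]\s+"
    r"(?P<msg>.*)$"
)


def tally(lines):
    """Count records per (level, logger); lines that do not start a record
    (stack frames, exception headers) are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            counts[(match.group("level"), match.group("logger"))] += 1
    return counts


# A few lines copied from the excerpts above, used as sample input.
sample = [
    "2024-07-28T18:46:57.308+00:00 WARN CBAS.bootstrap.ClusterMonitor [RecoveryTask (linkuTqJNHAv/default1)] Remote bucket map failed to bootstrap in 60s; restarting monitor",
    "2024-07-28T18:46:57.308+00:00 INFO CBAS.bootstrap.ClusterMonitor [linkuTqJNHAv cluster monitor] interrupted during pool stream; will retry",
    "java.lang.InterruptedException: sleep interrupted",
    "2024-07-28T18:46:57.308+00:00 INFO CBAS.metadata.RecoveryRetryPolicy [RecoveryTask (linkuTqJNHAv/default1)] will retry recovery (attempt 352) in 60s",
    "2024-07-28T18:47:07.279+00:00 INFO CBAS.bootstrap.ClusterMonitor [linkuTqJNHAv cluster monitor] interrupted during pool stream; will retry",
]

for (level, logger), count in tally(sample).most_common():
    print(count, level, logger)
```

To run it against a real log, pass an open file object to `tally` instead of `sample`; the file name and location inside the cbcollect archive are not assumed here.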
Issue Links
- is caused by AV-83099