Couchbase Server / MB-62204

[System Test] Analytics service crash - JVM halting with status 7 with fatal errors

Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • Columnar 1.0.0
    • Columnar 1.0.0
    • analytics
    • Columnar Edition 1.0.0 build 2126
    • Untriaged
    • 0
    • Unknown
    • Analytics Sprint 43, Analytics Sprint 44

    Description

      As seen on node 002 (svc-da-node-002):

      2024-06-06T06:42:53.274+00:00 FATA CBAS.metadata.MetadataManager [Executor-12:bc7826b85537b9e89e0be60443b84b47] Failure aborting a metadata transaction
      java.lang.NullPointerException: Cannot invoke "org.apache.asterix.metadata.MetadataTransactionContext.getTxnId()" because "ctx" is null
      	at org.apache.asterix.metadata.MetadataManager.abortTransaction(MetadataManager.java:167) ~[asterix-metadata.jar:1.0.0-2126]
      	at com.couchbase.analytics.util.LinkUtils.queryLinks(LinkUtils.java:919) ~[columnar-connector.jar:1.0.0-2126]
      	at com.couchbase.analytics.servlet.NodeDiagnosticsServlet.lambda$getResultJson$3(NodeDiagnosticsServlet.java:141) ~[columnar-server.jar:1.0.0-2126]
      	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
      	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
      	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
      	at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
      2024-06-06T06:42:53.274+00:00 FATA CBAS.metadata.MetadataManager [HttpExecutor(port:9110)-10] Failure aborting a metadata transaction
      java.lang.NullPointerException: Cannot invoke "org.apache.asterix.metadata.MetadataTransactionContext.getTxnId()" because "ctx" is null
      	at org.apache.asterix.metadata.MetadataManager.abortTransaction(MetadataManager.java:167) [asterix-metadata.jar:1.0.0-2126]
      	at com.couchbase.analytics.servlet.MetadataDumpUtil.putMetadata(MetadataDumpUtil.java:84) [columnar-server.jar:1.0.0-2126]
      	at com.couchbase.analytics.servlet.MetadataDumpUtil.fetchMetadata(MetadataDumpUtil.java:65) [columnar-server.jar:1.0.0-2126]
      	at com.couchbase.analytics.servlet.NodeDiagnosticsServlet.putMetadata(NodeDiagnosticsServlet.java:220) [columnar-server.jar:1.0.0-2126]
      	at com.couchbase.analytics.servlet.NodeDiagnosticsServlet.getResultJson(NodeDiagnosticsServlet.java:147) [columnar-server.jar:1.0.0-2126]
      	at com.couchbase.analytics.servlet.NodeDiagnosticsServlet.get(NodeDiagnosticsServlet.java:106) [columnar-server.jar:1.0.0-2126]
      	at org.apache.hyracks.http.server.AbstractServlet.handle(AbstractServlet.java:90) [hyracks-http.jar:1.0.0-2126]
      	at com.couchbase.analytics.servlet.AuthenticatedServlet.handle(AuthenticatedServlet.java:96) [columnar-server.jar:1.0.0-2126]
      	at org.apache.hyracks.http.server.HttpRequestHandler.handle(HttpRequestHandler.java:83) [hyracks-http.jar:1.0.0-2126]
      	at org.apache.hyracks.http.server.HttpRequestHandler.call(HttpRequestHandler.java:68) [hyracks-http.jar:1.0.0-2126]
      	at org.apache.hyracks.http.server.HttpRequestHandler.call(HttpRequestHandler.java:37) [hyracks-http.jar:1.0.0-2126]
      	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
      	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
      	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
      	at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
      2024-06-06T06:42:53.274+00:00 FATA CBAS.util.ExitUtil [Executor-12:bc7826b85537b9e89e0be60443b84b47] JVM halting with status 7 (halting thread Thread[Executor-12:bc7826b85537b9e89e0be60443b84b47,10,main], interrupted false)
      2024-06-06T06:42:53.415+00:00 FATA CBAS.util.ExitUtil [pool-2-thread-1] Thread dump at halt: 
      "main" [tid=1 state=WAITING lock=java.util.concurrent.Semaphore$NonfairSync@4b21c742]
      	at java.base@17.0.11/jdk.internal.misc.Unsafe.park(Native Method)
      	at java.base@17.0.11/java.util.concurrent.locks.LockSupport.park(LockSupport.java:211)
      	at java.base@17.0.11/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:715)
      	at java.base@17.0.11/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1047)
      	at java.base@17.0.11/java.util.concurrent.Semaphore.acquire(Semaphore.java:318)
      	at app//com.couchbase.analytics.control.AnalyticsDriver.main(AnalyticsDriver.java:109)
      	at app//com.couchbase.columnar.ColumnarDriver.main(ColumnarDriver.java:10)
      
      

      cbcollect logs:

      https://cb-engineering.s3.amazonaws.com/SysTestColumnarJun5/collectinfo-2024-06-06T064037-ns_1%40svc-da-node-001.ojc8vwsi23xsigjw.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnarJun5/collectinfo-2024-06-06T064037-ns_1%40svc-da-node-002.ojc8vwsi23xsigjw.sandbox.nonprod-project-avengers.com.zip

      There appear to be three different exceptions. I am not sure whether they are all related, but after discussing it with Ritik Raj we agreed it is better to file three separate tickets; if they turn out to share a root cause, they can be closed as duplicates.

      I've marked this as Critical because it was a crash. Please lower the priority if the RCA shows that is warranted.
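
      For readers unfamiliar with this failure shape: both traces above show abortTransaction dereferencing a null "ctx", and the log then records the JVM halting with status 7. One plausible reading (an assumption, not a confirmed RCA) is that the transaction was never successfully begun, so the cleanup path received a null context. The sketch below illustrates that pattern and a null guard in isolation. It is not the Asterix or Couchbase code; every class and method name in it (AbortGuardSketch, TxnContext, abortUnguarded, abortGuarded, runWithTxn) is hypothetical.

```java
import java.util.function.Supplier;

// Illustrative sketch only: all names here are hypothetical and do not
// correspond to the real MetadataManager or MetadataTransactionContext.
public class AbortGuardSketch {

    /** Hypothetical stand-in for a metadata transaction context. */
    public static final class TxnContext {
        private final long txnId;

        public TxnContext(long txnId) {
            this.txnId = txnId;
        }

        public long getTxnId() {
            return txnId;
        }
    }

    /** Unguarded abort: dereferences ctx, so a null ctx throws NullPointerException. */
    public static long abortUnguarded(TxnContext ctx) {
        return ctx.getTxnId();
    }

    /** Guarded abort: returns -1 when no transaction was ever begun. */
    public static long abortGuarded(TxnContext ctx) {
        if (ctx == null) {
            return -1L;
        }
        return ctx.getTxnId();
    }

    /**
     * Begins a transaction, runs the work, and aborts on failure.
     * If begin itself throws, ctx is still null when the catch block runs.
     */
    public static String runWithTxn(Supplier<TxnContext> begin, Runnable work, boolean guarded) {
        TxnContext ctx = null;
        try {
            ctx = begin.get();
            work.run();
            return "committed txn " + ctx.getTxnId();
        } catch (RuntimeException original) {
            try {
                long id = guarded ? abortGuarded(ctx) : abortUnguarded(ctx);
                return id < 0 ? "nothing to abort" : "aborted txn " + id;
            } catch (NullPointerException npe) {
                // The abort failure masks the original exception, which is the
                // situation the "Failure aborting a metadata transaction" lines suggest.
                return "abort failed: NullPointerException";
            }
        }
    }
}
```

      In this sketch the guarded variant reports that there is nothing to abort, so the original failure is not replaced by a secondary NullPointerException. Whether the real fix belongs in the caller or in the abort method is a question for the RCA.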

          People

            pavan.pb Pavan PB
            pavan.pb Pavan PB