Couchbase Server / MB-54453

[System Test][CBAS] Rebalance exited with reason {service_rebalance_failed,cbas, {agent_died,<33046.11956.194>,noconnection}}.

Details

    • Bug
    • Resolution: Won't Fix
    • Major
    • 7.2.0
    • 7.1.2
    • analytics
    • Enterprise Edition 7.1.2 build 3454

    Description

      QE TEST

      -test tests/integration/neo/test_neo.yml -scope tests/integration/neo/scope_couchstore.yml
      

      Day - 1
      Cycle - 1
      Scale - 3

      TEST STEP
      Analytics rebalance failed while performing a swap rebalance of a pair of analytics nodes.

      [2022-11-07T19:17:09-08:00, sequoiatools/couchbase-cli:7.1:7bd2da] rebalance -c 172.23.108.103:8091 --server-remove 172.23.123.28 -u Administrator -p password
      →  
       
      Error occurred on container - sequoiatools/couchbase-cli:7.1:[rebalance -c 172.23.108.103:8091 --server-remove 172.23.123.28 -u Administrator -p password]
       
      docker logs 7bd2da
      docker start 7bd2da
       
      *Unable to display progress bar on this os
      ERROR: Rebalance failed. See logs for detailed reason. You can try again.
      

      CLUSTER SETTINGS
      The cluster is running with the IP family set to ipv4-only, node-to-node (n2n) encryption enabled, and the encryption level set to all.
      This issue was not observed when running the same test against the same build with the encryption level set to control.

      REBALANCE FAILURE

      2022-11-07T19:31:10.508-08:00, ns_orchestrator:0:critical:message(ns_1@172.23.108.103) - Rebalance exited with reason {service_rebalance_failed,cbas,
                                       {agent_died,<33046.11956.194>,noconnection}}.
      Rebalance Operation Id = 444e18f820d67472deab6113299537b5
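
When triaging many rebalance failures, it can help to pull the service name and the underlying cause out of the ns_orchestrator message automatically. The following is a minimal sketch that does this for the message quoted above. The regular expression and the function name `parse_rebalance_failure` are assumptions written for this ticket; they are not part of any Couchbase tooling and only cover this particular reason shape.

```python
import re

# Log message copied from the ns_orchestrator entry quoted above,
# joined onto one logical line.
LOG_LINE = (
    "2022-11-07T19:31:10.508-08:00, ns_orchestrator:0:critical:message"
    "(ns_1@172.23.108.103) - Rebalance exited with reason "
    "{service_rebalance_failed,cbas, {agent_died,<33046.11956.194>,noconnection}}."
)

# Hypothetical pattern for the {kind,service,{cause,pid,detail}} reason term.
REASON_RE = re.compile(
    r"Rebalance exited with reason \{(?P<kind>\w+),\s*(?P<service>\w+),\s*"
    r"\{(?P<cause>\w+),\s*(?P<pid><[\d.]+>),\s*(?P<detail>\w+)\}\}"
)


def parse_rebalance_failure(line):
    """Return the fields of the failure reason as a dict, or None if absent."""
    match = REASON_RE.search(line)
    return match.groupdict() if match else None


failure = parse_rebalance_failure(LOG_LINE)
print(failure)
```

Because the pattern uses `\s*` between fields, it also matches when the reason term is wrapped across two lines, as it is in the log excerpt.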
      

      On 172.23.106.188 analytics_error.log

      2022-11-07T19:31:43.068-08:00 ERRO CBAS.servlet.RebalanceServlet [HttpExecutor(port:9111)-13] Rebalance d952a0882b80ca7efb6f40325e5a0747 failed
      java.util.concurrent.CancellationException: null
      	at java.util.concurrent.FutureTask.report(FutureTask.java:121) ~[?:?]
      	at java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:?]
      	at com.couchbase.analytics.control.rebalance.Rebalance.join(Rebalance.java:595) ~[cbas-server-7.1.2-3454.jar:7.1.2-3454]
      	at com.couchbase.analytics.servlet.RebalanceServlet.processRebalanceStatusRequest(RebalanceServlet.java:124) [cbas-server-7.1.2-3454.jar:7.1.2-3454]
      	at com.couchbase.analytics.servlet.RebalanceServlet.get(RebalanceServlet.java:87) [cbas-server-7.1.2-3454.jar:7.1.2-3454]
      	at org.apache.hyracks.http.server.AbstractServlet.handle(AbstractServlet.java:90) [hyracks-http-7.1.2-3454.jar:7.1.2-3454]
      	at com.couchbase.analytics.servlet.AuthenticatedServlet.handle(AuthenticatedServlet.java:93) [cbas-server-7.1.2-3454.jar:7.1.2-3454]
      	at org.apache.hyracks.http.server.HttpRequestHandler.handle(HttpRequestHandler.java:83) [hyracks-http-7.1.2-3454.jar:7.1.2-3454]
      	at org.apache.hyracks.http.server.HttpRequestHandler.call(HttpRequestHandler.java:68) [hyracks-http-7.1.2-3454.jar:7.1.2-3454]
      	at org.apache.hyracks.http.server.HttpRequestHandler.call(HttpRequestHandler.java:37) [hyracks-http-7.1.2-3454.jar:7.1.2-3454]
      	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
      	at java.lang.Thread.run(Thread.java:829) [?:?]
      

      On 172.23.99.11 analytics_debug.log
      The following driver halt is logged just before the rebalance failure.

      2022-11-07T21:28:27.711-08:00 DEBU CBAS.util.ExitUtil [Stdin Watcher] JVM halting with status 33 (halting thread Thread[Stdin Watcher,5,main], interrupted false)
      2022-11-07T21:28:27.757-08:00 DEBU CBAS.util.ExitUtil [pool-2-thread-1] Thread dump at halt: 
      "main" [tid=1 state=RUNNABLE]
      	at java.base@11.0.16/java.util.zip.ZipFile$Source.getEntryPos(ZipFile.java:1649)
      	at java.base@11.0.16/java.util.zip.ZipFile.getEntry(ZipFile.java:350)
      	- <locked java.util.jar.JarFile@259ee69f>
      	at java.base@11.0.16/java.util.zip.ZipFile$1.getEntry(ZipFile.java:1143)
      	at java.base@11.0.16/java.util.jar.JarFile.getEntry0(JarFile.java:586)
      	at java.base@11.0.16/java.util.jar.JarFile.getEntry(JarFile.java:516)
      	at java.base@11.0.16/java.util.jar.JarFile.getJarEntry(JarFile.java:478)
      	at java.base@11.0.16/jdk.internal.loader.URLClassPath$JarLoader.getResource(URLClassPath.java:943)
      	at java.base@11.0.16/jdk.internal.loader.URLClassPath.getResource(URLClassPath.java:315)
      	at java.base@11.0.16/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:695)
      	at java.base@11.0.16/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:621)
      	- <locked java.lang.Object@493744f4>
      	at java.base@11.0.16/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:579)
      	at java.base@11.0.16/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
      	at java.base@11.0.16/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
      	at app//org.apache.hyracks.control.common.ipc.CCNCFunctions$SerializerDeserializer.<init>(CCNCFunctions.java:1449)
      	at app//org.apache.hyracks.control.nc.NodeControllerService.start(NodeControllerService.java:284)
      	at app//com.couchbase.analytics.control.AnalyticsDriver.startService(AnalyticsDriver.java:132)
      	at app//com.couchbase.analytics.control.AnalyticsDriver.startServices(AnalyticsDriver.java:113)
      	at app//com.couchbase.analytics.control.AnalyticsDriver.main(AnalyticsDriver.java:90)
       
      "Reference Handler" [tid=2 state=RUNNABLE]
      	at java.base@11.0.16/java.lang.ref.Reference.waitForReferencePendingList(Native Method)
      	at java.base@11.0.16/java.lang.ref.Reference.processPendingReferences(Reference.java:241)
      	at java.base@11.0.16/java.lang.ref.Reference$ReferenceHandler.run(Reference.java:213)
       
      "Finalizer" [tid=3 state=WAITING lock=java.lang.ref.ReferenceQueue$Lock@2f72a08b]
      	at java.base@11.0.16/java.lang.Object.wait(Native Method)
      	at java.base@11.0.16/java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:155)
      	at java.base@11.0.16/java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:176)
      	at java.base@11.0.16/java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:170)
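
Since the excerpts in this ticket come from different nodes and log files, it may be useful to order them by timestamp before drawing conclusions about cause and effect. The sketch below sorts the four timestamps quoted so far. The event descriptions are shortened paraphrases of the log lines, and the helper name `order_events` is an assumption made for illustration.

```python
from datetime import datetime

# (timestamp, short description) pairs taken from the excerpts above.
EVENTS = [
    ("2022-11-07T21:28:27.711-08:00",
     "172.23.99.11 analytics_debug.log: JVM halting with status 33"),
    ("2022-11-07T19:31:43.068-08:00",
     "172.23.106.188 analytics_error.log: RebalanceServlet reports rebalance failed"),
    ("2022-11-07T19:31:10.508-08:00",
     "ns_orchestrator: Rebalance exited with reason service_rebalance_failed"),
    ("2022-11-07T19:17:09-08:00",
     "couchbase-cli: rebalance --server-remove 172.23.123.28"),
]


def order_events(events):
    """Return the events sorted by their ISO 8601 timestamps."""
    return sorted(events, key=lambda event: datetime.fromisoformat(event[0]))


for timestamp, description in order_events(EVENTS):
    print(timestamp, description)
```

Sorting on parsed timestamps rather than on the raw strings keeps the ordering correct even if logs from nodes in different UTC offsets are mixed in later.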
      

      On 172.23.99.11 analytics_info.log
      The following error was also observed just before the rebalance failure; it might be related to MB-54428.

      NOTE
      The following panic is also observed multiple times on the 172.23.99.11 node at a later point in time.
      Do let me know if I should file a separate ticket for this panic.

      2022-11-07T21:32:34.014-08:00 ERRO CBAS.cbas cbas process aborting with exit code 113 due to panic: runtime error: invalid memory address or nil pointer dereference
      2022-11-07T21:32:34.015-08:00 INFO CBAS.cbas *** goroutine dump at panic:
      goroutine 1 [running]:
      main.RoutineDump()
      	goproj/src/github.com/couchbase/cbas/cbas/utils.go:112 +0xa7
      main.main.func1()
      	goproj/src/github.com/couchbase/cbas/cbas/start.go:176 +0x71
      panic({0x87b880, 0xc3eea0})
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.18.5/go/src/runtime/panic.go:838 +0x207
      main.MetakvGet({0x8f84af, 0x1a}, {0x84b180, 0xc0000a80c0})
      	goproj/src/github.com/couchbase/cbas/cbas/metakv.go:142 +0x453
      main.getCurrentTargetReplicas()
      	goproj/src/github.com/couchbase/cbas/cbas/config.go:594 +0x65
      main.initRuntimeConfig(...)
      	goproj/src/github.com/couchbase/cbas/cbas/start.go:411
      main.main2()
      	goproj/src/github.com/couchbase/cbas/cbas/start.go:282 +0xff0
      main.main()
      	goproj/src/github.com/couchbase/cbas/cbas/start.go:180 +0x3b
       
      goroutine 18 [select]:
      github.com/couchbase/cbauth/cbauthimpl.(*tlsNotifier).loop(0xc0000b4108)
      	/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/cbauth/cbauthimpl/impl.go:389 +0x67
      created by github.com/couchbase/cbauth/cbauthimpl.NewSVCForTest
      	/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/cbauth/cbauthimpl/impl.go:550 +0x37a
       
      goroutine 19 [select]:
      github.com/couchbase/cbauth/cbauthimpl.(*cfgChangeNotifier).loop(0xc0000b4120)
      	/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/cbauth/cbauthimpl/impl.go:309 +0x85
      created by github.com/couchbase/cbauth/cbauthimpl.NewSVCForTest
      	/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/cbauth/cbauthimpl/impl.go:551 +0x3ca
       
      goroutine 20 [sleep]:
      time.Sleep(0x3b9aca00)
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.18.5/go/src/runtime/time.go:194 +0x12e
      github.com/couchbase/cbauth/revrpc.(*DefaultErrorPolicy).try(0xc000078200, {0x9adf20?, 0xc0000baaf0?})
      	/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/cbauth/revrpc/revrpc.go:251 +0x1e5
      github.com/couchbase/cbauth.runRPCForSvc.func1({0x9adf20, 0xc0000baaf0})
      	/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/cbauth/default.go:55 +0xb9
      github.com/couchbase/cbauth/revrpc.BabysitService(0x0?, 0x0?, {0x9ae8a0?, 0xc00000e690?})
      	/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/cbauth/revrpc/revrpc.go:288 +0x62
      github.com/couchbase/cbauth.runRPCForSvc(0x0?, 0xc0000b8270)
      	/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/cbauth/default.go:57 +0xbd
      github.com/couchbase/cbauth.startDefault.func1()
      	/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/cbauth/default.go:66 +0x25
      created by github.com/couchbase/cbauth.startDefault
      	/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/cbauth/default.go:65 +0xf9
       
      goroutine 21 [chan receive]:
      main.installThreadDumpHandler.func1()
      	goproj/src/github.com/couchbase/cbas/cbas/utils.go:144 +0x8a
      created by main.installThreadDumpHandler
      	goproj/src/github.com/couchbase/cbas/cbas/utils.go:140 +0x25
       
      goroutine 34 [syscall]:
      os/signal.signal_recv()
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.18.5/go/src/runtime/sigqueue.go:151 +0x2f
      os/signal.loop()
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.18.5/go/src/os/signal/signal_unix.go:23 +0x19
      created by os/signal.Notify.func1.1
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.18.5/go/src/os/signal/signal.go:151 +0x2a
       
      *** end; calling os.Exit()...
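
Because this panic is reported to recur, a small script can help confirm that each occurrence has the same exit code, message, and goroutine states. The following is a minimal sketch under the assumption that the dump keeps the format shown above. The names `summarize_panic`, `PANIC_RE`, and `GOROUTINE_RE` are hypothetical, and the sample text is an abbreviated copy of the dump with the stack frames left out.

```python
import re
from collections import Counter

# Abbreviated copy of the panic output quoted above (stack frames omitted).
DUMP = """\
2022-11-07T21:32:34.014-08:00 ERRO CBAS.cbas cbas process aborting with exit code 113 due to panic: runtime error: invalid memory address or nil pointer dereference
goroutine 1 [running]:
goroutine 18 [select]:
goroutine 19 [select]:
goroutine 20 [sleep]:
goroutine 21 [chan receive]:
goroutine 34 [syscall]:
"""

# Hypothetical patterns matching the log format seen in this ticket.
PANIC_RE = re.compile(r"aborting with exit code (\d+) due to panic: (.+)")
GOROUTINE_RE = re.compile(r"^goroutine (\d+) \[([^\]]+)\]:", re.MULTILINE)


def summarize_panic(text):
    """Return (exit_code, panic_message, Counter of goroutine states)."""
    match = PANIC_RE.search(text)
    exit_code = int(match.group(1)) if match else None
    message = match.group(2).strip() if match else None
    states = Counter(state for _, state in GOROUTINE_RE.findall(text))
    return exit_code, message, states


exit_code, message, states = summarize_panic(DUMP)
print(exit_code, message, dict(states))
```

Comparing these summaries across occurrences is a quick way to tell whether later panics are the same failure in `main.MetakvGet` or something different.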
      

          People

            murtadha.hubail Murtadha Hubail
            sujay.gad Sujay Gad