Details
Bug
Resolution: Won't Fix
Major
7.1.2
Enterprise Edition 7.1.2 build 3454
Untriaged
CentOS 64-bit
1
Unknown
Analytics Sprint 8, Analytics Sprint 9
Description
QE TEST
-test tests/integration/neo/test_neo.yml -scope tests/integration/neo/scope_couchstore.yml
Day - 1
Cycle - 1
Scale - 3
TEST STEP
The Analytics rebalance failed while performing a swap rebalance of a pair of Analytics nodes.
[2022-11-07T19:17:09-08:00, sequoiatools/couchbase-cli:7.1:7bd2da] rebalance -c 172.23.108.103:8091 --server-remove 172.23.123.28 -u Administrator -p password
Error occurred on container - sequoiatools/couchbase-cli:7.1:[rebalance -c 172.23.108.103:8091 --server-remove 172.23.123.28 -u Administrator -p password]

docker logs 7bd2da
docker start 7bd2da

Unable to display progress bar on this os
ERROR: Rebalance failed. See logs for detailed reason. You can try again.
CLUSTER SETTINGS
The cluster is running with the IP family set to ipv4-only, node-to-node (n2n) encryption enabled, and the encryption level set to all.
This issue was not observed when running the same test against the same build with the encryption level set to control.
REBALANCE FAILURE
2022-11-07T19:31:10.508-08:00, ns_orchestrator:0:critical:message(ns_1@172.23.108.103) - Rebalance exited with reason {service_rebalance_failed,cbas,
 {agent_died,<33046.11956.194>,noconnection}}.
Rebalance Operation Id = 444e18f820d67472deab6113299537b5
On 172.23.106.188 analytics_error.log
2022-11-07T19:31:43.068-08:00 ERRO CBAS.servlet.RebalanceServlet [HttpExecutor(port:9111)-13] Rebalance d952a0882b80ca7efb6f40325e5a0747 failed
java.util.concurrent.CancellationException: null
	at java.util.concurrent.FutureTask.report(FutureTask.java:121) ~[?:?]
	at java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:?]
	at com.couchbase.analytics.control.rebalance.Rebalance.join(Rebalance.java:595) ~[cbas-server-7.1.2-3454.jar:7.1.2-3454]
	at com.couchbase.analytics.servlet.RebalanceServlet.processRebalanceStatusRequest(RebalanceServlet.java:124) [cbas-server-7.1.2-3454.jar:7.1.2-3454]
	at com.couchbase.analytics.servlet.RebalanceServlet.get(RebalanceServlet.java:87) [cbas-server-7.1.2-3454.jar:7.1.2-3454]
	at org.apache.hyracks.http.server.AbstractServlet.handle(AbstractServlet.java:90) [hyracks-http-7.1.2-3454.jar:7.1.2-3454]
	at com.couchbase.analytics.servlet.AuthenticatedServlet.handle(AuthenticatedServlet.java:93) [cbas-server-7.1.2-3454.jar:7.1.2-3454]
	at org.apache.hyracks.http.server.HttpRequestHandler.handle(HttpRequestHandler.java:83) [hyracks-http-7.1.2-3454.jar:7.1.2-3454]
	at org.apache.hyracks.http.server.HttpRequestHandler.call(HttpRequestHandler.java:68) [hyracks-http-7.1.2-3454.jar:7.1.2-3454]
	at org.apache.hyracks.http.server.HttpRequestHandler.call(HttpRequestHandler.java:37) [hyracks-http-7.1.2-3454.jar:7.1.2-3454]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]
On 172.23.99.11 analytics_debug.log
The following driver halt error is logged just before the rebalance failure.
2022-11-07T21:28:27.711-08:00 DEBU CBAS.util.ExitUtil [Stdin Watcher] JVM halting with status 33 (halting thread Thread[Stdin Watcher,5,main], interrupted false)
2022-11-07T21:28:27.757-08:00 DEBU CBAS.util.ExitUtil [pool-2-thread-1] Thread dump at halt:
"main" [tid=1 state=RUNNABLE]
	at java.base@11.0.16/java.util.zip.ZipFile$Source.getEntryPos(ZipFile.java:1649)
	at java.base@11.0.16/java.util.zip.ZipFile.getEntry(ZipFile.java:350)
	- <locked java.util.jar.JarFile@259ee69f>
	at java.base@11.0.16/java.util.zip.ZipFile$1.getEntry(ZipFile.java:1143)
	at java.base@11.0.16/java.util.jar.JarFile.getEntry0(JarFile.java:586)
	at java.base@11.0.16/java.util.jar.JarFile.getEntry(JarFile.java:516)
	at java.base@11.0.16/java.util.jar.JarFile.getJarEntry(JarFile.java:478)
	at java.base@11.0.16/jdk.internal.loader.URLClassPath$JarLoader.getResource(URLClassPath.java:943)
	at java.base@11.0.16/jdk.internal.loader.URLClassPath.getResource(URLClassPath.java:315)
	at java.base@11.0.16/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:695)
	at java.base@11.0.16/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:621)
	- <locked java.lang.Object@493744f4>
	at java.base@11.0.16/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:579)
	at java.base@11.0.16/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
	at java.base@11.0.16/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
	at app//org.apache.hyracks.control.common.ipc.CCNCFunctions$SerializerDeserializer.<init>(CCNCFunctions.java:1449)
	at app//org.apache.hyracks.control.nc.NodeControllerService.start(NodeControllerService.java:284)
	at app//com.couchbase.analytics.control.AnalyticsDriver.startService(AnalyticsDriver.java:132)
	at app//com.couchbase.analytics.control.AnalyticsDriver.startServices(AnalyticsDriver.java:113)
	at app//com.couchbase.analytics.control.AnalyticsDriver.main(AnalyticsDriver.java:90)

"Reference Handler" [tid=2 state=RUNNABLE]
	at java.base@11.0.16/java.lang.ref.Reference.waitForReferencePendingList(Native Method)
	at java.base@11.0.16/java.lang.ref.Reference.processPendingReferences(Reference.java:241)
	at java.base@11.0.16/java.lang.ref.Reference$ReferenceHandler.run(Reference.java:213)

"Finalizer" [tid=3 state=WAITING lock=java.lang.ref.ReferenceQueue$Lock@2f72a08b]
	at java.base@11.0.16/java.lang.Object.wait(Native Method)
	at java.base@11.0.16/java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:155)
	at java.base@11.0.16/java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:176)
	at java.base@11.0.16/java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:170)
On 172.23.99.11 analytics_info.log
Also observed the following error just before the rebalance failure, which might be related to MB-54428.
NOTE
The following panic is also observed multiple times on the 99.11 node at a later point in time.
Let me know if I should file a separate ticket for this panic.
2022-11-07T21:32:34.014-08:00 ERRO CBAS.cbas cbas process aborting with exit code 113 due to panic: runtime error: invalid memory address or nil pointer dereference
2022-11-07T21:32:34.015-08:00 INFO CBAS.cbas *** goroutine dump at panic:
goroutine 1 [running]:
main.RoutineDump()
	goproj/src/github.com/couchbase/cbas/cbas/utils.go:112 +0xa7
main.main.func1()
	goproj/src/github.com/couchbase/cbas/cbas/start.go:176 +0x71
panic({0x87b880, 0xc3eea0})
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.18.5/go/src/runtime/panic.go:838 +0x207
main.MetakvGet({0x8f84af, 0x1a}, {0x84b180, 0xc0000a80c0})
	goproj/src/github.com/couchbase/cbas/cbas/metakv.go:142 +0x453
main.getCurrentTargetReplicas()
	goproj/src/github.com/couchbase/cbas/cbas/config.go:594 +0x65
main.initRuntimeConfig(...)
	goproj/src/github.com/couchbase/cbas/cbas/start.go:411
main.main2()
	goproj/src/github.com/couchbase/cbas/cbas/start.go:282 +0xff0
main.main()
	goproj/src/github.com/couchbase/cbas/cbas/start.go:180 +0x3b

goroutine 18 [select]:
github.com/couchbase/cbauth/cbauthimpl.(*tlsNotifier).loop(0xc0000b4108)
	/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/cbauth/cbauthimpl/impl.go:389 +0x67
created by github.com/couchbase/cbauth/cbauthimpl.NewSVCForTest
	/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/cbauth/cbauthimpl/impl.go:550 +0x37a

goroutine 19 [select]:
github.com/couchbase/cbauth/cbauthimpl.(*cfgChangeNotifier).loop(0xc0000b4120)
	/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/cbauth/cbauthimpl/impl.go:309 +0x85
created by github.com/couchbase/cbauth/cbauthimpl.NewSVCForTest
	/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/cbauth/cbauthimpl/impl.go:551 +0x3ca

goroutine 20 [sleep]:
time.Sleep(0x3b9aca00)
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.18.5/go/src/runtime/time.go:194 +0x12e
github.com/couchbase/cbauth/revrpc.(*DefaultErrorPolicy).try(0xc000078200, {0x9adf20?, 0xc0000baaf0?})
	/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/cbauth/revrpc/revrpc.go:251 +0x1e5
github.com/couchbase/cbauth.runRPCForSvc.func1({0x9adf20, 0xc0000baaf0})
	/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/cbauth/default.go:55 +0xb9
github.com/couchbase/cbauth/revrpc.BabysitService(0x0?, 0x0?, {0x9ae8a0?, 0xc00000e690?})
	/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/cbauth/revrpc/revrpc.go:288 +0x62
github.com/couchbase/cbauth.runRPCForSvc(0x0?, 0xc0000b8270)
	/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/cbauth/default.go:57 +0xbd
github.com/couchbase/cbauth.startDefault.func1()
	/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/cbauth/default.go:66 +0x25
created by github.com/couchbase/cbauth.startDefault
	/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/cbauth/default.go:65 +0xf9

goroutine 21 [chan receive]:
main.installThreadDumpHandler.func1()
	goproj/src/github.com/couchbase/cbas/cbas/utils.go:144 +0x8a
created by main.installThreadDumpHandler
	goproj/src/github.com/couchbase/cbas/cbas/utils.go:140 +0x25

goroutine 34 [syscall]:
os/signal.signal_recv()
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.18.5/go/src/runtime/sigqueue.go:151 +0x2f
os/signal.loop()
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.18.5/go/src/os/signal/signal_unix.go:23 +0x19
created by os/signal.Notify.func1.1
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.18.5/go/src/os/signal/signal.go:151 +0x2a

*** end; calling os.Exit()...