Details
- Bug
- Resolution: Fixed
- Critical
- 7.1.4, 7.2.0
- 7.1.4-3601 -> 7.2.0-5324
- Untriaged
- Centos 64-bit
- 0
- Unknown
- Analytics Sprint 20
Description
Steps to Repro
1. Ran a longevity test on 7.1.4 for 2 days.
./sequoia -client 172.23.104.27:2375 -provider file:centos_pine.yml -test tests/integration/neo/test_neo.yml -scope tests/integration/neo/scope_neo_magma.yml -scale 3 -repeat 0 -log_level 0 -version 7.1.4-3601 -skip_setup=false -skip_test=false -skip_teardown=true -skip_cleanup=false -continue=false -collect_on_error=false -stop_on_error=false -duration=604800 -show_topology=true
2. Upgraded to 7.2.0-5324 using online upgrade with failover/recovery strategy.
3. Enabled CDC on all buckets and on some collections post upgrade.
4. Hard failed over nodes (one of each service type), did a full recovery, and rebalanced. The rebalance succeeds overall but reports failures on the Analytics side. Retried a couple of times with the same result.
172.23.120.75 9:14:07 PM 15 May, 2023
Starting rebalance, KeepNodes = ['ns_1@172.23.120.58','ns_1@172.23.120.73',
'ns_1@172.23.120.74','ns_1@172.23.120.75',
'ns_1@172.23.120.77','ns_1@172.23.120.81',
'ns_1@172.23.120.86','ns_1@172.23.121.77',
'ns_1@172.23.123.25','ns_1@172.23.123.26',
'ns_1@172.23.123.31','ns_1@172.23.123.32',
'ns_1@172.23.123.33','ns_1@172.23.96.122',
'ns_1@172.23.96.243','ns_1@172.23.96.254',
'ns_1@172.23.96.48','ns_1@172.23.97.105',
'ns_1@172.23.97.110','ns_1@172.23.97.112',
'ns_1@172.23.97.148','ns_1@172.23.97.241',
'ns_1@172.23.97.74'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = ff9177e6beb670e1b5e19414fccf4d3
172.23.120.86 10:57:16 PM 15 May, 2023
Analytics Service unable to successfully rebalance 9d3bc1e5cf6f0a1cca0b06e36ea29f36 due to 'java.lang.IllegalStateException: timed out waiting for all nodes to join & cluster active (missing nodes: [453dcde5201e809268a0df89fe474ebe], state: UNUSABLE)'; see analytics_info.log for details
172.23.120.75 10:41:01 AM 16 May, 2023
Starting rebalance, KeepNodes = ['ns_1@172.23.120.58','ns_1@172.23.120.73',
'ns_1@172.23.120.74','ns_1@172.23.120.75',
'ns_1@172.23.120.77','ns_1@172.23.120.81',
'ns_1@172.23.120.86','ns_1@172.23.121.77',
'ns_1@172.23.123.25','ns_1@172.23.123.26',
'ns_1@172.23.123.31','ns_1@172.23.123.32',
'ns_1@172.23.123.33','ns_1@172.23.96.122',
'ns_1@172.23.96.243','ns_1@172.23.96.254',
'ns_1@172.23.96.48','ns_1@172.23.97.105',
'ns_1@172.23.97.110','ns_1@172.23.97.112',
'ns_1@172.23.97.148','ns_1@172.23.97.241',
'ns_1@172.23.97.74'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 4cc40c996dae66055c2073dafe54ce53
172.23.120.86 11:22:14 AM 16 May, 2023
Analytics Service unable to successfully rebalance 4c95651b68cfdfc22d0c1d306ff4d7c1 due to 'java.lang.IllegalStateException: timed out waiting for all nodes to join & cluster active (missing nodes: [453dcde5201e809268a0df89fe474ebe], state: UNUSABLE)'; see analytics_info.log for details
analytics_info.log(172.23.120.86)
2023-05-15T22:57:06.246-07:00 INFO CBAS.work.NotifyShutdownWork [Worker:ClusterController] Received unsolicted shutdown notification from node 453dcde5201e809268a0df89fe474ebe
2023-05-15T22:57:16.021-07:00 ERRO CBAS.rebalance.Rebalance [Executor-187:ClusterController] Rebalance 9d3bc1e5cf6f0a1cca0b06e36ea29f36 failed
java.lang.IllegalStateException: timed out waiting for all nodes to join & cluster active (missing nodes: [453dcde5201e809268a0df89fe474ebe], state: UNUSABLE)
    at com.couchbase.analytics.control.rebalance.Rebalance.ensureNodesClusterActive(Rebalance.java:535) ~[cbas-server-7.2.0-5324.jar:7.2.0-5324]
    at com.couchbase.analytics.control.rebalance.Rebalance.adjustClusterBeforeRebalance(Rebalance.java:692) ~[cbas-server-7.2.0-5324.jar:7.2.0-5324]
    at com.couchbase.analytics.control.rebalance.Rebalance.doRebalance(Rebalance.java:205) ~[cbas-server-7.2.0-5324.jar:7.2.0-5324]
    at com.couchbase.analytics.control.rebalance.Rebalance.doCall(Rebalance.java:166) ~[cbas-server-7.2.0-5324.jar:7.2.0-5324]
    at com.couchbase.analytics.control.rebalance.Rebalance.doCall(Rebalance.java:84) ~[cbas-server-7.2.0-5324.jar:7.2.0-5324]
    at com.couchbase.analytics.runtime.WriteLockCallable.call(WriteLockCallable.java:27) ~[cbas-connector-7.2.0-5324.jar:7.2.0-5324]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    at java.lang.Thread.run(Thread.java:829) ~[?:?]
2023-05-15T22:57:16.021-07:00 WARN CBAS.rebalance.Rebalance [Executor-187:ClusterController] exit Rebalance 9d3bc1e5cf6f0a1cca0b06e36ea29f36
2023-05-15T22:57:16.021-07:00 INFO CBAS.rebalance.RebalanceProgress [Executor-188:ClusterController] dataset size fetcher interrupted
2023-05-15T22:57:16.349-07:00 ERRO CBAS.servlet.RebalanceServlet [HttpExecutor(port:9111)-5] Rebalance 9d3bc1e5cf6f0a1cca0b06e36ea29f36 failed
java.lang.IllegalStateException: timed out waiting for all nodes to join & cluster active (missing nodes: [453dcde5201e809268a0df89fe474ebe], state: UNUSABLE)
    at com.couchbase.analytics.control.rebalance.Rebalance.ensureNodesClusterActive(Rebalance.java:535) ~[cbas-server-7.2.0-5324.jar:7.2.0-5324]
    at com.couchbase.analytics.control.rebalance.Rebalance.adjustClusterBeforeRebalance(Rebalance.java:692) ~[cbas-server-7.2.0-5324.jar:7.2.0-5324]
    at com.couchbase.analytics.control.rebalance.Rebalance.doRebalance(Rebalance.java:205) ~[cbas-server-7.2.0-5324.jar:7.2.0-5324]
    at com.couchbase.analytics.control.rebalance.Rebalance.doCall(Rebalance.java:166) ~[cbas-server-7.2.0-5324.jar:7.2.0-5324]
    at com.couchbase.analytics.control.rebalance.Rebalance.doCall(Rebalance.java:84) ~[cbas-server-7.2.0-5324.jar:7.2.0-5324]
    at com.couchbase.analytics.runtime.WriteLockCallable.call(WriteLockCallable.java:27) ~[cbas-connector-7.2.0-5324.jar:7.2.0-5324]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    at java.lang.Thread.run(Thread.java:829) ~[?:?]
2023-05-15T22:57:16.433-07:00 INFO CBAS.cbas requesting isBalanced for 9d3bc1e5cf6f0a1cca0b06e36ea29f36 from driver
....
2023-05-16T11:22:14.865-07:00 ERRO CBAS.rebalance.Rebalance [Executor-201:ClusterController] Rebalance 4c95651b68cfdfc22d0c1d306ff4d7c1 failed
java.lang.IllegalStateException: timed out waiting for all nodes to join & cluster active (missing nodes: [453dcde5201e809268a0df89fe474ebe], state: UNUSABLE)
    at com.couchbase.analytics.control.rebalance.Rebalance.ensureNodesClusterActive(Rebalance.java:535) ~[cbas-server-7.2.0-5324.jar:7.2.0-5324]
    at com.couchbase.analytics.control.rebalance.Rebalance.adjustClusterBeforeRebalance(Rebalance.java:692) ~[cbas-server-7.2.0-5324.jar:7.2.0-5324]
    at com.couchbase.analytics.control.rebalance.Rebalance.doRebalance(Rebalance.java:205) ~[cbas-server-7.2.0-5324.jar:7.2.0-5324]
    at com.couchbase.analytics.control.rebalance.Rebalance.doCall(Rebalance.java:166) ~[cbas-server-7.2.0-5324.jar:7.2.0-5324]
    at com.couchbase.analytics.control.rebalance.Rebalance.doCall(Rebalance.java:84) ~[cbas-server-7.2.0-5324.jar:7.2.0-5324]
    at com.couchbase.analytics.runtime.WriteLockCallable.call(WriteLockCallable.java:27) ~[cbas-connector-7.2.0-5324.jar:7.2.0-5324]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
    at java.lang.Thread.run(Thread.java:829) ~[?:?]
2023-05-16T11:22:14.865-07:00 WARN CBAS.rebalance.Rebalance [Executor-201:ClusterController] exit Rebalance 4c95651b68cfdfc22d0c1d306ff4d7c1
2023-05-16T11:22:14.865-07:00 INFO CBAS.rebalance.RebalanceProgress [Executor-202:ClusterController] dataset size fetcher interrupted
2023-05-16T11:22:15.244-07:00 ERRO CBAS.servlet.RebalanceServlet [HttpExecutor(port:9111)-3] Rebalance 4c95651b68cfdfc22d0c1d306ff4d7c1 failed
java.lang.IllegalStateException: timed out waiting for all nodes to join & cluster active (missing nodes: [453dcde5201e809268a0df89fe474ebe], state: UNUSABLE)
    at com.couchbase.analytics.control.rebalance.Rebalance.ensureNodesClusterActive(Rebalance.java:535) ~[cbas-server-7.2.0-5324.jar:7.2.0-5324]
    at com.couchbase.analytics.control.rebalance.Rebalance.adjustClusterBeforeRebalance(Rebalance.java:692) ~[cbas-server-7.2.0-5324.jar:7.2.0-5324]
    at com.couchbase.analytics.control.rebalance.Rebalance.doRebalance(Rebalance.java:205) ~[cbas-server-7.2.0-5324.jar:7.2.0-5324]
    at com.couchbase.analytics.control.rebalance.Rebalance.doCall(Rebalance.java:166) ~[cbas-server-7.2.0-5324.jar:7.2.0-5324]
    at com.couchbase.analytics.control.rebalance.Rebalance.doCall(Rebalance.java:84) ~[cbas-server-7.2.0-5324.jar:7.2.0-5324]
    at com.couchbase.analytics.runtime.WriteLockCallable.call(WriteLockCallable.java:27) ~[cbas-connector-7.2.0-5324.jar:7.2.0-5324]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
cbcollect_info attached.
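For anyone triaging the attached logs, here is a minimal Python sketch that pulls the failed rebalance IDs and the missing node IDs out of analytics_info.log. The function name `summarize_rebalance_failures` is hypothetical, and the regular expressions are based only on the lines quoted in this description, so they may need adjusting if other builds format these lines differently.

```python
import re

# Matches the ERRO line logged by CBAS.rebalance.Rebalance when an attempt fails.
# Assumption: the line layout is the one quoted in this ticket.
FAILED_RE = re.compile(
    r"^(?P<ts>\S+) ERRO CBAS\.rebalance\.Rebalance .*Rebalance (?P<id>[0-9a-f]+) failed"
)
# Matches the exception message that names the nodes that never joined.
MISSING_RE = re.compile(
    r"missing nodes: \[(?P<nodes>[^\]]*)\], state: (?P<state>\w+)"
)


def summarize_rebalance_failures(lines):
    """Return one dict per failed rebalance with its missing nodes and cluster state."""
    failures = []
    current = None
    for raw in lines:
        line = raw.strip()
        failed = FAILED_RE.match(line)
        if failed:
            current = {
                "timestamp": failed["ts"],
                "rebalance_id": failed["id"],
                "missing_nodes": [],
                "state": None,
            }
            failures.append(current)
            continue
        missing = MISSING_RE.search(line)
        # Only attach the first exception line after a failure; the same
        # exception is logged again by RebalanceServlet and is skipped here.
        if missing and current is not None and current["state"] is None:
            current["missing_nodes"] = [
                node.strip() for node in missing["nodes"].split(",") if node.strip()
            ]
            current["state"] = missing["state"]
    return failures
```

To use it, pass an open file handle for analytics_info.log (taken from the cbcollect_info bundle) and print the returned list. For the excerpts above it should report two failures, both naming node 453dcde5201e809268a0df89fe474ebe.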
Attachments
Issue Links
- depends on
- MB-56889 [System test upgrade] :- Anlaytics Rebalance fails with Rebalance 7fef0ad83705736f70d24831ccdf0c6e failed: Index with resource ID 6245 already exists.
- Closed