Details
-
Bug
-
Resolution: Fixed
-
Critical
-
6.6.2
-
Untriaged
-
1
-
Unknown
-
CX Sprint 250
Description
7.0.0-4955
Test:
-test tests/integration/cheshirecat/test_cheshirecat_kv_gsi_coll_xdcr_backup_sgw_fts_itemct_txns_eventing_cbas_scale3.yml -scope tests/integration/cheshirecat/scope_cheshirecat_with_backup.yml
Scale 3
Iteration 3
.108.103 diag.log:
2021-04-21T15:23:09.425-07:00, ns_vbucket_mover:0:info:message(ns_1@172.23.104.137) - Bucket "default" rebalance appears to be swap rebalance
|
2021-04-21T15:23:11.139-07:00, ns_node_disco:4:info:node up(ns_1@172.23.104.67) - Node 'ns_1@172.23.104.67' saw that node 'ns_1@172.23.105.111' came up. Tags: [] (repeated 1 times, last seen 61.342812 secs ago)
|
2021-04-21T15:23:17.661-07:00, ns_node_disco:4:info:node up(ns_1@172.23.105.107) - Node 'ns_1@172.23.105.107' saw that node 'ns_1@172.23.105.111' came up. Tags: [] (repeated 1 times, last seen 68.368038 secs ago)
|
2021-04-21T15:23:18.407-07:00, ns_node_disco:4:info:node up(ns_1@172.23.104.157) - Node 'ns_1@172.23.104.157' saw that node 'ns_1@172.23.105.111' came up. Tags: [] (repeated 1 times, last seen 69.015704 secs ago)
|
2021-04-22T05:58:24.082-07:00, analytics:0:warning:message(ns_1@172.23.104.157) - Analytics Service unable to successfully rebalance 60133c5ac910f395614e1c0d59647e48 due to 'java.lang.IllegalStateException: timed out waiting for all nodes to join & cluster active (missing nodes: [1151f6cfc86035b5768e4a5ab36bbb8f], state: ACTIVE)'; see analytics_info.log for details
|
2021-04-22T05:58:24.588-07:00, ns_orchestrator:0:critical:message(ns_1@172.23.104.137) - Rebalance exited with reason {service_rebalance_failed,cbas,
|
{worker_died,
|
{'EXIT',<0.28306.2352>,
|
{rebalance_failed,
|
{service_error,
|
<<"Rebalance 60133c5ac910f395614e1c0d59647e48 failed: timed out waiting for all nodes to join & cluster active (missing nodes: [1151f6cfc86035b5768e4a5ab36bbb8f], state: ACTIVE)">>}}}}}.
|
Rebalance Operation Id = 759b7fb5e9eb86629ddf589999b711a4
|
Console:
[2021-04-21T15:21:49-07:00, sequoiatools/couchbase-cli:7.0:1d0175] server-add -c 172.23.108.103:8091 --server-add https://172.23.105.111 -u Administrator -p password --server-add-username Administrator --server-add-password password --services index
|
[2021-04-21T15:22:23-07:00, sequoiatools/couchbase-cli:7.0:8b292a] rebalance -c 172.23.108.103:8091 -u Administrator -p password
|
→
|
|
Error occurred on container - sequoiatools/couchbase-cli:7.0:[rebalance -c 172.23.108.103:8091 -u Administrator -p password]
|
|
docker logs 8b292a
|
docker start 8b292a
|
|
*Unable to display progress bar on this os
|
JERROR: Rebalance failed. See logs for detailed reason. You can try again.
|
[2021-04-22T05:58:31-07:00, sequoiatools/cmd:a0f7ff] 60
|
.104.157 analytics info.log:
2021-04-22T05:58:22.850-07:00 INFO CBAS.server.QueryServiceServlet [HttpExecutor(port:8095)-10] handleRequest: <ud>{"host":"172.23.104.157:8095","path":"/query/service","statement":"select avg(price) as AvgPrice, min(price) as MinPrice, max(price) as MaxPrice from dv_8.ds_29 where free_breakfast=True and free_parking=True and price is not null and array_count(public_likes)>5 and `type`='Hotel' group by country limit 100;","pretty":true,"mode":"immediate","clientContextID":"query_thread_129617","format":"CLEAN_JSON","timeout":3600000,"maxResultReads":1,"planFormat":"JSON","expressionTree":false,"rewrittenExpressionTree":false,"logicalPlan":false,"optimizedLogicalPlan":false,"job":false,"profile":"counts","signature":true,"multiStatement":true,"parseOnly":false,"readOnly":false,"maxWarnings":0,"scanConsistency":null,"scanWait":null}</ud>
|
2021-04-22T05:58:22.851-07:00 INFO CBAS.messaging.CCMessageBroker [Executor-758:ClusterController] Received message: ExecuteStatementRequestMessage(id=38079, from=47c864f2a9e597168db197820602939e): <ud>select avg(price) as AvgPrice, min(price) as MinPrice, max(price) as MaxPrice from dv_8.ds_29 where free_breakfast=True and free_parking=True and price is not null and array_count(public_likes)>5 and `type`='Hotel' group by country limit 100;</ud>
|
2021-04-22T05:58:22.908-07:00 INFO CBAS.work.JobCleanupWork [Worker:ClusterController] Cleanup for job: JID:5.80502
|
2021-04-22T05:58:22.910-07:00 INFO CBAS.messaging.NCMessageBroker [Worker:47c864f2a9e597168db197820602939e] Received message: ExecuteStatementResponseMessage(id=38079): 71 characters
|
2021-04-22T05:58:24.085-07:00 ERRO CBAS.rebalance.Rebalance [Executor-755:ClusterController] Rebalance 60133c5ac910f395614e1c0d59647e48 failed
|
java.lang.IllegalStateException: timed out waiting for all nodes to join & cluster active (missing nodes: [1151f6cfc86035b5768e4a5ab36bbb8f], state: ACTIVE)
|
at com.couchbase.analytics.control.rebalance.Rebalance.ensureNodesClusterActive(Rebalance.java:484) ~[cbas-server.jar:7.0.0-4955]
|
at com.couchbase.analytics.control.rebalance.Rebalance.adjustClusterBeforeRebalance(Rebalance.java:613) ~[cbas-server.jar:7.0.0-4955]
|
at com.couchbase.analytics.control.rebalance.Rebalance.doRebalance(Rebalance.java:190) ~[cbas-server.jar:7.0.0-4955]
|
at com.couchbase.analytics.control.rebalance.Rebalance.doCall(Rebalance.java:152) [cbas-server.jar:7.0.0-4955]
|
at com.couchbase.analytics.control.rebalance.Rebalance.doCall(Rebalance.java:80) [cbas-server.jar:7.0.0-4955]
|
at com.couchbase.analytics.runtime.WriteLockCallable.call(WriteLockCallable.java:27) [cbas-connector.jar:7.0.0-4955]
|
at java.util.concurrent.FutureTask.run(Unknown Source) [?:?]
|
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
|
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
|
at java.lang.Thread.run(Unknown Source) [?:?]
|
2021-04-22T05:58:24.086-07:00 WARN CBAS.rebalance.Rebalance [Executor-755:ClusterController] exit Rebalance 60133c5ac910f395614e1c0d59647e48
|
2021-04-22T05:58:24.086-07:00 INFO CBAS.rebalance.RebalanceProgress [Executor-754:ClusterController] dataset size fetcher interrupted
|
2021-04-22T05:58:24.576-07:00 ERRO CBAS.servlet.RebalanceServlet [HttpExecutor(port:9111)-10] Rebalance 60133c5ac910f395614e1c0d59647e48 failed
|
java.lang.IllegalStateException: timed out waiting for all nodes to join & cluster active (missing nodes: [1151f6cfc86035b5768e4a5ab36bbb8f], state: ACTIVE)
|
at com.couchbase.analytics.control.rebalance.Rebalance.ensureNodesClusterActive(Rebalance.java:484) ~[cbas-server.jar:7.0.0-4955]
|
at com.couchbase.analytics.control.rebalance.Rebalance.adjustClusterBeforeRebalance(Rebalance.java:613) ~[cbas-server.jar:7.0.0-4955]
|
at com.couchbase.analytics.control.rebalance.Rebalance.doRebalance(Rebalance.java:190) ~[cbas-server.jar:7.0.0-4955]
|
at com.couchbase.analytics.control.rebalance.Rebalance.doCall(Rebalance.java:152) ~[cbas-server.jar:7.0.0-4955]
|
at com.couchbase.analytics.control.rebalance.Rebalance.doCall(Rebalance.java:80) ~[cbas-server.jar:7.0.0-4955]
|
at com.couchbase.analytics.runtime.WriteLockCallable.call(WriteLockCallable.java:27) ~[cbas-connector.jar:7.0.0-4955]
|
at java.util.concurrent.FutureTask.run(Unknown Source) [?:?]
|
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
|
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
|
at java.lang.Thread.run(Unknown Source) [?:?]
|
2021-04-22T05:58:24.589-07:00 INFO CBAS.cbas requesting isBalanced for 60133c5ac910f395614e1c0d59647e48 from driver
|
Cluster config:
backup : 1 ===== > [172.23.123.28:8091] ###########
index : 6 ===== > [172.23.104.137:8091 172.23.105.107:8091 172.23.121.117:8091 172.23.96.252:8091 172.23.96.253:8091 172.23.99.11:8091]
fts : 2 ===== > [172.23.104.155:8091 172.23.96.148:8091] ###########
cbas : 4 ===== > [172.23.104.157:8091 172.23.104.5:8091 172.23.106.188:8091 172.23.97.242:8091] ###########
eventing : 4 ===== > [172.23.104.67:8091 172.23.123.27:8091 172.23.97.239:8091 172.23.98.135:8091] ###########
kv : 11 ===== > [172.23.104.69:8091 172.23.104.70:8091 172.23.106.100:8091 172.23.108.103:8091 172.23.121.3:8091 172.23.97.119:8091 172.23.97.121:8091 172.23.97.122:8091 172.23.99.20:8091 172.23.99.21:8091 172.23.99.25:8091] ###########
n1ql : 2 ===== > [172.23.120.245:8091 172.23.96.251:8091] ###########
Logs:
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.104.137.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.104.155.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.104.157.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.104.5.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.104.67.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.104.69.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.104.70.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.105.107.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.105.111.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.106.100.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.106.188.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.108.103.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.120.245.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.121.117.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.121.3.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.123.27.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.123.28.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.96.148.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.96.251.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.96.252.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.96.253.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.97.119.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.97.121.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.97.122.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.97.239.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.97.242.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.98.135.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.99.11.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.99.20.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.99.21.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.99.25.zip
Attachments
Issue Links
- is a backport of
-
MB-45869 [System Test][Analytics] Rebalance failed with error - java.lang.IllegalStateException: timed out waiting for all nodes to join & cluster active (missing nodes: [xxxx], state: ACTIVE)
- Closed
- is duplicated by
-
MB-45970 [Windows]cbbackupmgr: Backup failed with reason "failed to transfer Analytics metadata"
- Closed
-
MB-46285 [system test upgrade] : Analytics rebalance fails with "java.lang.IllegalStateException: timed out waiting for all nodes to join & cluster active" during upgrade from 6.6.2 -> 7.0.0
- Closed
-
MB-46781 [BP 6.6.3][system test upgrade] : Analytics rebalance fails with "java.lang.IllegalStateException: timed out waiting for all nodes to join & cluster active" during upgrade from 6.6.2 -> 7.0.0
- Closed
- relates to
-
MB-45954 Rebalance fails because of PrepareTopologyChange timeout
- Closed
-
MB-46865 [BP 6.6.3] Unhelpful error message on: timed out waiting for all nodes to join & cluster active (missing nodes: [e6f0383d4902ece226bc1f2329d23993]
- Closed
-
MB-46293 [CX] Unhelpful error message on: timed out waiting for all nodes to join & cluster active (missing nodes: [e6f0383d4902ece226bc1f2329d23993]
- Closed