  Couchbase Server / MB-45869

[System Test][Analytics] Rebalance failed with error - java.lang.IllegalStateException: timed out waiting for all nodes to join & cluster active (missing nodes: [xxxx], state: ACTIVE)


Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Fixed
    • Cheshire-Cat
    • 7.0.0
    • analytics
    • Untriaged
    • 1
    • Unknown
    • CX Sprint 246

    Description

      7.0.0-4955

      Test:
      -test tests/integration/cheshirecat/test_cheshirecat_kv_gsi_coll_xdcr_backup_sgw_fts_itemct_txns_eventing_cbas_scale3.yml -scope tests/integration/cheshirecat/scope_cheshirecat_with_backup.yml
      Scale 3
      Iteration 3

      .108.103 diag.log:

      2021-04-21T15:23:09.425-07:00, ns_vbucket_mover:0:info:message(ns_1@172.23.104.137) - Bucket "default" rebalance appears to be swap rebalance
      2021-04-21T15:23:11.139-07:00, ns_node_disco:4:info:node up(ns_1@172.23.104.67) - Node 'ns_1@172.23.104.67' saw that node 'ns_1@172.23.105.111' came up. Tags: [] (repeated 1 times, last seen 61.342812 secs ago)
      2021-04-21T15:23:17.661-07:00, ns_node_disco:4:info:node up(ns_1@172.23.105.107) - Node 'ns_1@172.23.105.107' saw that node 'ns_1@172.23.105.111' came up. Tags: [] (repeated 1 times, last seen 68.368038 secs ago)
      2021-04-21T15:23:18.407-07:00, ns_node_disco:4:info:node up(ns_1@172.23.104.157) - Node 'ns_1@172.23.104.157' saw that node 'ns_1@172.23.105.111' came up. Tags: [] (repeated 1 times, last seen 69.015704 secs ago)
      2021-04-22T05:58:24.082-07:00, analytics:0:warning:message(ns_1@172.23.104.157) - Analytics Service unable to successfully rebalance 60133c5ac910f395614e1c0d59647e48 due to 'java.lang.IllegalStateException: timed out waiting for all nodes to join & cluster active (missing nodes: [1151f6cfc86035b5768e4a5ab36bbb8f], state: ACTIVE)'; see analytics_info.log for details
      2021-04-22T05:58:24.588-07:00, ns_orchestrator:0:critical:message(ns_1@172.23.104.137) - Rebalance exited with reason {service_rebalance_failed,cbas,
                                    {worker_died,
                                     {'EXIT',<0.28306.2352>,
                                      {rebalance_failed,
                                       {service_error,
                                        <<"Rebalance 60133c5ac910f395614e1c0d59647e48 failed: timed out waiting for all nodes to join & cluster active (missing nodes: [1151f6cfc86035b5768e4a5ab36bbb8f], state: ACTIVE)">>}}}}}.
      Rebalance Operation Id = 759b7fb5e9eb86629ddf589999b711a4
      

      Console:

      [2021-04-21T15:21:49-07:00, sequoiatools/couchbase-cli:7.0:1d0175] server-add -c 172.23.108.103:8091 --server-add https://172.23.105.111 -u Administrator -p password --server-add-username Administrator --server-add-password password --services index
      [2021-04-21T15:22:23-07:00, sequoiatools/couchbase-cli:7.0:8b292a] rebalance -c 172.23.108.103:8091 -u Administrator -p password
       
      Error occurred on container - sequoiatools/couchbase-cli:7.0:[rebalance -c 172.23.108.103:8091 -u Administrator -p password]
       
      docker logs 8b292a
      docker start 8b292a
       
      *Unable to display progress bar on this os
      JERROR: Rebalance failed. See logs for detailed reason. You can try again.
      [2021-04-22T05:58:31-07:00, sequoiatools/cmd:a0f7ff] 60
      

      .104.157 analytics_info.log:

      2021-04-22T05:58:22.850-07:00 INFO CBAS.server.QueryServiceServlet [HttpExecutor(port:8095)-10] handleRequest: <ud>{"host":"172.23.104.157:8095","path":"/query/service","statement":"select avg(price) as AvgPrice, min(price) as MinPrice, max(price) as MaxPrice from dv_8.ds_29 where free_breakfast=True and free_parking=True and price is not null and array_count(public_likes)>5 and `type`='Hotel' group by country limit 100;","pretty":true,"mode":"immediate","clientContextID":"query_thread_129617","format":"CLEAN_JSON","timeout":3600000,"maxResultReads":1,"planFormat":"JSON","expressionTree":false,"rewrittenExpressionTree":false,"logicalPlan":false,"optimizedLogicalPlan":false,"job":false,"profile":"counts","signature":true,"multiStatement":true,"parseOnly":false,"readOnly":false,"maxWarnings":0,"scanConsistency":null,"scanWait":null}</ud>
      2021-04-22T05:58:22.851-07:00 INFO CBAS.messaging.CCMessageBroker [Executor-758:ClusterController] Received message: ExecuteStatementRequestMessage(id=38079, from=47c864f2a9e597168db197820602939e): <ud>select avg(price) as AvgPrice, min(price) as MinPrice, max(price) as MaxPrice from dv_8.ds_29 where free_breakfast=True and free_parking=True and price is not null and array_count(public_likes)>5 and `type`='Hotel' group by country limit 100;</ud>
      2021-04-22T05:58:22.908-07:00 INFO CBAS.work.JobCleanupWork [Worker:ClusterController] Cleanup for job: JID:5.80502
      2021-04-22T05:58:22.910-07:00 INFO CBAS.messaging.NCMessageBroker [Worker:47c864f2a9e597168db197820602939e] Received message: ExecuteStatementResponseMessage(id=38079): 71 characters
      2021-04-22T05:58:24.085-07:00 ERRO CBAS.rebalance.Rebalance [Executor-755:ClusterController] Rebalance 60133c5ac910f395614e1c0d59647e48 failed
      java.lang.IllegalStateException: timed out waiting for all nodes to join & cluster active (missing nodes: [1151f6cfc86035b5768e4a5ab36bbb8f], state: ACTIVE)
      	at com.couchbase.analytics.control.rebalance.Rebalance.ensureNodesClusterActive(Rebalance.java:484) ~[cbas-server.jar:7.0.0-4955]
      	at com.couchbase.analytics.control.rebalance.Rebalance.adjustClusterBeforeRebalance(Rebalance.java:613) ~[cbas-server.jar:7.0.0-4955]
      	at com.couchbase.analytics.control.rebalance.Rebalance.doRebalance(Rebalance.java:190) ~[cbas-server.jar:7.0.0-4955]
      	at com.couchbase.analytics.control.rebalance.Rebalance.doCall(Rebalance.java:152) [cbas-server.jar:7.0.0-4955]
      	at com.couchbase.analytics.control.rebalance.Rebalance.doCall(Rebalance.java:80) [cbas-server.jar:7.0.0-4955]
      	at com.couchbase.analytics.runtime.WriteLockCallable.call(WriteLockCallable.java:27) [cbas-connector.jar:7.0.0-4955]
      	at java.util.concurrent.FutureTask.run(Unknown Source) [?:?]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
      	at java.lang.Thread.run(Unknown Source) [?:?]
      2021-04-22T05:58:24.086-07:00 WARN CBAS.rebalance.Rebalance [Executor-755:ClusterController] exit Rebalance 60133c5ac910f395614e1c0d59647e48
      2021-04-22T05:58:24.086-07:00 INFO CBAS.rebalance.RebalanceProgress [Executor-754:ClusterController] dataset size fetcher interrupted
      2021-04-22T05:58:24.576-07:00 ERRO CBAS.servlet.RebalanceServlet [HttpExecutor(port:9111)-10] Rebalance 60133c5ac910f395614e1c0d59647e48 failed
      java.lang.IllegalStateException: timed out waiting for all nodes to join & cluster active (missing nodes: [1151f6cfc86035b5768e4a5ab36bbb8f], state: ACTIVE)
      	at com.couchbase.analytics.control.rebalance.Rebalance.ensureNodesClusterActive(Rebalance.java:484) ~[cbas-server.jar:7.0.0-4955]
      	at com.couchbase.analytics.control.rebalance.Rebalance.adjustClusterBeforeRebalance(Rebalance.java:613) ~[cbas-server.jar:7.0.0-4955]
      	at com.couchbase.analytics.control.rebalance.Rebalance.doRebalance(Rebalance.java:190) ~[cbas-server.jar:7.0.0-4955]
      	at com.couchbase.analytics.control.rebalance.Rebalance.doCall(Rebalance.java:152) ~[cbas-server.jar:7.0.0-4955]
      	at com.couchbase.analytics.control.rebalance.Rebalance.doCall(Rebalance.java:80) ~[cbas-server.jar:7.0.0-4955]
      	at com.couchbase.analytics.runtime.WriteLockCallable.call(WriteLockCallable.java:27) ~[cbas-connector.jar:7.0.0-4955]
      	at java.util.concurrent.FutureTask.run(Unknown Source) [?:?]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
      	at java.lang.Thread.run(Unknown Source) [?:?]
      2021-04-22T05:58:24.589-07:00 INFO CBAS.cbas requesting isBalanced for 60133c5ac910f395614e1c0d59647e48 from driver
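
      For reference, the IllegalStateException above is thrown from Rebalance.ensureNodesClusterActive when the Analytics cluster controller gives up on a bounded wait for the listed node to become active. Below is a minimal, hypothetical sketch of that kind of check (NodeJoinWaiter and ClusterView are illustrative names, not the actual cbas implementation); it only shows the shape of the timeout and of the error message:

      import java.util.HashSet;
      import java.util.Set;
      import java.util.concurrent.TimeUnit;

      // Hypothetical sketch of a "wait until all expected nodes are active" check.
      // Not the real cbas code; the production logic lives in
      // com.couchbase.analytics.control.rebalance.Rebalance#ensureNodesClusterActive.
      public class NodeJoinWaiter {

          /** Hypothetical read-only view of cluster membership. */
          public interface ClusterView {
              Set<String> activeNodeIds();
              String state();
          }

          /**
           * Blocks until every node in expectedNodes is reported active, or the timeout
           * expires. On timeout it throws IllegalStateException naming the nodes that
           * never showed up, which matches the shape of the error in this ticket.
           */
          public static void awaitNodesActive(Set<String> expectedNodes, ClusterView cluster,
                                              long timeout, TimeUnit unit) throws InterruptedException {
              long deadline = System.nanoTime() + unit.toNanos(timeout);
              while (true) {
                  Set<String> missing = new HashSet<>(expectedNodes);
                  missing.removeAll(cluster.activeNodeIds());
                  if (missing.isEmpty()) {
                      return; // all expected nodes joined and the cluster is active
                  }
                  if (System.nanoTime() >= deadline) {
                      throw new IllegalStateException(
                              "timed out waiting for all nodes to join & cluster active (missing nodes: "
                                      + missing + ", state: " + cluster.state() + ")");
                  }
                  TimeUnit.SECONDS.sleep(1); // poll until the deadline
              }
          }
      }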
      

      Cluster config:
      backup : 1 ===== > [172.23.123.28:8091] ###########
      index : 6 ===== > [172.23.104.137:8091 172.23.105.107:8091 172.23.121.117:8091 172.23.96.252:8091 172.23.96.253:8091 172.23.99.11:8091]
      fts : 2 ===== > [172.23.104.155:8091 172.23.96.148:8091] ###########
      cbas : 4 ===== > [172.23.104.157:8091 172.23.104.5:8091 172.23.106.188:8091 172.23.97.242:8091] ###########
      eventing : 4 ===== > [172.23.104.67:8091 172.23.123.27:8091 172.23.97.239:8091 172.23.98.135:8091] ###########
      kv : 11 ===== > [172.23.104.69:8091 172.23.104.70:8091 172.23.106.100:8091 172.23.108.103:8091 172.23.121.3:8091 172.23.97.119:8091 172.23.97.121:8091 172.23.97.122:8091 172.23.99.20:8091 172.23.99.21:8091 172.23.99.25:8091] ###########
      n1ql : 2 ===== > [172.23.120.245:8091 172.23.96.251:8091] ###########

      Logs:
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.104.137.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.104.155.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.104.157.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.104.5.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.104.67.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.104.69.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.104.70.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.105.107.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.105.111.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.106.100.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.106.188.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.108.103.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.120.245.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.121.117.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.121.3.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.123.27.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.123.28.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.96.148.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.96.251.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.96.252.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.96.253.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.97.119.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.97.121.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.97.122.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.97.239.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.97.242.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.98.135.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.99.11.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.99.20.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.99.21.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.99.25.zip


          Activity

            michael.blow Michael Blow added a comment -

            There seems to be some confusion with the topology in this cluster: the node 1151f6cfc86035b5768e4a5ab36bbb8f (172.23.106.188) was rebalanced out as part of rebalance 669ad0d085b724d1f62eceac6b0dc7f1:

            2021-04-21T08:35:21.368-07:00 INFO CBAS.servlet.RebalanceServlet [HttpExecutor(port:9111)-8] +post request: {"topologyChange":{"id":"669ad0d085b724d1f62eceac6b0dc7f1","currentTopologyRev":null,"type":"topology-change-rebalance","keepNodes":[{"nodeInfo":{"nodeId":"47c864f2a9e597168db197820602939e","priority":0,"opaque":{"cbas-version":"7.0.0-4955","cc-http-port":"9111","controller-id":"5","host":"172.23.104.157","num-iodevices":"8","starting-partition-id":"44","svc-http-port":"8095"}},"recoveryType":"recovery-full"},{"nodeInfo":{"nodeId":"8f2b8af91b250f31e02143ecf103617b","priority":0,"opaque":{"cbas-version":"7.0.0-4955","cc-http-port":"9111","controller-id":"8","host":"172.23.104.5","num-iodevices":"8","starting-partition-id":"68","svc-http-port":"8095"}},"recoveryType":"recovery-full"},{"nodeInfo":{"nodeId":"69e7ccf80fd909080b2bd6fd4c0fbd0b","priority":0,"opaque":{"cbas-version":"7.0.0-4955","cc-http-port":"9111","controller-id":"7","host":"172.23.97.242","num-iodevices":"8","starting-partition-id":"60","svc-http-port":"8095"}},"recoveryType":"recovery-full"}],"ejectNodes":[{"nodeId":"1151f6cfc86035b5768e4a5ab36bbb8f","priority":0,"opaque":{"cbas-version":"7.0.0-4955","cc-http-port":"9111","controller-id":"0","host":"172.23.106.188","num-iodevices":"8","starting-partition-id":"0","svc-http-port":"8095"}}]}
            

            which completed at approximately 2021-04-21T08:40:30.351-07:00:

            2021-04-21T08:40:30.351-07:00 WARN CBAS.rebalance.Rebalance [Executor-1836:ClusterController] exit Rebalance 669ad0d085b724d1f62eceac6b0dc7f1
            

            Somehow, on the next rebalance 60133c5ac910f395614e1c0d59647e48, the ejected node is re-presented in keepNodes to .157 some 21+ hours later:

            2021-04-22T05:56:23.952-07:00 WARN CBAS.rebalance.Rebalance [Executor-755:ClusterController] enter Rebalance 60133c5ac910f395614e1c0d59647e48: {"id":"60133c5ac910f395614e1c0d59647e48","currentTopologyRev":null,"type":"topology-change-rebalance","keepNodes":[{"nodeInfo":{"nodeId":"47c864f2a9e597168db197820602939e","priority":0,"opaque":{"cbas-version":"7.0.0-4955","cc-http-port":"9111","controller-id":"5","host":"172.23.104.157","num-iodevices":"8","starting-partition-id":"44","svc-http-port":"8095"}},"recoveryType":"recovery-full"},{"nodeInfo":{"nodeId":"8f2b8af91b250f31e02143ecf103617b","priority":0,"opaque":{"cbas-version":"7.0.0-4955","cc-http-port":"9111","controller-id":"8","host":"172.23.104.5","num-iodevices":"8","starting-partition-id":"68","svc-http-port":"8095"}},"recoveryType":"recovery-full"},{"nodeInfo":{"nodeId":"1151f6cfc86035b5768e4a5ab36bbb8f","priority":0,"opaque":{"cbas-version":"7.0.0-4955","cc-http-port":"9111","controller-id":"9","host":"172.23.106.188","num-iodevices":"8","starting-partition-id":"76","svc-http-port":"8095"}},"recoveryType":"recovery-full"},{"nodeInfo":{"nodeId":"69e7ccf80fd909080b2bd6fd4c0fbd0b","priority":0,"opaque":{"cbas-version":"7.0.0-4955","cc-http-port":"9111","controller-id":"7","host":"172.23.97.242","num-iodevices":"8","starting-partition-id":"60","svc-http-port":"8095"}},"recoveryType":"recovery-full"}],"ejectNodes":[]}
            

            Note that ns_server made no attempt to restart cbas on .188; there is no activity in the analytics logs on .188 after the rebalance out as part of rebalance 669ad0d085b724d1f62eceac6b0dc7f1.

            Reassigning to ns_server for further investigation; from what I can tell, analytics is working as intended.
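
            To make the anomaly concrete: comparing the ejectNodes of the completed rebalance 669ad0d085b724d1f62eceac6b0dc7f1 against the keepNodes of the later rebalance 60133c5ac910f395614e1c0d59647e48 flags exactly the node in question. A small, hypothetical sketch (the node ids are copied from the payloads quoted above; TopologyDiff is an illustrative helper, not part of the product):

            import java.util.HashSet;
            import java.util.Set;

            // Hypothetical helper illustrating the anomaly described above: a node id that an
            // earlier, completed topology change ejected reappears in the keepNodes of a later
            // topology change without the service on that node ever being restarted.
            public class TopologyDiff {

                /** Returns the node ids that were ejected earlier but reappear in keepNodes now. */
                public static Set<String> reappearingNodes(Set<String> previouslyEjected,
                                                           Set<String> currentKeepNodes) {
                    Set<String> reappearing = new HashSet<>(previouslyEjected);
                    reappearing.retainAll(currentKeepNodes);
                    return reappearing;
                }

                public static void main(String[] args) {
                    // Node ids taken from the two topology-change payloads quoted above.
                    Set<String> ejectedBy669ad0 = Set.of("1151f6cfc86035b5768e4a5ab36bbb8f");
                    Set<String> keepNodesOf60133c = Set.of(
                            "47c864f2a9e597168db197820602939e",
                            "8f2b8af91b250f31e02143ecf103617b",
                            "1151f6cfc86035b5768e4a5ab36bbb8f",
                            "69e7ccf80fd909080b2bd6fd4c0fbd0b");
                    // Prints [1151f6cfc86035b5768e4a5ab36bbb8f]: the ejected node is back in keepNodes.
                    System.out.println(reappearingNodes(ejectedBy669ad0, keepNodesOf60133c));
                }
            }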

            timofey.barmin Timofey Barmin added a comment - - edited

            I see that the rebalance that was supposed to eject node 172.23.106.188 failed:

            2500889 [user:info,2021-04-21T04:04:51.677-07:00,ns_1@172.23.108.103:<0.23324.0>:ns_orchestrator:idle:773]Starting rebalance, KeepNodes = ['ns_1@172.23.104.137','ns_1@172.23.104.155',
            2500890                                  'ns_1@172.23.104.157','ns_1@172.23.104.5',
            2500891                                  'ns_1@172.23.104.67','ns_1@172.23.104.69',
            2500892                                  'ns_1@172.23.104.70','ns_1@172.23.105.107',
            2500893                                  'ns_1@172.23.106.100','ns_1@172.23.108.103',
            2500894                                  'ns_1@172.23.120.245','ns_1@172.23.121.117',
            2500895                                  'ns_1@172.23.121.3','ns_1@172.23.123.27',
            2500896                                  'ns_1@172.23.123.28','ns_1@172.23.96.148',
            2500897                                  'ns_1@172.23.96.251','ns_1@172.23.96.252',
            2500898                                  'ns_1@172.23.96.253','ns_1@172.23.97.119',
            2500899                                  'ns_1@172.23.97.121','ns_1@172.23.97.122',
            2500900                                  'ns_1@172.23.97.239','ns_1@172.23.97.242',
            2500901                                  'ns_1@172.23.98.135','ns_1@172.23.99.11',
            2500902                                  'ns_1@172.23.99.20','ns_1@172.23.99.21',
            2500903                                  'ns_1@172.23.99.25'], EjectNodes = ['ns_1@172.23.106.188'],
             
            ....
             
            2861252 [user:error,2021-04-21T08:41:40.426-07:00,ns_1@172.23.108.103:<0.23324.0>:ns_orchestrator:log_rebalance_completion:1404]Rebalance exited with reason {service_rebalance_failed,eventing,
            2861253                               {agent_died,<30247.24974.0>,
            2861254                                {linked_process_died,<30247.30085.1993>,
            2861255                                 {timeout,
            2861256                                  {gen_server,call,
            2861257                                   [<30247.25022.0>,
            2861258                                    {call,"ServiceAPI.PrepareTopologyChange",
            2861259                                     #Fun<json_rpc_connection.0.77329884>},
            2861260                                    60000]}}}}}.
            

            But that rebalance for cbas completed successfully:

            2854650 [rebalance:info,2021-04-21T08:35:12.708-07:00,ns_1@172.23.108.103:service_rebalancer-cbas-worker<0.17039.3492>:service_rebalancer:rebalance_worker:147]Rebalancing service cbas with id <<"669
            2854651 KeepNodes: ['ns_1@172.23.104.157','ns_1@172.23.104.5','ns_1@172.23.97.242']
            2854652 EjectNodes: ['ns_1@172.23.106.188']
            2854653 DeltaNodes: []
            ....
            2860342 [ns_server:debug,2021-04-21T08:40:40.002-07:00,ns_1@172.23.108.103:service_rebalancer-cbas<0.11401.3492>:service_rebalancer:run_rebalance_worker:116]Worker terminated normally
            

            So there are two problems here:
            1) The rebalance at 2021-04-21T04:04:51.677-07:00 fails because of a gen_server:call timeout (at the eventing rebalance step)
            2) The rebalance at 2021-04-21T15:22:42.000-07:00 fails because cbas doesn't expect to see an already-ejected node in the list of nodes

            I also noticed that between those two rebalances the orchestrator moved from 172.23.108.103 to 172.23.104.137, which was caused by a restart of 172.23.108.103:

            14702 [ns_server:info,2021-04-21T14:30:54.357-07:00,babysitter_of_ns_1@cb.local:<0.105.0>:ns_babysitter:stop:162]Received shutdown request. Terminating.
            14703 [ns_server:info,2021-04-21T14:30:58.892-07:00,babysitter_of_ns_1@cb.local:<0.105.0>:ns_babysitter:init_logging:131]Brought up babysitter logging
            


            It probably makes sense to create a separate bug for problem #1.

            I'm still looking into this issue.


            timofey.barmin Timofey Barmin added a comment -

            I see that node .188 was removed from the cbas service_map after the first cbas rebalance:

            [ns_server:debug,2021-04-21T08:40:40.018-07:00,ns_1@172.23.108.103:<0.32754.3402>:ns_cluster_membership:set_service_map:447]Set service map for service cbas to ['ns_1@172.23.104.157',
                                                 'ns_1@172.23.104.5','ns_1@172.23.97.242']
            

            And added back to service_map before the second cbas rebalance:

            [ns_server:debug,2021-04-22T05:56:19.835-07:00,ns_1@172.23.104.137:<0.7339.2075>:ns_cluster_membership:set_service_map:447]Set service map for service cbas to ['ns_1@172.23.104.157',
                                                 'ns_1@172.23.104.5',
                                                 'ns_1@172.23.106.188',
                                                 'ns_1@172.23.97.242']
            

            So it looks like node .188 is being added back to the cluster by that rebalance, which is correct behavior. That is why the node is present in keepNodes.
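
            As a rough illustration of that bookkeeping (hypothetical; ns_server is implemented in Erlang, this Java sketch only models the set arithmetic, and the ServiceMap class is invented for this example):

            import java.util.LinkedHashSet;
            import java.util.Set;

            // Hypothetical model of the service_map bookkeeping described above: a successful
            // service rebalance-out drops the node from the map, while a later rebalance that
            // keeps the node adds it back, so the service sees the same node id in keepNodes again.
            public class ServiceMap {
                private final Set<String> nodes = new LinkedHashSet<>();

                public void applyRebalance(Set<String> keepNodes, Set<String> ejectNodes) {
                    nodes.addAll(keepNodes);     // nodes kept or being rebalanced in
                    nodes.removeAll(ejectNodes); // nodes successfully rebalanced out
                }

                public Set<String> current() {
                    return Set.copyOf(nodes);
                }

                public static void main(String[] args) {
                    ServiceMap cbas = new ServiceMap();
                    // After the first cbas rebalance: .188 ejected successfully.
                    cbas.applyRebalance(
                            Set.of("172.23.104.157", "172.23.104.5", "172.23.97.242"),
                            Set.of("172.23.106.188"));
                    System.out.println(cbas.current()); // three nodes, .188 gone
                    // The next rebalance adds .188 back, so it reappears in the map
                    // (and therefore in the keepNodes handed to the service).
                    cbas.applyRebalance(
                            Set.of("172.23.104.157", "172.23.104.5",
                                   "172.23.106.188", "172.23.97.242"),
                            Set.of());
                    System.out.println(cbas.current()); // four nodes, .188 back
                }
            }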

            Michael Blow, could you please clarify what the expected behavior of ns_server is when a node is being added back?


            michael.blow Michael Blow added a comment -

            If the node is rebalanced out and added back, it should not have the same uuid (1151f6cfc86035b5768e4a5ab36bbb8f), right? I would expect it not to get the same uuid.


            timofey.barmin Timofey Barmin added a comment -

            Michael Blow
            Since that node has not fully left the cluster, it keeps its uuid.


            michael.blow Michael Blow added a comment -

            On rebalance out, 'cbas' has always gone into a halted state, waiting for ns_server to kill us. It sounds like you're saying that is now (or always has been) wrong, and that we should instead be prepared to be resurrected by some future rebalance; is that right, Timofey Barmin?

            Moving back to analytics based on this information and will investigate further.


            timofey.barmin Timofey Barmin added a comment -

            Since the rebalance-out for that node didn't finish, the node didn't get wiped. I think that's the reason why you see that cbas is not killed. I think it's always been like this; we've just never tested it (that's my guess, actually).

            michael.blow Michael Blow added a comment -

            I haven't tried to reproduce this on 6.6, but it seems odd that I've gone from never seeing this to seeing it three times in a week. I wonder if previously the babysitter restarted services that had successfully completed the rebalance out, on an ultimately failed rebalance.

            Either way, I have updated our service to accommodate the [new] behavior.
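
            Editorial note: a rough, hypothetical sketch of what accommodating this behavior can look like on the service side; NodeLifecycle is an invented name and this is not the actual change from commit 144562f:

            import java.util.Set;

            // Hypothetical sketch of "support rebalance-in on a rebalanced-out node": instead of
            // treating rebalance-out as terminal and waiting to be killed, the node clears its
            // halted state when a later topology change lists it in keepNodes again.
            // Not the actual change from commit 144562f.
            public class NodeLifecycle {

                enum State { ACTIVE, REBALANCED_OUT }

                private final String localNodeId;
                private State state = State.ACTIVE;

                public NodeLifecycle(String localNodeId) {
                    this.localNodeId = localNodeId;
                }

                public void onRebalancedOut() {
                    // Previously this was effectively terminal: wait for ns_server to stop the service.
                    state = State.REBALANCED_OUT;
                }

                public void onTopologyChange(Set<String> keepNodes) {
                    if (state == State.REBALANCED_OUT && keepNodes.contains(localNodeId)) {
                        // The accommodated behavior: a rebalanced-out (but never wiped) node can be
                        // rebalanced back in, so it rejoins rather than staying halted.
                        state = State.ACTIVE;
                    }
                }

                public State state() {
                    return state;
                }
            }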


            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-5064 contains cbas commit 144562f with commit message:
            MB-45869: support rebalance-in on rebalanced-out node


            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-5170 contains cbas commit 1c9bd24 with commit message:
            MB-46285: prevent unupgraded nodes from experiencing MB-45869


            till Till Westmann added a comment -

            Bulk labeling closed issues as triaged.


            People

              Assignee: Arunkumar Senthilnathan
              Reporter: Arunkumar Senthilnathan