Couchbase Server / MB-45869

[System Test][Analytics] Rebalance failed with error - java.lang.IllegalStateException: timed out waiting for all nodes to join & cluster active (missing nodes: [xxxx], state: ACTIVE)

Details

    • Untriaged
    • 1
    • Unknown
    • CX Sprint 246

    Description

      Build: 7.0.0-4955

      Test:
      -test tests/integration/cheshirecat/test_cheshirecat_kv_gsi_coll_xdcr_backup_sgw_fts_itemct_txns_eventing_cbas_scale3.yml -scope tests/integration/cheshirecat/scope_cheshirecat_with_backup.yml
      Scale 3
      Iteration 3

      .108.103 diag.log:

      2021-04-21T15:23:09.425-07:00, ns_vbucket_mover:0:info:message(ns_1@172.23.104.137) - Bucket "default" rebalance appears to be swap rebalance
      2021-04-21T15:23:11.139-07:00, ns_node_disco:4:info:node up(ns_1@172.23.104.67) - Node 'ns_1@172.23.104.67' saw that node 'ns_1@172.23.105.111' came up. Tags: [] (repeated 1 times, last seen 61.342812 secs ago)
      2021-04-21T15:23:17.661-07:00, ns_node_disco:4:info:node up(ns_1@172.23.105.107) - Node 'ns_1@172.23.105.107' saw that node 'ns_1@172.23.105.111' came up. Tags: [] (repeated 1 times, last seen 68.368038 secs ago)
      2021-04-21T15:23:18.407-07:00, ns_node_disco:4:info:node up(ns_1@172.23.104.157) - Node 'ns_1@172.23.104.157' saw that node 'ns_1@172.23.105.111' came up. Tags: [] (repeated 1 times, last seen 69.015704 secs ago)
      2021-04-22T05:58:24.082-07:00, analytics:0:warning:message(ns_1@172.23.104.157) - Analytics Service unable to successfully rebalance 60133c5ac910f395614e1c0d59647e48 due to 'java.lang.IllegalStateException: timed out waiting for all nodes to join & cluster active (missing nodes: [1151f6cfc86035b5768e4a5ab36bbb8f], state: ACTIVE)'; see analytics_info.log for details
      2021-04-22T05:58:24.588-07:00, ns_orchestrator:0:critical:message(ns_1@172.23.104.137) - Rebalance exited with reason {service_rebalance_failed,cbas,
                                    {worker_died,
                                     {'EXIT',<0.28306.2352>,
                                      {rebalance_failed,
                                       {service_error,
                                        <<"Rebalance 60133c5ac910f395614e1c0d59647e48 failed: timed out waiting for all nodes to join & cluster active (missing nodes: [1151f6cfc86035b5768e4a5ab36bbb8f], state: ACTIVE)">>}}}}}.
      Rebalance Operation Id = 759b7fb5e9eb86629ddf589999b711a4
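
When triaging many runs, it can help to pull the rebalance ID and the missing node IDs out of the failure message automatically. The following is a minimal sketch, not part of any Couchbase tooling: the regex and the function name `parse_rebalance_failure` are assumptions based only on the message format shown in the diag.log excerpt above.

```python
import re

# Message text copied from the diag.log excerpt above. The pattern and the
# function name are illustrative assumptions, not a Couchbase API.
MESSAGE = (
    "Analytics Service unable to successfully rebalance "
    "60133c5ac910f395614e1c0d59647e48 due to "
    "'java.lang.IllegalStateException: timed out waiting for all nodes to "
    "join & cluster active (missing nodes: "
    "[1151f6cfc86035b5768e4a5ab36bbb8f], state: ACTIVE)'"
)

PATTERN = re.compile(
    r"rebalance (?P<rebalance_id>[0-9a-f]{32})\b.*?"
    r"missing nodes: \[(?P<nodes>[^\]]*)\], state: (?P<state>\w+)",
    re.IGNORECASE,
)


def parse_rebalance_failure(message):
    """Return (rebalance_id, missing_node_ids, cluster_state), or None if no match."""
    match = PATTERN.search(message)
    if match is None:
        return None
    # Node IDs are assumed to be comma-separated inside the brackets.
    nodes = [n.strip() for n in match.group("nodes").split(",") if n.strip()]
    return match.group("rebalance_id"), nodes, match.group("state")


if __name__ == "__main__":
    print(parse_rebalance_failure(MESSAGE))
```

The same pattern also matches the `Rebalance <id> failed: ...` form that ns_orchestrator reports, since the match is case-insensitive.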
      

      Console:

      [2021-04-21T15:21:49-07:00, sequoiatools/couchbase-cli:7.0:1d0175] server-add -c 172.23.108.103:8091 --server-add https://172.23.105.111 -u Administrator -p password --server-add-username Administrator --server-add-password password --services index
      [2021-04-21T15:22:23-07:00, sequoiatools/couchbase-cli:7.0:8b292a] rebalance -c 172.23.108.103:8091 -u Administrator -p password
       
      Error occurred on container - sequoiatools/couchbase-cli:7.0:[rebalance -c 172.23.108.103:8091 -u Administrator -p password]
       
      docker logs 8b292a
      docker start 8b292a
       
      *Unable to display progress bar on this os
      ERROR: Rebalance failed. See logs for detailed reason. You can try again.
      [2021-04-22T05:58:31-07:00, sequoiatools/cmd:a0f7ff] 60
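
The console timestamps show how long the rebalance ran before it was reported as failed. A small sketch for computing that interval is below; the constant names and the `elapsed` helper are hypothetical, and the two timestamps are copied from the console output above.

```python
from datetime import datetime

# Timestamps copied from the console output above.
REBALANCE_STARTED = "2021-04-21T15:22:23-07:00"  # rebalance command issued
NEXT_STEP_LOGGED = "2021-04-22T05:58:31-07:00"   # first console entry after the failure


def elapsed(start_iso, end_iso):
    """Return the timedelta between two ISO 8601 timestamps that carry UTC offsets."""
    return datetime.fromisoformat(end_iso) - datetime.fromisoformat(start_iso)


if __name__ == "__main__":
    delta = elapsed(REBALANCE_STARTED, NEXT_STEP_LOGGED)
    hours, remainder = divmod(int(delta.total_seconds()), 3600)
    minutes, seconds = divmod(remainder, 60)
    print(f"{hours}h {minutes}m {seconds}s")
```

For this run the interval is a little over 14.5 hours, which is consistent with the diag.log entries: the swap rebalance begins on 2021-04-21 and the Analytics failure is logged on 2021-04-22.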
      

      .104.157 analytics_info.log:

      2021-04-22T05:58:22.850-07:00 INFO CBAS.server.QueryServiceServlet [HttpExecutor(port:8095)-10] handleRequest: <ud>{"host":"172.23.104.157:8095","path":"/query/service","statement":"select avg(price) as AvgPrice, min(price) as MinPrice, max(price) as MaxPrice from dv_8.ds_29 where free_breakfast=True and free_parking=True and price is not null and array_count(public_likes)>5 and `type`='Hotel' group by country limit 100;","pretty":true,"mode":"immediate","clientContextID":"query_thread_129617","format":"CLEAN_JSON","timeout":3600000,"maxResultReads":1,"planFormat":"JSON","expressionTree":false,"rewrittenExpressionTree":false,"logicalPlan":false,"optimizedLogicalPlan":false,"job":false,"profile":"counts","signature":true,"multiStatement":true,"parseOnly":false,"readOnly":false,"maxWarnings":0,"scanConsistency":null,"scanWait":null}</ud>
      2021-04-22T05:58:22.851-07:00 INFO CBAS.messaging.CCMessageBroker [Executor-758:ClusterController] Received message: ExecuteStatementRequestMessage(id=38079, from=47c864f2a9e597168db197820602939e): <ud>select avg(price) as AvgPrice, min(price) as MinPrice, max(price) as MaxPrice from dv_8.ds_29 where free_breakfast=True and free_parking=True and price is not null and array_count(public_likes)>5 and `type`='Hotel' group by country limit 100;</ud>
      2021-04-22T05:58:22.908-07:00 INFO CBAS.work.JobCleanupWork [Worker:ClusterController] Cleanup for job: JID:5.80502
      2021-04-22T05:58:22.910-07:00 INFO CBAS.messaging.NCMessageBroker [Worker:47c864f2a9e597168db197820602939e] Received message: ExecuteStatementResponseMessage(id=38079): 71 characters
      2021-04-22T05:58:24.085-07:00 ERRO CBAS.rebalance.Rebalance [Executor-755:ClusterController] Rebalance 60133c5ac910f395614e1c0d59647e48 failed
      java.lang.IllegalStateException: timed out waiting for all nodes to join & cluster active (missing nodes: [1151f6cfc86035b5768e4a5ab36bbb8f], state: ACTIVE)
      	at com.couchbase.analytics.control.rebalance.Rebalance.ensureNodesClusterActive(Rebalance.java:484) ~[cbas-server.jar:7.0.0-4955]
      	at com.couchbase.analytics.control.rebalance.Rebalance.adjustClusterBeforeRebalance(Rebalance.java:613) ~[cbas-server.jar:7.0.0-4955]
      	at com.couchbase.analytics.control.rebalance.Rebalance.doRebalance(Rebalance.java:190) ~[cbas-server.jar:7.0.0-4955]
      	at com.couchbase.analytics.control.rebalance.Rebalance.doCall(Rebalance.java:152) [cbas-server.jar:7.0.0-4955]
      	at com.couchbase.analytics.control.rebalance.Rebalance.doCall(Rebalance.java:80) [cbas-server.jar:7.0.0-4955]
      	at com.couchbase.analytics.runtime.WriteLockCallable.call(WriteLockCallable.java:27) [cbas-connector.jar:7.0.0-4955]
      	at java.util.concurrent.FutureTask.run(Unknown Source) [?:?]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
      	at java.lang.Thread.run(Unknown Source) [?:?]
      2021-04-22T05:58:24.086-07:00 WARN CBAS.rebalance.Rebalance [Executor-755:ClusterController] exit Rebalance 60133c5ac910f395614e1c0d59647e48
      2021-04-22T05:58:24.086-07:00 INFO CBAS.rebalance.RebalanceProgress [Executor-754:ClusterController] dataset size fetcher interrupted
      2021-04-22T05:58:24.576-07:00 ERRO CBAS.servlet.RebalanceServlet [HttpExecutor(port:9111)-10] Rebalance 60133c5ac910f395614e1c0d59647e48 failed
      java.lang.IllegalStateException: timed out waiting for all nodes to join & cluster active (missing nodes: [1151f6cfc86035b5768e4a5ab36bbb8f], state: ACTIVE)
      	at com.couchbase.analytics.control.rebalance.Rebalance.ensureNodesClusterActive(Rebalance.java:484) ~[cbas-server.jar:7.0.0-4955]
      	at com.couchbase.analytics.control.rebalance.Rebalance.adjustClusterBeforeRebalance(Rebalance.java:613) ~[cbas-server.jar:7.0.0-4955]
      	at com.couchbase.analytics.control.rebalance.Rebalance.doRebalance(Rebalance.java:190) ~[cbas-server.jar:7.0.0-4955]
      	at com.couchbase.analytics.control.rebalance.Rebalance.doCall(Rebalance.java:152) ~[cbas-server.jar:7.0.0-4955]
      	at com.couchbase.analytics.control.rebalance.Rebalance.doCall(Rebalance.java:80) ~[cbas-server.jar:7.0.0-4955]
      	at com.couchbase.analytics.runtime.WriteLockCallable.call(WriteLockCallable.java:27) ~[cbas-connector.jar:7.0.0-4955]
      	at java.util.concurrent.FutureTask.run(Unknown Source) [?:?]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
      	at java.lang.Thread.run(Unknown Source) [?:?]
      2021-04-22T05:58:24.589-07:00 INFO CBAS.cbas requesting isBalanced for 60133c5ac910f395614e1c0d59647e48 from driver
      

      Cluster config:
      backup : 1 ===== > [172.23.123.28:8091] ###########
      index : 6 ===== > [172.23.104.137:8091 172.23.105.107:8091 172.23.121.117:8091 172.23.96.252:8091 172.23.96.253:8091 172.23.99.11:8091]
      fts : 2 ===== > [172.23.104.155:8091 172.23.96.148:8091] ###########
      cbas : 4 ===== > [172.23.104.157:8091 172.23.104.5:8091 172.23.106.188:8091 172.23.97.242:8091] ###########
      eventing : 4 ===== > [172.23.104.67:8091 172.23.123.27:8091 172.23.97.239:8091 172.23.98.135:8091] ###########
      kv : 11 ===== > [172.23.104.69:8091 172.23.104.70:8091 172.23.106.100:8091 172.23.108.103:8091 172.23.121.3:8091 172.23.97.119:8091 172.23.97.121:8091 172.23.97.122:8091 172.23.99.20:8091 172.23.99.21:8091 172.23.99.25:8091] ###########
      n1ql : 2 ===== > [172.23.120.245:8091 172.23.96.251:8091] ###########
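
For scripted analysis, the cluster config listing above can be turned into a service-to-nodes mapping. This is a sketch under the assumption that every line follows the `service : count ===== > [nodes]` layout shown; `parse_cluster_config` is a hypothetical helper, and the sample holds two lines copied from the listing.

```python
import re

# Two lines copied from the "Cluster config" listing above.
SAMPLE = """\
fts : 2 ===== > [172.23.104.155:8091 172.23.96.148:8091] ###########
cbas : 4 ===== > [172.23.104.157:8091 172.23.104.5:8091 172.23.106.188:8091 172.23.97.242:8091] ###########
"""

# Assumed layout: "<service> : <count> ===== > [<space-separated nodes>]"
LINE = re.compile(
    r"^\s*(?P<service>\w+)\s*:\s*(?P<count>\d+)\s*=+\s*>\s*\[(?P<nodes>[^\]]*)\]"
)


def parse_cluster_config(text):
    """Map each service name to its node list, checking the declared count."""
    services = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match is None:
            continue  # skip blank or unrelated lines
        nodes = match.group("nodes").split()
        declared = int(match.group("count"))
        if declared != len(nodes):
            raise ValueError(
                f"{match.group('service')}: declared {declared}, listed {len(nodes)}"
            )
        services[match.group("service")] = nodes
    return services


if __name__ == "__main__":
    for service, nodes in parse_cluster_config(SAMPLE).items():
        print(service, len(nodes), nodes)
```

Checking the declared count against the listed nodes is a cheap way to catch a truncated paste before comparing the config with the node IDs in the logs.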

      Logs:
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.104.137.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.104.155.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.104.157.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.104.5.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.104.67.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.104.69.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.104.70.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.105.107.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.105.111.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.106.100.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.106.188.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.108.103.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.120.245.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.121.117.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.121.3.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.123.27.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.123.28.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.96.148.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.96.251.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.96.252.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.96.253.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.97.119.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.97.121.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.97.122.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.97.239.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.97.242.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.98.135.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.99.11.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.99.20.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.99.21.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.99.25.zip
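
To map each archive back to a node, the address can be read from the URL-encoded file name. The sketch below assumes the `collectinfo-<timestamp>-ns_1%40<address>.zip` naming seen in the list above; `node_from_collectinfo_url` is a hypothetical helper, and the two sample URLs are copied from that list.

```python
import posixpath
from urllib.parse import unquote, urlparse

# Two URLs copied from the log list above.
URLS = [
    "https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.104.157.zip",
    "https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1619103127/collectinfo-2021-04-22T145210-ns_1%40172.23.105.111.zip",
]


def node_from_collectinfo_url(url):
    """Return the node address embedded in a collectinfo archive URL."""
    # Decode %40 to "@" and keep only the file name.
    filename = posixpath.basename(unquote(urlparse(url).path))
    stem = filename[: -len(".zip")] if filename.endswith(".zip") else filename
    _, separator, address = stem.partition("ns_1@")
    if not separator:
        raise ValueError(f"no node name found in {url!r}")
    return address


if __name__ == "__main__":
    for url in URLS:
        print(node_from_collectinfo_url(url))
```

Comparing these addresses with the cluster config is one way to spot nodes that have logs but no listed service: 172.23.105.111, the node passed to server-add, appears in the log list but not in the config listing.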

            People

              arunkumar Arunkumar Senthilnathan (Inactive)
              arunkumar Arunkumar Senthilnathan (Inactive)