  1. Couchbase Server
  2. MB-46061

Rebalance done for cluster creation fails with prepare_rebalance_failed


Details

    • Untriaged
    • Centos 64-bit
    • 1
    • Yes

    Description

      Script to Repro

      ./testrunner -i /tmp/testexec.16334.ini -p verify_unacked_bytes=True,get-cbcollect-info=False,get-cbcollect-info=True -t failover.failovertests.FailoverTests.test_failover_normal,replicas=1,graceful=False,num_failed_nodes=1,load_ratio=10,GROUP=P1,verify_unacked_bytes=True,get-cbcollect-info=True
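The `-p` list in the command above sets `get-cbcollect-info` twice, with conflicting values. A minimal sketch for spotting such repeats in a comma-separated parameter string follows. The helper name `find_duplicate_params` is hypothetical and is not part of testrunner, and the sketch makes no claim about which of the two values testrunner actually uses.

```python
from collections import defaultdict


def find_duplicate_params(param_string):
    """Return {key: [values, ...]} for every key set more than once.

    Hypothetical helper: it only inspects the string, it does not
    reproduce how testrunner itself resolves repeated keys.
    """
    seen = defaultdict(list)
    for item in param_string.split(","):
        key, sep, value = item.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            seen[key].append(value)
    return {key: values for key, values in seen.items() if len(values) > 1}


# The -p argument exactly as given in the repro command.
params = "verify_unacked_bytes=True,get-cbcollect-info=False,get-cbcollect-info=True"
print(find_duplicate_params(params))
```

Running this on the `-p` value from the report lists `get-cbcollect-info` with the values `False` and `True`, in that order.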
      

      Steps to Repro
      1. Add 6 nodes to an already configured node (172.23.123.132:8091 - kv,index,n1ql) and rebalance.

      2021-05-03 00:28:43 | INFO | MainProcess | test_thread | [basetestcase.add_built_in_server_user] **** add 'admin' role to 'cbadminbucket' user ****
      2021-05-03 00:28:43 | INFO | MainProcess | test_thread | [basetestcase.setUp] done initializing cluster
      2021-05-03 00:28:43 | INFO | MainProcess | Cluster_Thread | [task.add_nodes] adding node 172.23.123.128:8091 to cluster
      2021-05-03 00:28:43 | INFO | MainProcess | Cluster_Thread | [rest_client.add_node] adding remote node @172.23.123.128:8091 to this cluster @172.23.123.132:8091
      2021-05-03 00:28:53 | INFO | MainProcess | Cluster_Thread | [rest_client.monitorRebalance] rebalance progress took 10.01 seconds 
      2021-05-03 00:28:53 | INFO | MainProcess | Cluster_Thread | [rest_client.monitorRebalance] sleep for 10 seconds after rebalance...
      2021-05-03 00:29:07 | INFO | MainProcess | Cluster_Thread | [task.add_nodes] adding node 172.23.123.134:8091 to cluster
      2021-05-03 00:29:07 | INFO | MainProcess | Cluster_Thread | [rest_client.add_node] adding remote node @172.23.123.134:8091 to this cluster @172.23.123.132:8091
      2021-05-03 00:29:17 | INFO | MainProcess | Cluster_Thread | [rest_client.monitorRebalance] rebalance progress took 10.05 seconds 
      2021-05-03 00:29:17 | INFO | MainProcess | Cluster_Thread | [rest_client.monitorRebalance] sleep for 10 seconds after rebalance...
      2021-05-03 00:29:35 | INFO | MainProcess | Cluster_Thread | [task.add_nodes] adding node 172.23.105.50:8091 to cluster
      2021-05-03 00:29:35 | INFO | MainProcess | Cluster_Thread | [rest_client.add_node] adding remote node @172.23.105.50:8091 to this cluster @172.23.123.132:8091
      2021-05-03 00:29:45 | INFO | MainProcess | Cluster_Thread | [rest_client.monitorRebalance] rebalance progress took 10.02 seconds 
      2021-05-03 00:29:45 | INFO | MainProcess | Cluster_Thread | [rest_client.monitorRebalance] sleep for 10 seconds after rebalance...
      2021-05-03 00:29:58 | INFO | MainProcess | Cluster_Thread | [task.add_nodes] adding node 172.23.105.53:8091 to cluster
      2021-05-03 00:29:58 | INFO | MainProcess | Cluster_Thread | [rest_client.add_node] adding remote node @172.23.105.53:8091 to this cluster @172.23.123.132:8091
      2021-05-03 00:30:08 | INFO | MainProcess | Cluster_Thread | [rest_client.monitorRebalance] rebalance progress took 10.02 seconds 
      2021-05-03 00:30:08 | INFO | MainProcess | Cluster_Thread | [rest_client.monitorRebalance] sleep for 10 seconds after rebalance...
      2021-05-03 00:30:21 | INFO | MainProcess | Cluster_Thread | [task.add_nodes] adding node 172.23.105.32:8091 to cluster
      2021-05-03 00:30:21 | INFO | MainProcess | Cluster_Thread | [rest_client.add_node] adding remote node @172.23.105.32:8091 to this cluster @172.23.123.132:8091
      2021-05-03 00:30:31 | INFO | MainProcess | Cluster_Thread | [rest_client.monitorRebalance] rebalance progress took 10.05 seconds 
      2021-05-03 00:30:31 | INFO | MainProcess | Cluster_Thread | [rest_client.monitorRebalance] sleep for 10 seconds after rebalance...
      2021-05-03 00:30:49 | INFO | MainProcess | Cluster_Thread | [task.add_nodes] adding node 172.23.105.59:8091 to cluster
      2021-05-03 00:30:49 | INFO | MainProcess | Cluster_Thread | [rest_client.add_node] adding remote node @172.23.105.59:8091 to this cluster @172.23.123.132:8091
      2021-05-03 00:30:59 | INFO | MainProcess | Cluster_Thread | [rest_client.monitorRebalance] rebalance progress took 10.03 seconds 
      2021-05-03 00:30:59 | INFO | MainProcess | Cluster_Thread | [rest_client.monitorRebalance] sleep for 10 seconds after rebalance...
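Once the six nodes are added, the rebalance request names every cluster member in a `knownNodes` parameter, as logged by `rest_client.rebalance` in this report. The sketch below builds that parameter from the addresses in the log. It assumes the `ns_1@<host>` naming seen in this report; the function `build_rebalance_params` is illustrative and is not the testrunner implementation, and the sorting only mirrors the order shown in the log.

```python
def build_rebalance_params(master, added, ejected=()):
    """Build the knownNodes/ejectedNodes parameters for a rebalance request.

    Assumption: each node is identified as 'ns_1@<host>', as in this report.
    """
    def otp_name(address):
        host = address.split(":")[0]  # drop the ':8091' port suffix
        return "ns_1@" + host

    known = sorted(otp_name(address) for address in [master, *added])
    return {
        "knownNodes": ",".join(known),
        "ejectedNodes": ",".join(otp_name(address) for address in ejected),
    }


# Addresses taken from the add_node log lines above.
master = "172.23.123.132:8091"
added = [
    "172.23.123.128:8091",
    "172.23.123.134:8091",
    "172.23.105.50:8091",
    "172.23.105.53:8091",
    "172.23.105.32:8091",
    "172.23.105.59:8091",
]
rebalance_params = build_rebalance_params(master, added)
print(rebalance_params["knownNodes"])
```

With these seven addresses the result lists all seven members and leaves `ejectedNodes` empty, which is what a rebalance-in of new nodes is expected to send.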
      

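      The rebalance across all 7 nodes then fails with prepare_rebalance_failed; node ns_1@172.23.105.32 reports {error,timeout}, as shown below.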

      2021-05-03 00:31:16 | INFO | MainProcess | Cluster_Thread | [rest_client.rebalance] rebalance params : {'knownNodes': 'ns_1@172.23.105.32,ns_1@172.23.105.50,ns_1@172.23.105.53,ns_1@172.23.105.59,ns_1@172.23.123.128,ns_1@172.23.123.132,ns_1@172.23.123.134', 'ejectedNodes': '', 'user': 'Administrator', 'password': 'password'}
      2021-05-03 00:31:16 | INFO | MainProcess | Cluster_Thread | [rest_client.rebalance] rebalance operation started
      2021-05-03 00:31:27 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 0.00 %
      2021-05-03 00:31:27 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 0.00%
      2021-05-03 00:31:47 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 0.00 %
      2021-05-03 00:31:47 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 0.00%
      2021-05-03 00:32:07 | ERROR | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] {'status': 'none', 'errorMessage': 'Rebalance failed. See logs for detailed reason. You can try again.'} - rebalance failed
      2021-05-03 00:32:07 | INFO | MainProcess | Cluster_Thread | [rest_client.print_UI_logs] Latest logs from UI on 172.23.123.132:
      2021-05-03 00:32:07 | ERROR | MainProcess | Cluster_Thread | [rest_client.print_UI_logs] {'node': 'ns_1@172.23.123.132', 'type': 'critical', 'code': 0, 'module': 'ns_orchestrator', 'tstamp': 1620027106956, 'shortText': 'message', 'text': "Rebalance exited with reason {prepare_rebalance_failed,\n                              {error,\n                               {failed_nodes,\n                                [{'ns_1@172.23.105.32',{error,timeout}}]}}}.\nRebalance Operation Id = da2bb8cf960323fa9e2131c14400fc26", 'serverTime': '2021-05-03T00:31:46.956Z'}
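For triage, the failing node and its error can be pulled out of the `text` field of the UI log entry above. This is a minimal sketch that assumes the failure text keeps the `{'node',{error,reason}}` shape seen in this report; `parse_prepare_rebalance_failure` is a hypothetical helper, not an existing testrunner or Couchbase function.

```python
import re

# Matches one entry of the failed_nodes list, e.g. {'ns_1@host',{error,timeout}}
FAILED_NODE_RE = re.compile(r"\{'([^']+)',\s*\{error,\s*(\w+)\}\}")


def parse_prepare_rebalance_failure(text):
    """Return [(node, reason), ...] from a prepare_rebalance_failed message."""
    if "prepare_rebalance_failed" not in text:
        return []
    return FAILED_NODE_RE.findall(text)


# The 'text' value from the UI log entry in this report.
ui_log_text = (
    "Rebalance exited with reason {prepare_rebalance_failed,\n"
    "                              {error,\n"
    "                               {failed_nodes,\n"
    "                                [{'ns_1@172.23.105.32',{error,timeout}}]}}}.\n"
    "Rebalance Operation Id = da2bb8cf960323fa9e2131c14400fc26"
)
print(parse_prepare_rebalance_failure(ui_log_text))
```

On the message from this report the helper returns a single pair naming `ns_1@172.23.105.32` with reason `timeout`. Messages without `prepare_rebalance_failed` yield an empty list.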
      

      cbcollect_info attached. This test last passed on 7.0.0-5017.


          People

            Balakumaran.Gopal Balakumaran Gopal
            Balakumaran.Gopal Balakumaran Gopal