Couchbase Server / MB-51535

Rebalance out of GSI node failed for retry of failed rebalance

Details

    • Untriaged
    • CentOS 64-bit
    • 1
    • Yes

    Description

      Retrying the rebalance-out of a GSI node after a deliberately failed rebalance-out fails with the error below. The test makes the first rebalance-out of the index node fail by shutting the node down mid-rebalance and then restarting it. After the restart, the test retries the rebalance-out of the same index node. The retry is expected to succeed, but it fails with the error below. This test passed in build 7.1.0-2475; the log below is from 7.1.0-2506.
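The indexer error in this report ends with "please retry the request later", so one way a test harness could tell a transient cleanup delay from a persistent failure is to retry a bounded number of times. The sketch below is only an illustration under assumptions: `start_rebalance` is a hypothetical callable (not part of the existing test suite) that returns `None` on success or the error text on failure, and the attempt count and delay are arbitrary.

```python
import time


def retry_rebalance(start_rebalance, max_attempts=5, delay_s=10.0, sleep=time.sleep):
    """Call start_rebalance() until it succeeds or the attempts run out.

    start_rebalance is a hypothetical callable that returns None on success
    or the error text on failure. Only the 'cleanup pending' error is retried.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        last_error = start_rebalance()
        if last_error is None:
            return attempt  # number of attempts it took to succeed
        if "cleanup pending" not in last_error:
            break  # unrelated failure, so retrying is not expected to help
        if attempt < max_attempts:
            sleep(delay_s)  # give the indexer time to finish its cleanup
    raise RuntimeError(f"rebalance failed: {last_error}")
```

If the retry still fails after the last attempt, the cleanup is not completing on its own, which is the behavior this issue reports.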

      Steps to reproduce:

      1. Create a cluster of 3 nodes with the following services: kv:n1ql:index-index-index
      2. Load some data and create a few indexes, each with one replica.

        [2022-03-21 06:31:24,236] - [tuq_helper:320] INFO - RUN QUERY CREATE INDEX `idx_0` ON default:test_bucket.test_scope_1.test_collection_1(age) USING GSI  WITH {'defer_build': False, 'num_replica': 1}
        [2022-03-21 06:31:24,237] - [tuq_helper:320] INFO - RUN QUERY CREATE INDEX `idx_5` ON default:test_bucket.test_scope_1.test_collection_1(lastName) USING GSI  WITH {'defer_build': False, 'num_replica': 1}
        [2022-03-21 06:31:24,238] - [tuq_helper:320] INFO - RUN QUERY CREATE INDEX `idx_1` ON default:test_bucket.test_scope_1.test_collection_1(city) USING GSI  WITH {'defer_build': False, 'num_replica': 1}
        [2022-03-21 06:31:24,240] - [tuq_helper:320] INFO - RUN QUERY CREATE INDEX `idx_3` ON default:test_bucket.test_scope_1.test_collection_1(title) USING GSI  WITH {'defer_build': False, 'num_replica': 1}
        [2022-03-21 06:31:24,241] - [tuq_helper:320] INFO - RUN QUERY CREATE INDEX `idx_6` ON default:test_bucket.test_scope_1.test_collection_1(streetAddress) USING GSI  WITH {'defer_build': False, 'num_replica': 1}
        [2022-03-21 06:31:24,242] - [tuq_helper:320] INFO - RUN QUERY CREATE INDEX `idx_8` ON default:test_bucket.test_scope_1.test_collection_1(filler1) USING GSI  WITH {'defer_build': False, 'num_replica': 1}
        [2022-03-21 06:31:24,243] - [tuq_helper:320] INFO - RUN QUERY CREATE INDEX `idx_2` ON default:test_bucket.test_scope_1.test_collection_1(country) USING GSI  WITH {'defer_build': False, 'num_replica': 1} 
        [2022-03-21 06:31:24,244] - [tuq_helper:320] INFO - RUN QUERY CREATE INDEX `idx_7` ON default:test_bucket.test_scope_1.test_collection_1(suffix) USING GSI  WITH {'defer_build': False, 'num_replica': 1}
        [2022-03-21 06:31:24,245] - [tuq_helper:320] INFO - RUN QUERY CREATE INDEX `idx_4` ON default:test_bucket.test_scope_1.test_collection_1(firstName) USING GSI  WITH {'defer_build': False, 'num_replica': 1}
        [2022-03-21 06:31:24,246] - [rest_client:4156] INFO - query params : 
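The nine statements in the log above follow one pattern, so they can be regenerated when reproducing the setup by hand. This is a minimal sketch, not the test suite's own helper: the function name is hypothetical, the keyspace and field names are copied from the log, and the WITH clause is emitted as JSON rather than the Python dict repr shown in the log.

```python
import json

# Field names taken from the CREATE INDEX statements in the log above,
# ordered so that position i corresponds to index idx_i.
FIELDS = ["age", "city", "country", "title", "firstName",
          "lastName", "streetAddress", "suffix", "filler1"]

KEYSPACE = "default:test_bucket.test_scope_1.test_collection_1"


def create_index_statement(name, field, num_replica=1, defer_build=False):
    """Build one CREATE INDEX statement with a JSON-encoded WITH clause."""
    with_clause = json.dumps({"defer_build": defer_build, "num_replica": num_replica})
    return f"CREATE INDEX `{name}` ON {KEYSPACE}({field}) USING GSI WITH {with_clause}"


statements = [create_index_statement(f"idx_{i}", field)
              for i, field in enumerate(FIELDS)]

for statement in statements:
    print(statement)
```

Each statement requests one replica, so with two index nodes remaining after the rebalance-out there is still room for both copies of every index.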

      3. Start a rebalance-out of one of the index-only nodes (172.23.107.235) and, before the rebalance finishes, stop the server on that node to make the rebalance fail.
      4. After the rebalance fails (this may take about 5 seconds), start the server again.
      5. Check that all the docs are indexed by running a query and validating the count.
      6. Retry the rebalance-out of the same index node (172.23.107.235). The rebalance fails with the error below.

        [2022-03-21 06:33:03,538] - [rest_client:3972] ERROR - {'node': 'ns_1@172.23.97.145', 'type': 'critical', 'code': 0, 'module': 'ns_orchestrator', 'tstamp': 1647869578530, 'shortText': 'message', 'text': 'Rebalance exited with reason {service_rebalance_failed,index,\n                              {worker_died,\n                               {\'EXIT\',<0.2546.3>,\n                                {{badmatch,\n                                  {error,\n                                   {bad_nodes,index,prepare_rebalance,\n                                    [{\'ns_1@172.23.107.235\',\n                                      {error,\n                                       {unknown_error,\n                                        <<"indexer rebalance failure - cleanup pending from previous  failed/aborted rebalance/failover/move index. please retry the request later.">>}}}]}}},\n                                 [{service_rebalancer,rebalance_worker,1,\n                                   [{file,"src/service_rebalancer.erl"},\n                                    {line,158}]},\n                                  {proc_lib,init_p,3,\n                                   [{file,"proc_lib.erl"},{line,211}]}]}}}}.\nRebalance Operation Id = 7f9e9cde4bd62a5ce5f0136b669afe44', 'serverTime': '2022-03-21T06:32:58.530Z'}
        [2022-03-21 06:33:03,538] - [rest_client:3972] ERROR - {'node': 'ns_1@172.23.107.235', 'type': 'info', 'code': 1, 'module': 'menelaus_web_sup', 'tstamp': 1647869577227, 'shortText': 'web start ok', 'text': 'Couchbase Server has started on web port 8091 on node \'ns_1@172.23.107.235\'. Version: "7.1.0-2506-enterprise".', 'serverTime': '2022-03-21T06:32:57.227Z'}
        [2022-03-21 06:33:03,538] - [rest_client:3972] ERROR - {'node': 'ns_1@172.23.107.235', 'type': 'info', 'code': 4, 'module': 'ns_node_disco', 'tstamp': 1647869577127, 'shortText': 'node up', 'text': "Node 'ns_1@172.23.107.235' saw that node 'ns_1@172.23.97.145' came up. Tags: []", 'serverTime': '2022-03-21T06:32:57.127Z'}
        [2022-03-21 06:33:03,538] - [rest_client:3972] ERROR - {'node': 'ns_1@172.23.97.145', 'type': 'info', 'code': 4, 'module': 'ns_node_disco', 'tstamp': 1647869577124, 'shortText': 'node up', 'text': "Node 'ns_1@172.23.97.145' saw that node 'ns_1@172.23.107.235' came up. Tags: []", 'serverTime': '2022-03-21T06:32:57.124Z'}
        [2022-03-21 06:33:03,538] - [rest_client:3972] ERROR - {'node': 'ns_1@172.23.107.235', 'type': 'info', 'code': 4, 'module': 'ns_node_disco', 'tstamp': 1647869577117, 'shortText': 'node up', 'text': "Node 'ns_1@172.23.107.235' saw that node 'ns_1@172.23.106.184' came up. Tags: []", 'serverTime': '2022-03-21T06:32:57.117Z'}
        [2022-03-21 06:33:03,538] - [rest_client:3972] ERROR - {'node': 'ns_1@172.23.106.184', 'type': 'info', 'code': 4, 'module': 'ns_node_disco', 'tstamp': 1647869577116, 'shortText': 'node up', 'text': "Node 'ns_1@172.23.106.184' saw that node 'ns_1@172.23.107.235' came up. Tags: []", 'serverTime': '2022-03-21T06:32:57.116Z'}
        [2022-03-21 06:33:03,538] - [rest_client:3972] ERROR - {'node': 'ns_1@172.23.97.145', 'type': 'info', 'code': 0, 'module': 'ns_orchestrator', 'tstamp': 1647869573470, 'shortText': 'message', 'text': "Starting rebalance, KeepNodes = ['ns_1@172.23.106.184','ns_1@172.23.97.145'], EjectNodes = ['ns_1@172.23.107.235'], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 7f9e9cde4bd62a5ce5f0136b669afe44", 'serverTime': '2022-03-21T06:32:53.470Z'}
        [2022-03-21 06:33:03,538] - [rest_client:3972] ERROR - {'node': 'ns_1@172.23.106.184', 'type': 'warning', 'code': 5, 'module': 'ns_node_disco', 'tstamp': 1647869560182, 'shortText': 'node down', 'text': "Node 'ns_1@172.23.106.184' saw that node 'ns_1@172.23.107.235' went down. Details: [{nodedown_reason,\n                                                                                     connection_closed}]", 'serverTime': '2022-03-21T06:32:40.182Z'}
        [2022-03-21 06:33:03,538] - [rest_client:3972] ERROR - {'node': 'ns_1@172.23.97.145', 'type': 'warning', 'code': 5, 'module': 'ns_node_disco', 'tstamp': 1647869560177, 'shortText': 'node down', 'text': "Node 'ns_1@172.23.97.145' saw that node 'ns_1@172.23.107.235' went down. Details: [{nodedown_reason,\n                                                                                    connection_closed}]", 'serverTime': '2022-03-21T06:32:40.177Z'}
        [2022-03-21 06:33:03,538] - [rest_client:3972] ERROR - {'node': 'ns_1@172.23.97.145', 'type': 'critical', 'code': 0, 'module': 'ns_orchestrator', 'tstamp': 1647869559665, 'shortText': 'message', 'text': "Rebalance exited with reason {service_rebalance_failed,index,\n                              {agent_died,<34218.13277.0>,\n                               {lost_connection,\n                                {'ns_1@172.23.107.235',shutdown}}}}.\nRebalance Operation Id = f1f1000d922fe463c3f10862a5cdc580", 'serverTime': '2022-03-21T06:32:39.665Z'} 
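The entries above are listed newest first, and every one is logged at ERROR level by the harness regardless of its own 'type'. To read them as a timeline, the payloads can be parsed and sorted by 'tstamp'. The sketch below assumes each payload is a Python dict repr, as in the lines above, and that the prefix always has the `[time] - [module:line] LEVEL - ` shape; the function names are hypothetical.

```python
import ast
import re

# Matches the harness prefix, e.g.
# "[2022-03-21 06:33:03,538] - [rest_client:3972] ERROR - {...}"
LINE_RE = re.compile(r"^\s*\[[^\]]+\] - \[[^\]]+\] \w+ - (\{.*\})\s*$", re.DOTALL)

CLEANUP_PENDING = "cleanup pending from previous"


def parse_entry(line):
    """Return the dict payload of one log line, or None if the line has none."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return ast.literal_eval(match.group(1))


def timeline(lines):
    """Parse all lines and return the entries sorted oldest first by 'tstamp'."""
    entries = [entry for entry in map(parse_entry, lines) if entry is not None]
    return sorted(entries, key=lambda entry: entry["tstamp"])


def is_cleanup_pending(entry):
    """True for a critical entry carrying the indexer's 'cleanup pending' error."""
    return entry.get("type") == "critical" and CLEANUP_PENDING in entry.get("text", "")
```

Applied to the ten lines above, the oldest entry is the first rebalance failure (06:32:39.665) and the newest is the failed retry (06:32:58.530), which arrives about 1.3 seconds after the restarted node reports its web port as started (06:32:57.227).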

       

       


          People

            hemant.rajput Hemant Rajput