Details
- Bug
- Resolution: User Error
- Critical
- 7.1.0
- CBS-7.1.0-2506
- Untriaged
- Centos 64-bit
- 1
- Yes
Description
Retrying the rebalance-out of a GSI node after a previously failed rebalance-out fails with the error below. In this test we fail the rebalance-out of an index node by shutting the node down and then restarting it. After the restart we retry the rebalance-out of the same index node; it was expected to pass but failed with the error below. This test was passing in build 7.1.0-2475.
Steps to reproduce:
- Create a 3-node cluster with the following service layout: kv:n1ql:index-index-index (one kv+n1ql+index node and two index-only nodes)
- Load some data and create a few indexes with one replica
[2022-03-21 06:31:24,236] - [tuq_helper:320] INFO - RUN QUERY CREATE INDEX `idx_0` ON default:test_bucket.test_scope_1.test_collection_1(age) USING GSI WITH {'defer_build': False, 'num_replica': 1}
[2022-03-21 06:31:24,237] - [tuq_helper:320] INFO - RUN QUERY CREATE INDEX `idx_5` ON default:test_bucket.test_scope_1.test_collection_1(lastName) USING GSI WITH {'defer_build': False, 'num_replica': 1}
[2022-03-21 06:31:24,238] - [tuq_helper:320] INFO - RUN QUERY CREATE INDEX `idx_1` ON default:test_bucket.test_scope_1.test_collection_1(city) USING GSI WITH {'defer_build': False, 'num_replica': 1}
[2022-03-21 06:31:24,240] - [tuq_helper:320] INFO - RUN QUERY CREATE INDEX `idx_3` ON default:test_bucket.test_scope_1.test_collection_1(title) USING GSI WITH {'defer_build': False, 'num_replica': 1}
[2022-03-21 06:31:24,241] - [tuq_helper:320] INFO - RUN QUERY CREATE INDEX `idx_6` ON default:test_bucket.test_scope_1.test_collection_1(streetAddress) USING GSI WITH {'defer_build': False, 'num_replica': 1}
[2022-03-21 06:31:24,242] - [tuq_helper:320] INFO - RUN QUERY CREATE INDEX `idx_8` ON default:test_bucket.test_scope_1.test_collection_1(filler1) USING GSI WITH {'defer_build': False, 'num_replica': 1}
[2022-03-21 06:31:24,243] - [tuq_helper:320] INFO - RUN QUERY CREATE INDEX `idx_2` ON default:test_bucket.test_scope_1.test_collection_1(country) USING GSI WITH {'defer_build': False, 'num_replica': 1}
[2022-03-21 06:31:24,244] - [tuq_helper:320] INFO - RUN QUERY CREATE INDEX `idx_7` ON default:test_bucket.test_scope_1.test_collection_1(suffix) USING GSI WITH {'defer_build': False, 'num_replica': 1}
[2022-03-21 06:31:24,245] - [tuq_helper:320] INFO - RUN QUERY CREATE INDEX `idx_4` ON default:test_bucket.test_scope_1.test_collection_1(firstName) USING GSI WITH {'defer_build': False, 'num_replica': 1}
[2022-03-21 06:31:24,246] - [rest_client:4156] INFO - query params :
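For reference, the nine CREATE INDEX statements in the log above follow a single pattern over the indexed fields. A minimal sketch that generates them (keyspace, field-to-index pairing, and the WITH clause are taken verbatim from the RUN QUERY lines; the Python-dict style of the WITH clause is how the test framework logs it):

```python
# Rebuild the CREATE INDEX statements seen in the test log above.
# Keyspace and field names are copied from the RUN QUERY log lines.
KEYSPACE = "default:test_bucket.test_scope_1.test_collection_1"

# idx_N -> indexed field, as paired in the log output
FIELDS = {
    0: "age", 1: "city", 2: "country", 3: "title", 4: "firstName",
    5: "lastName", 6: "streetAddress", 7: "suffix", 8: "filler1",
}

def create_index_statements(keyspace=KEYSPACE, fields=FIELDS, num_replica=1):
    """Return the CREATE INDEX statements, each with one replica."""
    return [
        f"CREATE INDEX `idx_{n}` ON {keyspace}({field}) USING GSI "
        f"WITH {{'defer_build': False, 'num_replica': {num_replica}}}"
        for n, field in sorted(fields.items())
    ]

for stmt in create_index_statements():
    print(stmt)
```

The statements are issued by the test in arbitrary order (idx_0, idx_5, idx_1, ...); the sketch emits them sorted by index number for readability.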
- Start a rebalance-out of one of the index-only nodes (172.23.107.235), and before the rebalance finishes, stop the server on that node to fail the rebalance.
- After the rebalance fails (may take ~5 sec), start the server again.
- Check that all the docs are indexed by running a query and validating the count.
- Retry the rebalance-out of the same index node (172.23.107.235). The rebalance failed with the error below.
[2022-03-21 06:33:03,538] - [rest_client:3972] ERROR - {'node': 'ns_1@172.23.97.145', 'type': 'critical', 'code': 0, 'module': 'ns_orchestrator', 'tstamp': 1647869578530, 'shortText': 'message', 'text': 'Rebalance exited with reason {service_rebalance_failed,index,\n {worker_died,\n {\'EXIT\',<0.2546.3>,\n {{badmatch,\n {error,\n {bad_nodes,index,prepare_rebalance,\n [{\'ns_1@172.23.107.235\',\n {error,\n {unknown_error,\n <<"indexer rebalance failure - cleanup pending from previous failed/aborted rebalance/failover/move index. please retry the request later.">>}}}]}}},\n [{service_rebalancer,rebalance_worker,1,\n [{file,"src/service_rebalancer.erl"},\n {line,158}]},\n {proc_lib,init_p,3,\n [{file,"proc_lib.erl"},{line,211}]}]}}}}.\nRebalance Operation Id = 7f9e9cde4bd62a5ce5f0136b669afe44', 'serverTime': '2022-03-21T06:32:58.530Z'}
[2022-03-21 06:33:03,538] - [rest_client:3972] ERROR - {'node': 'ns_1@172.23.107.235', 'type': 'info', 'code': 1, 'module': 'menelaus_web_sup', 'tstamp': 1647869577227, 'shortText': 'web start ok', 'text': 'Couchbase Server has started on web port 8091 on node \'ns_1@172.23.107.235\'. Version: "7.1.0-2506-enterprise".', 'serverTime': '2022-03-21T06:32:57.227Z'}
[2022-03-21 06:33:03,538] - [rest_client:3972] ERROR - {'node': 'ns_1@172.23.107.235', 'type': 'info', 'code': 4, 'module': 'ns_node_disco', 'tstamp': 1647869577127, 'shortText': 'node up', 'text': "Node 'ns_1@172.23.107.235' saw that node 'ns_1@172.23.97.145' came up. Tags: []", 'serverTime': '2022-03-21T06:32:57.127Z'}
[2022-03-21 06:33:03,538] - [rest_client:3972] ERROR - {'node': 'ns_1@172.23.97.145', 'type': 'info', 'code': 4, 'module': 'ns_node_disco', 'tstamp': 1647869577124, 'shortText': 'node up', 'text': "Node 'ns_1@172.23.97.145' saw that node 'ns_1@172.23.107.235' came up. Tags: []", 'serverTime': '2022-03-21T06:32:57.124Z'}
[2022-03-21 06:33:03,538] - [rest_client:3972] ERROR - {'node': 'ns_1@172.23.107.235', 'type': 'info', 'code': 4, 'module': 'ns_node_disco', 'tstamp': 1647869577117, 'shortText': 'node up', 'text': "Node 'ns_1@172.23.107.235' saw that node 'ns_1@172.23.106.184' came up. Tags: []", 'serverTime': '2022-03-21T06:32:57.117Z'}
[2022-03-21 06:33:03,538] - [rest_client:3972] ERROR - {'node': 'ns_1@172.23.106.184', 'type': 'info', 'code': 4, 'module': 'ns_node_disco', 'tstamp': 1647869577116, 'shortText': 'node up', 'text': "Node 'ns_1@172.23.106.184' saw that node 'ns_1@172.23.107.235' came up. Tags: []", 'serverTime': '2022-03-21T06:32:57.116Z'}
[2022-03-21 06:33:03,538] - [rest_client:3972] ERROR - {'node': 'ns_1@172.23.97.145', 'type': 'info', 'code': 0, 'module': 'ns_orchestrator', 'tstamp': 1647869573470, 'shortText': 'message', 'text': "Starting rebalance, KeepNodes = ['ns_1@172.23.106.184','ns_1@172.23.97.145'], EjectNodes = ['ns_1@172.23.107.235'], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 7f9e9cde4bd62a5ce5f0136b669afe44", 'serverTime': '2022-03-21T06:32:53.470Z'}
[2022-03-21 06:33:03,538] - [rest_client:3972] ERROR - {'node': 'ns_1@172.23.106.184', 'type': 'warning', 'code': 5, 'module': 'ns_node_disco', 'tstamp': 1647869560182, 'shortText': 'node down', 'text': "Node 'ns_1@172.23.106.184' saw that node 'ns_1@172.23.107.235' went down. Details: [{nodedown_reason,\n connection_closed}]", 'serverTime': '2022-03-21T06:32:40.182Z'}
[2022-03-21 06:33:03,538] - [rest_client:3972] ERROR - {'node': 'ns_1@172.23.97.145', 'type': 'warning', 'code': 5, 'module': 'ns_node_disco', 'tstamp': 1647869560177, 'shortText': 'node down', 'text': "Node 'ns_1@172.23.97.145' saw that node 'ns_1@172.23.107.235' went down. Details: [{nodedown_reason,\n connection_closed}]", 'serverTime': '2022-03-21T06:32:40.177Z'}
[2022-03-21 06:33:03,538] - [rest_client:3972] ERROR - {'node': 'ns_1@172.23.97.145', 'type': 'critical', 'code': 0, 'module': 'ns_orchestrator', 'tstamp': 1647869559665, 'shortText': 'message', 'text': "Rebalance exited with reason {service_rebalance_failed,index,\n {agent_died,<34218.13277.0>,\n {lost_connection,\n {'ns_1@172.23.107.235',shutdown}}}}.\nRebalance Operation Id = f1f1000d922fe463c3f10862a5cdc580", 'serverTime': '2022-03-21T06:32:39.665Z'}
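The indexer's message in the first error above ("cleanup pending from previous failed/aborted rebalance/failover/move index. please retry the request later.") indicates the retry was issued before the indexer finished cleaning up the aborted rebalance, which is consistent with the "User Error" resolution. A minimal sketch of how a client could treat this as a transient condition and retry (`start_rebalance` is a hypothetical caller-supplied callable, not a Couchbase API):

```python
import time

# Transient error text, verbatim from the rebalance failure logged above.
CLEANUP_PENDING = ("cleanup pending from previous failed/aborted "
                   "rebalance/failover/move index")

def is_cleanup_pending(error_text):
    """True if a rebalance failure is the transient 'cleanup pending' case."""
    return CLEANUP_PENDING in error_text

def rebalance_with_retry(start_rebalance, retries=5, delay=10):
    """Retry the rebalance while the indexer reports pending cleanup.

    `start_rebalance` is a caller-supplied callable (hypothetical here)
    that attempts the rebalance and returns (ok: bool, error_text: str).
    Non-transient failures are raised immediately.
    """
    for _ in range(retries):
        ok, err = start_rebalance()
        if ok:
            return True
        if not is_cleanup_pending(err):
            raise RuntimeError(f"rebalance failed: {err}")
        time.sleep(delay)  # give the indexer time to finish its cleanup
    return False
```

With this pattern, the second rebalance attempt in the repro would be delayed and retried instead of surfacing the `bad_nodes,index,prepare_rebalance` failure to the orchestrator.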