Details

Type: Bug
Resolution: User Error
Priority: Critical
Fix Version/s: 7.1.0
Affects Version/s: 7.1.0
Component/s: secondary-index
Labels: affects-neo-testing
Environment: CBS-7.1.0-2506
Triage: Untriaged
Operating System: Centos 64-bit
Story Points: 1
Is this a Regression?: Yes

Description

Retrying the rebalance-out of a GSI node after a failed rebalance-out of that node failed with the error below. The test makes the rebalance-out of an index node fail by shutting the node down mid-rebalance and then restarting it. After the restart, it retries the rebalance-out of the same index node. The retry was expected to pass but failed with the error below. This test was passing in build 7.1.0-2475.

Steps to reproduce:

Create a 3-node cluster with the following services: kv:n1ql:index-index-index

Load some data and create a few indexes with one replica each:

[2022-03-21 06:31:24,236] - [tuq_helper:320] INFO - RUN QUERY CREATE INDEX `idx_0` ON default:test_bucket.test_scope_1.test_collection_1(age) USING GSI  WITH {'defer_build': False, 'num_replica': 1}

[2022-03-21 06:31:24,237] - [tuq_helper:320] INFO - RUN QUERY CREATE INDEX `idx_5` ON default:test_bucket.test_scope_1.test_collection_1(lastName) USING GSI  WITH {'defer_build': False, 'num_replica': 1}

[2022-03-21 06:31:24,238] - [tuq_helper:320] INFO - RUN QUERY CREATE INDEX `idx_1` ON default:test_bucket.test_scope_1.test_collection_1(city) USING GSI  WITH {'defer_build': False, 'num_replica': 1}

[2022-03-21 06:31:24,240] - [tuq_helper:320] INFO - RUN QUERY CREATE INDEX `idx_3` ON default:test_bucket.test_scope_1.test_collection_1(title) USING GSI  WITH {'defer_build': False, 'num_replica': 1}

[2022-03-21 06:31:24,241] - [tuq_helper:320] INFO - RUN QUERY CREATE INDEX `idx_6` ON default:test_bucket.test_scope_1.test_collection_1(streetAddress) USING GSI  WITH {'defer_build': False, 'num_replica': 1}

[2022-03-21 06:31:24,242] - [tuq_helper:320] INFO - RUN QUERY CREATE INDEX `idx_8` ON default:test_bucket.test_scope_1.test_collection_1(filler1) USING GSI  WITH {'defer_build': False, 'num_replica': 1}

[2022-03-21 06:31:24,243] - [tuq_helper:320] INFO - RUN QUERY CREATE INDEX `idx_2` ON default:test_bucket.test_scope_1.test_collection_1(country) USING GSI  WITH {'defer_build': False, 'num_replica': 1}

[2022-03-21 06:31:24,244] - [tuq_helper:320] INFO - RUN QUERY CREATE INDEX `idx_7` ON default:test_bucket.test_scope_1.test_collection_1(suffix) USING GSI  WITH {'defer_build': False, 'num_replica': 1}

[2022-03-21 06:31:24,245] - [tuq_helper:320] INFO - RUN QUERY CREATE INDEX `idx_4` ON default:test_bucket.test_scope_1.test_collection_1(firstName) USING GSI  WITH {'defer_build': False, 'num_replica': 1}

[2022-03-21 06:31:24,246] - [rest_client:4156] INFO - query params :
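
The CREATE INDEX statements logged above differ only in index name and indexed field, so they can be generated from a list. The following is a minimal sketch, not the test framework's actual code: `build_create_index` is a hypothetical helper, and encoding the WITH clause with `json.dumps` is an assumption (the log shows a Python dict repr instead).

```python
import json

# Keyspace and field order are taken from the log lines above (idx_0 .. idx_8).
KEYSPACE = "default:test_bucket.test_scope_1.test_collection_1"
FIELDS = ["age", "city", "country", "title", "firstName",
          "lastName", "streetAddress", "suffix", "filler1"]


def build_create_index(name, keyspace, field, num_replica=1, defer_build=False):
    """Hypothetical helper: return a CREATE INDEX statement as a string."""
    # Assumption: the WITH clause is emitted as JSON rather than a Python repr.
    with_clause = json.dumps({"defer_build": defer_build, "num_replica": num_replica})
    return f"CREATE INDEX `{name}` ON {keyspace}({field}) USING GSI WITH {with_clause}"


statements = [
    build_create_index(f"idx_{i}", KEYSPACE, field)
    for i, field in enumerate(FIELDS)
]
```

Each entry in `statements` can then be passed to whatever query runner the test uses.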

Start a rebalance-out of one of the index-only nodes (172.23.107.235) and, before the rebalance finishes, stop the server on that node to make the rebalance fail.
After the rebalance fails (may take 5 sec), start the server again.
Check that all the docs are indexed by running a query and validating the count.

Retry the rebalance-out of the same index node (172.23.107.235). The rebalance failed with the error below.

[2022-03-21 06:33:03,538] - [rest_client:3972] ERROR - {'node': 'ns_1@172.23.97.145', 'type': 'critical', 'code': 0, 'module': 'ns_orchestrator', 'tstamp': 1647869578530, 'shortText': 'message', 'text': 'Rebalance exited with reason {service_rebalance_failed,index,\n                              {worker_died,\n                               {\'EXIT\',<0.2546.3>,\n                                {{badmatch,\n                                  {error,\n                                   {bad_nodes,index,prepare_rebalance,\n                                    [{\'ns_1@172.23.107.235\',\n                                      {error,\n                                       {unknown_error,\n                                        <<"indexer rebalance failure - cleanup pending from previous  failed/aborted rebalance/failover/move index. please retry the request later.">>}}}]}}},\n                                 [{service_rebalancer,rebalance_worker,1,\n                                   [{file,"src/service_rebalancer.erl"},\n                                    {line,158}]},\n                                  {proc_lib,init_p,3,\n                                   [{file,"proc_lib.erl"},{line,211}]}]}}}}.\nRebalance Operation Id = 7f9e9cde4bd62a5ce5f0136b669afe44', 'serverTime': '2022-03-21T06:32:58.530Z'}

[2022-03-21 06:33:03,538] - [rest_client:3972] ERROR - {'node': 'ns_1@172.23.107.235', 'type': 'info', 'code': 1, 'module': 'menelaus_web_sup', 'tstamp': 1647869577227, 'shortText': 'web start ok', 'text': 'Couchbase Server has started on web port 8091 on node \'ns_1@172.23.107.235\'. Version: "7.1.0-2506-enterprise".', 'serverTime': '2022-03-21T06:32:57.227Z'}

[2022-03-21 06:33:03,538] - [rest_client:3972] ERROR - {'node': 'ns_1@172.23.107.235', 'type': 'info', 'code': 4, 'module': 'ns_node_disco', 'tstamp': 1647869577127, 'shortText': 'node up', 'text': "Node 'ns_1@172.23.107.235' saw that node 'ns_1@172.23.97.145' came up. Tags: []", 'serverTime': '2022-03-21T06:32:57.127Z'}

[2022-03-21 06:33:03,538] - [rest_client:3972] ERROR - {'node': 'ns_1@172.23.97.145', 'type': 'info', 'code': 4, 'module': 'ns_node_disco', 'tstamp': 1647869577124, 'shortText': 'node up', 'text': "Node 'ns_1@172.23.97.145' saw that node 'ns_1@172.23.107.235' came up. Tags: []", 'serverTime': '2022-03-21T06:32:57.124Z'}

[2022-03-21 06:33:03,538] - [rest_client:3972] ERROR - {'node': 'ns_1@172.23.107.235', 'type': 'info', 'code': 4, 'module': 'ns_node_disco', 'tstamp': 1647869577117, 'shortText': 'node up', 'text': "Node 'ns_1@172.23.107.235' saw that node 'ns_1@172.23.106.184' came up. Tags: []", 'serverTime': '2022-03-21T06:32:57.117Z'}

[2022-03-21 06:33:03,538] - [rest_client:3972] ERROR - {'node': 'ns_1@172.23.106.184', 'type': 'info', 'code': 4, 'module': 'ns_node_disco', 'tstamp': 1647869577116, 'shortText': 'node up', 'text': "Node 'ns_1@172.23.106.184' saw that node 'ns_1@172.23.107.235' came up. Tags: []", 'serverTime': '2022-03-21T06:32:57.116Z'}

[2022-03-21 06:33:03,538] - [rest_client:3972] ERROR - {'node': 'ns_1@172.23.97.145', 'type': 'info', 'code': 0, 'module': 'ns_orchestrator', 'tstamp': 1647869573470, 'shortText': 'message', 'text': "Starting rebalance, KeepNodes = ['ns_1@172.23.106.184','ns_1@172.23.97.145'], EjectNodes = ['ns_1@172.23.107.235'], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 7f9e9cde4bd62a5ce5f0136b669afe44", 'serverTime': '2022-03-21T06:32:53.470Z'}

[2022-03-21 06:33:03,538] - [rest_client:3972] ERROR - {'node': 'ns_1@172.23.106.184', 'type': 'warning', 'code': 5, 'module': 'ns_node_disco', 'tstamp': 1647869560182, 'shortText': 'node down', 'text': "Node 'ns_1@172.23.106.184' saw that node 'ns_1@172.23.107.235' went down. Details: [{nodedown_reason,\n                                                                                     connection_closed}]", 'serverTime': '2022-03-21T06:32:40.182Z'}

[2022-03-21 06:33:03,538] - [rest_client:3972] ERROR - {'node': 'ns_1@172.23.97.145', 'type': 'warning', 'code': 5, 'module': 'ns_node_disco', 'tstamp': 1647869560177, 'shortText': 'node down', 'text': "Node 'ns_1@172.23.97.145' saw that node 'ns_1@172.23.107.235' went down. Details: [{nodedown_reason,\n                                                                                    connection_closed}]", 'serverTime': '2022-03-21T06:32:40.177Z'}

[2022-03-21 06:33:03,538] - [rest_client:3972] ERROR - {'node': 'ns_1@172.23.97.145', 'type': 'critical', 'code': 0, 'module': 'ns_orchestrator', 'tstamp': 1647869559665, 'shortText': 'message', 'text': "Rebalance exited with reason {service_rebalance_failed,index,\n                              {agent_died,<34218.13277.0>,\n                               {lost_connection,\n                                {'ns_1@172.23.107.235',shutdown}}}}.\nRebalance Operation Id = f1f1000d922fe463c3f10862a5cdc580", 'serverTime': '2022-03-21T06:32:39.665Z'}
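
The indexer error above asks the caller to retry the request later, so one way to script the retry step is to repeat the rebalance until the indexer stops reporting pending cleanup. The sketch below only illustrates that idea under stated assumptions: `start_rebalance` is a hypothetical callable returning `(ok, message)`, not a Couchbase or test-framework API, and the attempt count and delay are arbitrary values, not taken from this ticket.

```python
import time

# Substring of the indexer error text quoted in the log above.
CLEANUP_PENDING = "cleanup pending from previous"


def rebalance_with_retry(start_rebalance, attempts=5, delay_s=10.0, sleep=time.sleep):
    """Retry a rebalance while the indexer reports pending cleanup.

    start_rebalance: hypothetical callable returning (ok, message).
    Returns the number of the attempt that succeeded.
    """
    message = ""
    for attempt in range(1, attempts + 1):
        ok, message = start_rebalance()
        if ok:
            return attempt
        if CLEANUP_PENDING not in message:
            # Any other failure is not retried here.
            raise RuntimeError(f"rebalance failed: {message}")
        if attempt < attempts:
            sleep(delay_s)  # give the indexer time to finish its cleanup
    raise TimeoutError(f"cleanup still pending after {attempts} attempts: {message}")
```

Injecting `sleep` keeps the helper testable without real delays. Failures other than the cleanup-pending message are raised immediately rather than retried.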

Attachments

- 172.23.106.184-20220321-0633-diag.zip (3.13 MB, 21/Mar/22 11:57 PM)
- 172.23.107.230-20220321-0633-diag.zip (1.33 MB, 21/Mar/22 11:56 PM)
- 172.23.107.235-20220321-0633-diag.zip (2.73 MB, 21/Mar/22 11:56 PM)
- 172.23.107.24-20220321-0633-diag.zip (1.19 MB, 21/Mar/22 11:56 PM)
- 172.23.107.247-20220321-0633-diag.zip (1.35 MB, 21/Mar/22 11:56 PM)
- 172.23.97.145-20220321-0633-diag.zip (12.40 MB, 21/Mar/22 11:57 PM)
- test.log (151 kB, 21/Mar/22 11:56 PM)

Rebalance out of GSI node failed for retry of failed rebalance
