Details
Type: Bug
Resolution: Cannot Reproduce
Priority: Critical
Affects Version/s: 6.0.0
Triage: Untriaged
Is this a Regression?: Yes
Description
Build
6.0.0-1529
Testcase
./sequoia -scope tests/fts/scope_component_fts.yml -test tests/fts/test_fts_alice_component.yml -provider file:centos_second_cluster.yml -version 6.0.0-1529
Steps:
1. Create a single-node kv+fts (.206) cluster.
2. Create 2 buckets on it and load 10M docs.
3. Create 2 default indexes - scorch and upside_down - on the cluster.
4. While indexing is in progress, add .207 (kv), .209 (kv+fts), .210 (fts) and .212 (kv+fts), then rebalance - this rebalance goes through fine.
5. Now create 2 more indexes - scorch and upside_down with a custom mapping (see the index-creation sketch after these steps).
6. Add .215 (fts), .216 (kv+fts) and .48 (kv), and remove .209 and .212 (added in step 4). Rebalance fails.
7. We then fail over .212 (the request times out) and rebalance again. The rebalance fails with a buckets_cleanup_failed error, as shown in the log excerpt below; a couchbase-cli sketch of this step follows at the end of the description.
Message | Module | Node | Time
Rebalance exited with reason {buckets_cleanup_failed,['ns_1@172.23.96.216']} | ns_orchestrator 000 | ns_1@172.23.96.206 | 1:41:49 PM Tue Aug 21, 2018
Failed to cleanup old buckets on some nodes: ['ns_1@172.23.96.216'] | ns_rebalancer 000 | ns_1@172.23.96.206 | 1:41:49 PM Tue Aug 21, 2018
Node 'ns_1@172.23.96.206' saw that node 'ns_1@172.23.96.216' went down. Details: [{nodedown_reason,net_tick_timeout}] | ns_node_disco 005 | ns_1@172.23.96.206 | 1:41:49 PM Tue Aug 21, 2018
Node 'ns_1@172.23.96.209' saw that node 'ns_1@172.23.96.216' went down. Details: [{nodedown_reason,net_tick_timeout}] | ns_node_disco 005 | ns_1@172.23.96.209 | 1:41:47 PM Tue Aug 21, 2018
Node 'ns_1@172.23.96.207' saw that node 'ns_1@172.23.96.212' went down. Details: [{nodedown_reason,connection_closed}] | ns_node_disco 005 | ns_1@172.23.96.207 | 1:20:28 AM Tue Aug 21, 2018
Node 'ns_1@172.23.96.210' saw that node 'ns_1@172.23.96.212' went down. Details: [{nodedown_reason,connection_closed}] | ns_node_disco 005 | ns_1@172.23.96.210 | 1:20:28 AM Tue Aug 21, 2018
Node 'ns_1@172.23.96.209' saw that node 'ns_1@172.23.96.212' went down. Details: [{nodedown_reason,connection_closed}] | ns_node_disco 005 | ns_1@172.23.96.209 | 1:20:28 AM Tue Aug 21, 2018
Node 'ns_1@172.23.96.206' saw that node 'ns_1@172.23.96.212' went down. Details: [{nodedown_reason,connection_closed}] | ns_node_disco 005 | ns_1@172.23.96.206 | 1:20:28 AM Tue Aug 21, 2018
Node 'ns_1@172.23.96.215' saw that node 'ns_1@172.23.96.212' went down. Details: [{nodedown_reason,connection_closed}] | ns_node_disco 005 | ns_1@172.23.96.215 | 1:20:28 AM Tue Aug 21, 2018
Node 'ns_1@172.23.96.48' saw that node 'ns_1@172.23.96.212' went down. Details: [{nodedown_reason,connection_closed}] | ns_node_disco 005 | ns_1@172.23.96.48 | 1:20:28 AM Tue Aug 21, 2018
Deleting old data files of bucket "default" | ns_storage_conf 000 | ns_1@172.23.96.209 | 1:20:26 AM Tue Aug 21, 2018
Deleting old data files of bucket "other" | ns_storage_conf 000 | ns_1@172.23.96.209 | 1:20:26 AM Tue Aug 21, 2018
Node 'ns_1@172.23.96.212' is leaving cluster. | ns_cluster 001 | ns_1@172.23.96.212 | 1:20:26 AM Tue Aug 21, 2018
Starting rebalance, KeepNodes = ['ns_1@172.23.96.206','ns_1@172.23.96.207','ns_1@172.23.96.209','ns_1@172.23.96.210','ns_1@172.23.96.215','ns_1@172.23.96.216','ns_1@172.23.96.48'], EjectNodes = [], Failed over and being ejected nodes = ['ns_1@172.23.96.212']; no delta recovery nodes
We also see errors from the Janitor on .216:
Janitor cleanup of "default" failed after failover of ['ns_1@172.23.96.212']: {error,{badmatch,{error,{failed_nodes,['ns_1@172.23.96.216']}}}}
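For reference, the failover and rebalance of step 7 can be driven with couchbase-cli; a minimal sketch, assuming the orchestrator address and default credentials, with a hard failover standing in for the graceful request that timed out:

# Fail over .212; --hard forces the failover since the graceful
# request timed out.
couchbase-cli failover --cluster 172.23.96.206:8091 \
  --username Administrator --password password \
  --server-failover 172.23.96.212:8091 --hard

# Rebalance the remaining nodes; this is the rebalance that exits with
# {buckets_cleanup_failed,['ns_1@172.23.96.216']}.
couchbase-cli rebalance --cluster 172.23.96.206:8091 \
  --username Administrator --password password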