Details
Description
steps:
This was a system test of an offline upgrade 2.0.0 -> 2.5. We can skip the setup steps and begin at the important points, once all the servers were already on 2.5:
1) cluster1, with 6.5M items in the default bucket and 3.5M items in the sasl bucket:
172.23.105.12
172.23.105.13
172.23.105.14
cluster2, with 6.5M items in the default bucket and 3.5M items in the sasl bucket:
172.23.105.17
172.23.105.72
172.23.105.74
Bidirectional XDCR for all buckets
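The bidirectional XDCR setup in step 1 could be done over the REST API roughly as below. This is a sketch, not taken from the original report: the node addresses come from the lists above, while the Administrator/password credentials and the remote-cluster name "cluster2" are assumptions. The commands are printed as a dry run rather than executed.

```shell
C1=172.23.105.12   # cluster1 entry point
C2=172.23.105.17   # cluster2 entry point (per the node list above)
AUTH="Administrator:password"

OUT=$(
  # register cluster2 as a remote cluster on cluster1
  echo curl -u "$AUTH" -X POST "http://$C1:8091/pools/default/remoteClusters" \
       -d name=cluster2 -d hostname="$C2:8091" \
       -d username=Administrator -d password=password
  # one continuous replication per bucket, cluster1 -> cluster2;
  # the mirror-image calls against $C2 give the reverse direction
  for BUCKET in default sasl; do
    echo curl -u "$AUTH" -X POST "http://$C1:8091/controller/createReplication" \
         -d fromBucket="$BUCKET" -d toCluster=cluster2 -d toBucket="$BUCKET" \
         -d replicationType=continuous
  done
)
echo "$OUT"
```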
2) Reboot 172.23.105.14 to trigger autofailover, then rebalance it out (cluster1)
3) Add 172.23.105.14 to cluster2 and start a rebalance
4) Create views on both clusters:
curl -v -X PUT -H 'Content-Type: application/json' 'http://sasl:sasl@172.23.105.12:8092/sasl/_design/ddoc' -d '{"views": {"view0": {"map": "function(doc, meta) {emit(doc.city, [doc.name, doc.email]);}"}, "view1": {"map": "function(doc, meta) {emit([doc.category, doc.coins], doc.name);}"}}}'
curl -v -X PUT -H 'Content-Type: application/json' 'http://172.23.105.12:8092/default/_design/ddoc' -d '{"views": {"view0": {"map": "function(doc, meta) {emit(doc.city, [doc.name, doc.email]);}"}, "view1": {"map": "function(doc, meta) {emit([doc.category, doc.coins], doc.name);}"}}}'
curl -v -X PUT -H 'Content-Type: application/json' 'http://172.23.105.15:8092/default/_design/ddoc' -d '{"views": {"view0": {"map": "function(doc, meta) {emit(doc.city, [doc.name, doc.email]);}"}, "view1": {"map": "function(doc, meta) {emit([doc.category, doc.coins], doc.name);}"}}}'
curl -v -X PUT -H 'Content-Type: application/json' 'http://sasl:sasl@172.23.105.15:8092/sasl/_design/ddoc' -d '{"views": {"view0": {"map": "function(doc, meta) {emit(doc.city, [doc.name, doc.email]);}"}, "view1": {"map": "function(doc, meta) {emit([doc.category, doc.coins], doc.name);}"}}}'
5) Then stop the rebalance manually on cluster2, fail over 172.23.105.14 (now in cluster2), and start rebalancing it out
6) Add 172.23.105.14 back into cluster1 (the rebalance-out on cluster2 is still in progress)
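Steps 5–6 map onto REST calls like these (a dry-run sketch, not from the report; credentials are assumed). The key sequence: stop cluster2's rebalance, fail 172.23.105.14 over, start rebalancing it out, and add the same node back into cluster1 while that rebalance-out is still running.

```shell
C1=172.23.105.12; C2=172.23.105.17; AUTH="Administrator:password"
OUT=$(
  echo curl -u "$AUTH" -X POST "http://$C2:8091/controller/stopRebalance"
  echo curl -u "$AUTH" -X POST "http://$C2:8091/controller/failOver" \
       -d 'otpNode=ns_1@172.23.105.14'
  echo curl -u "$AUTH" -X POST "http://$C2:8091/controller/rebalance" \
       -d 'knownNodes=ns_1@172.23.105.17,ns_1@172.23.105.72,ns_1@172.23.105.74,ns_1@172.23.105.14' \
       -d 'ejectedNodes=ns_1@172.23.105.14'
  # immediately, without waiting for cluster2's rebalance-out to finish:
  echo curl -u "$AUTH" -X POST "http://$C1:8091/controller/addNode" \
       -d hostname=172.23.105.14 -d user=Administrator -d password=password
)
echo "$OUT"   # dry run: commands are printed, not executed
```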
Result: we were able to add the node to cluster1, but the rebalance failed on cluster2.
Logs from the clusters, with timestamps:
1) Started rebalancing bucket sasl ns_rebalancer000 ns_1@172.23.105.12 07:28:33 - Tue Nov 19, 2013
Starting rebalance, KeepNodes = ['ns_1@172.23.105.12','ns_1@172.23.105.13',
'ns_1@172.23.105.14'], EjectNodes = []
ns_orchestrator004 ns_1@172.23.105.12 07:28:32 - Tue Nov 19, 2013
Node 'ns_1@172.23.105.13' saw that node 'ns_1@172.23.105.14' came up. Tags: [] ns_node_disco004 ns_1@172.23.105.13 07:28:09 - Tue Nov 19, 2013
Node 'ns_1@172.23.105.12' saw that node 'ns_1@172.23.105.14' came up. Tags: [] ns_node_disco004 ns_1@172.23.105.12 07:28:09 - Tue Nov 19, 2013
Started node add transaction by adding node 'ns_1@172.23.105.14' to nodes_wanted (group: 0)
ns_cluster000 ns_1@172.23.105.12 07:28:08 - Tue Nov 19, 2013
(this rebalance is still in progress, and is expected to complete successfully)
2) cluster2:
Control connection to memcached on 'ns_1@172.23.105.74' disconnected: {badmatch,
{error,
timeout}} ns_memcached004 ns_1@172.23.105.74 07:31:29 - Tue Nov 19, 2013
Rebalance exited with reason {unexpected_exit,
{'EXIT',<0.30701.306>,
{wait_checkpoint_persisted_failed,"sasl",170,
29,
[{'ns_1@172.23.105.74',
{'EXIT',
{badmatch,{error,timeout,
{gen_server,call,
['ns_memcached-sasl',
,
infinity]}},
{gen_server,call,
[
,
{if_rebalance,<0.29752.306>,
{wait_checkpoint_persisted,170,29}},
infinity]}}}}]}}}
ns_orchestrator002 ns_1@172.23.105.15 07:31:29 - Tue Nov 19, 2013
<0.30304.306> exited with {unexpected_exit,
{'EXIT',<0.30701.306>,
{wait_checkpoint_persisted_failed,"sasl",170,29,
[{'ns_1@172.23.105.74',
{'EXIT',
{badmatch,{error,timeout,
{gen_server,call,
['ns_memcached-sasl',
,
infinity]}},
{gen_server,call,
[
,
{if_rebalance,<0.29752.306>,
{wait_checkpoint_persisted,170,29}},
infinity]}}}}]}}} ns_vbucket_mover000 ns_1@172.23.105.15 07:31:29 - Tue Nov 19, 2013
Bucket "sasl" rebalance does not seem to be swap rebalance ns_vbucket_mover000 ns_1@172.23.105.15 07:28:00 - Tue Nov 19, 2013
Node 'ns_1@172.23.105.74' saw that node 'ns_1@172.23.105.14' went down. Details: [
Node 'ns_1@172.23.105.72' saw that node 'ns_1@172.23.105.14' went down. Details: [{nodedown_reason, connection_closed}
] ns_node_disco005 ns_1@172.23.105.72 07:27:58 - Tue Nov 19, 2013
Node 'ns_1@172.23.105.15' saw that node 'ns_1@172.23.105.14' went down. Details: [
] ns_node_disco005 ns_1@172.23.105.15 07:27:58 - Tue Nov 19, 2013
Started rebalancing bucket sasl ns_rebalancer000 ns_1@172.23.105.15 07:27:58 - Tue Nov 19, 2013
Starting rebalance, KeepNodes = ['ns_1@172.23.105.15','ns_1@172.23.105.72',
'ns_1@172.23.105.74'], EjectNodes = []
ns_orchestrator004 ns_1@172.23.105.15 07:27:58 - Tue Nov 19, 2013
Failed over 'ns_1@172.23.105.14': ok ns_orchestrator006 ns_1@172.23.105.15 07:27:44 - Tue Nov 19, 2013
Starting failing over 'ns_1@172.23.105.14' ns_orchestrator000 ns_1@172.23.105.15 07:27:43 - Tue Nov 19, 2013
Rebalance stopped by user.
ns_orchestrator007 ns_1@172.23.105.15 07:27:26 - Tue Nov 19, 2013
Bucket "sasl" rebalance does not seem to be swap rebalance ns_vbucket_mover000 ns_1@172.23.105.15 07:23:18 - Tue Nov 19, 2013
Started rebalancing bucket sasl ns_rebalancer000 ns_1@172.23.105.15 07:23:15 - Tue Nov 19, 2013
Starting rebalance, KeepNodes = ['ns_1@172.23.105.15','ns_1@172.23.105.72',
'ns_1@172.23.105.74','ns_1@172.23.105.14'], EjectNodes = []
ns_orchestrator004 ns_1@172.23.105.15 07:23:08 - Tue Nov 19, 2013
The main point of this bug is that I can add a node to a new cluster even while that node is still being rebalanced out of the other cluster.
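The claim above can be checked from the outside: /pools/default reports each node's clusterMembership, so before adding 172.23.105.14 to cluster1 one could verify whether cluster2 still lists it. A dry-run sketch; the credentials and the use of jq for filtering are assumptions.

```shell
# build the check as a pipeline string and print it for review
CHECK="curl -s -u Administrator:password http://172.23.105.17:8091/pools/default"
FILTER="jq '.nodes[] | {otpNode, clusterMembership}'"
OUT="$CHECK | $FILTER"
echo "$OUT"
```

A membership value other than a clean absence of ns_1@172.23.105.14 (e.g. still listed while its rebalance-out runs) is exactly the state in which the add on cluster1 should arguably be refused.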