  Couchbase Server
  MB-9594

Able to add a node to cluster1 while it is still being rebalanced out of cluster2 after a failover


Details

    • Type: Bug
    • Resolution: Incomplete
    • Priority: Critical
    • Fix Version: 3.0
    • Affects Version: 2.5.0
    • Component: ns_server
    • Security Level: Public
    • Labels: None
    • Triage: Triaged
    • Environment: Windows 64-bit

    Description

      Steps:
      This was a system test of an offline upgrade from 2.0.0 to 2.5.0.

      We can skip the earlier steps and start at the important point, when all the servers were already running 2.5:

      1) cluster1 with a default bucket (6.5M items) and a sasl bucket (3.5M items):
      172.23.105.12
      172.23.105.13
      172.23.105.14

      cluster2 with a default bucket (6.5M items) and a sasl bucket (3.5M items):
      172.23.105.15
      172.23.105.72
      172.23.105.74
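      For reference, a bucket like sasl can be created over the REST API on port 8091; the RAM quota and Administrator credentials below are assumptions, not values recorded in this test:

      # hypothetical quota and credentials
      curl -u Administrator:password -X POST http://172.23.105.12:8091/pools/default/buckets \
        -d name=sasl -d bucketType=membase -d ramQuotaMB=1024 \
        -d authType=sasl -d saslPassword=sasl -d replicaNumber=1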

      Bidirectional XDCR is set up between the clusters for all buckets.
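      A bidirectional XDCR setup like this is typically built from a remote-cluster reference plus a continuous replication in each direction; the cluster name and credentials below are assumptions:

      # on cluster1, pointing at cluster2 (repeat the mirror image on cluster2)
      curl -u Administrator:password -X POST http://172.23.105.12:8091/pools/default/remoteClusters \
        -d name=cluster2 -d hostname=172.23.105.15:8091 \
        -d username=Administrator -d password=password
      curl -u Administrator:password -X POST http://172.23.105.12:8091/controller/createReplication \
        -d fromBucket=default -d toCluster=cluster2 -d toBucket=default -d replicationType=continuous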

      2) Reboot 172.23.105.14 so that it is auto-failed-over, then rebalance it out (cluster1); see the sketch below.
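      Auto-failover has to be enabled beforehand for the reboot to trigger a failover; a minimal sketch, with the timeout and credentials as assumptions:

      # enable auto-failover (30 s is the minimum timeout)
      curl -u Administrator:password -X POST http://172.23.105.12:8091/settings/autoFailover \
        -d enabled=true -d timeout=30
      # once the node has been failed over, rebalance it out of cluster1
      curl -u Administrator:password -X POST http://172.23.105.12:8091/controller/rebalance \
        -d knownNodes=ns_1@172.23.105.12,ns_1@172.23.105.13,ns_1@172.23.105.14 \
        -d ejectedNodes=ns_1@172.23.105.14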
      3) Add 172.23.105.14 to cluster2 and start a rebalance, roughly as sketched below.
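      Adding the node and rebalancing on cluster2 maps onto these REST calls (credentials assumed):

      curl -u Administrator:password -X POST http://172.23.105.15:8091/controller/addNode \
        -d hostname=172.23.105.14 -d user=Administrator -d password=password
      curl -u Administrator:password -X POST http://172.23.105.15:8091/controller/rebalance \
        -d knownNodes=ns_1@172.23.105.15,ns_1@172.23.105.72,ns_1@172.23.105.74,ns_1@172.23.105.14 \
        -d ejectedNodes=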
      4) Create views on both clusters:
      curl -v -X PUT -H 'Content-Type: application/json' 'http://sasl:sasl@172.23.105.12:8092/sasl/_design/ddoc' -d '{"views": { "view0":{"map":"function(doc, meta){emit(doc.city,[doc.name, doc.email]);}"}, "view1" : {"map":"function(doc, meta){emit([doc.category, doc.coins], doc.name);}"}}}'
      curl -v -X PUT -H 'Content-Type: application/json' 'http://172.23.105.12:8092/default/_design/ddoc' -d '{"views": { "view0":{"map":"function(doc, meta){emit(doc.city,[doc.name, doc.email]);}"}, "view1" : {"map":"function(doc, meta){emit([doc.category, doc.coins], doc.name);}"}}}'
      curl -v -X PUT -H 'Content-Type: application/json' 'http://172.23.105.15:8092/default/_design/ddoc' -d '{"views": { "view0":{"map":"function(doc, meta){emit(doc.city,[doc.name, doc.email]);}"}, "view1" : {"map":"function(doc, meta){emit([doc.category, doc.coins], doc.name);}"}}}'
      curl -v -X PUT -H 'Content-Type: application/json' 'http://sasl:sasl@172.23.105.15:8092/sasl/_design/ddoc' -d '{"views": { "view0":{"map":"function(doc, meta){emit(doc.city,[doc.name, doc.email]);}"}, "view1" : {"map":"function(doc, meta){emit([doc.category, doc.coins], doc.name);}"}}}'

      5) Then stop the rebalance on cluster2 manually, fail over 172.23.105.14 (it is now in cluster2), and start a rebalance to eject it; see the sketch below.
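      The stop/failover/eject sequence corresponds roughly to these REST calls (credentials assumed):

      curl -u Administrator:password -X POST http://172.23.105.15:8091/controller/stopRebalance
      curl -u Administrator:password -X POST http://172.23.105.15:8091/controller/failOver \
        -d otpNode=ns_1@172.23.105.14
      # rebalance the failed-over node out of cluster2
      curl -u Administrator:password -X POST http://172.23.105.15:8091/controller/rebalance \
        -d knownNodes=ns_1@172.23.105.15,ns_1@172.23.105.72,ns_1@172.23.105.74,ns_1@172.23.105.14 \
        -d ejectedNodes=ns_1@172.23.105.14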

      6) Add 172.23.105.14 back into cluster1 (the rebalance out on cluster2 is still in progress).

      Result: we are able to add the node to cluster1, but the rebalance fails on cluster2. Its status can be watched as sketched below.
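      Rebalance status on either cluster is visible from the tasks endpoint (credentials assumed):

      curl -s -u Administrator:password http://172.23.105.15:8091/pools/default/tasks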

      Logs from the clusters, with timestamps:

      1) cluster1:

      Started rebalancing bucket sasl ns_rebalancer000 ns_1@172.23.105.12 07:28:33 - Tue Nov 19, 2013
      Starting rebalance, KeepNodes = ['ns_1@172.23.105.12','ns_1@172.23.105.13',
      'ns_1@172.23.105.14'], EjectNodes = []
      ns_orchestrator004 ns_1@172.23.105.12 07:28:32 - Tue Nov 19, 2013
      Node 'ns_1@172.23.105.13' saw that node 'ns_1@172.23.105.14' came up. Tags: [] ns_node_disco004 ns_1@172.23.105.13 07:28:09 - Tue Nov 19, 2013
      Node 'ns_1@172.23.105.12' saw that node 'ns_1@172.23.105.14' came up. Tags: [] ns_node_disco004 ns_1@172.23.105.12 07:28:09 - Tue Nov 19, 2013
      Started node add transaction by adding node 'ns_1@172.23.105.14' to nodes_wanted (group: 0)
      ns_cluster000 ns_1@172.23.105.12 07:28:08 - Tue Nov 19, 2013

      (this rebalance is still in progress and is expected to complete successfully)

      2) cluster2:

      Control connection to memcached on 'ns_1@172.23.105.74' disconnected: {badmatch,{error,timeout}} ns_memcached004 ns_1@172.23.105.74 07:31:29 - Tue Nov 19, 2013
      Rebalance exited with reason {unexpected_exit,
        {'EXIT',<0.30701.306>,
         {wait_checkpoint_persisted_failed,"sasl",170,29,
          [{'ns_1@172.23.105.74',
            {'EXIT',
             {{{badmatch,{error,timeout}},
               {gen_server,call,
                ['ns_memcached-sasl',
                 {wait_for_checkpoint_persistence,670,30},
                 infinity]}},
              {gen_server,call,
               [{'janitor_agent-sasl','ns_1@172.23.105.74'},
                {if_rebalance,<0.29752.306>,
                 {wait_checkpoint_persisted,170,29}},
                infinity]}}}}]}}}
      ns_orchestrator002 ns_1@172.23.105.15 07:31:29 - Tue Nov 19, 2013
      <0.30304.306> exited with {unexpected_exit,
        {'EXIT',<0.30701.306>,
         {wait_checkpoint_persisted_failed,"sasl",170,29,
          [{'ns_1@172.23.105.74',
            {'EXIT',
             {{{badmatch,{error,timeout}},
               {gen_server,call,
                ['ns_memcached-sasl',
                 {wait_for_checkpoint_persistence,670,30},
                 infinity]}},
              {gen_server,call,
               [{'janitor_agent-sasl','ns_1@172.23.105.74'},
                {if_rebalance,<0.29752.306>,
                 {wait_checkpoint_persisted,170,29}},
                infinity]}}}}]}}}
      ns_vbucket_mover000 ns_1@172.23.105.15 07:31:29 - Tue Nov 19, 2013
      Bucket "sasl" rebalance does not seem to be swap rebalance ns_vbucket_mover000 ns_1@172.23.105.15 07:28:00 - Tue Nov 19, 2013
      Node 'ns_1@172.23.105.74' saw that node 'ns_1@172.23.105.14' went down. Details: [{nodedown_reason,connection_closed}] ns_node_disco005 ns_1@172.23.105.74 07:27:58 - Tue Nov 19, 2013
      Node 'ns_1@172.23.105.72' saw that node 'ns_1@172.23.105.14' went down. Details: [{nodedown_reason,connection_closed}] ns_node_disco005 ns_1@172.23.105.72 07:27:58 - Tue Nov 19, 2013
      Node 'ns_1@172.23.105.15' saw that node 'ns_1@172.23.105.14' went down. Details: [{nodedown_reason,connection_closed}] ns_node_disco005 ns_1@172.23.105.15 07:27:58 - Tue Nov 19, 2013
      Started rebalancing bucket sasl ns_rebalancer000 ns_1@172.23.105.15 07:27:58 - Tue Nov 19, 2013
      Starting rebalance, KeepNodes = ['ns_1@172.23.105.15','ns_1@172.23.105.72',
      'ns_1@172.23.105.74'], EjectNodes = []
      ns_orchestrator004 ns_1@172.23.105.15 07:27:58 - Tue Nov 19, 2013
      Failed over 'ns_1@172.23.105.14': ok ns_orchestrator006 ns_1@172.23.105.15 07:27:44 - Tue Nov 19, 2013
      Starting failing over 'ns_1@172.23.105.14' ns_orchestrator000 ns_1@172.23.105.15 07:27:43 - Tue Nov 19, 2013
      Rebalance stopped by user.
      ns_orchestrator007 ns_1@172.23.105.15 07:27:26 - Tue Nov 19, 2013
      Bucket "sasl" rebalance does not seem to be swap rebalance ns_vbucket_mover000 ns_1@172.23.105.15 07:23:18 - Tue Nov 19, 2013
      Started rebalancing bucket sasl ns_rebalancer000 ns_1@172.23.105.15 07:23:15 - Tue Nov 19, 2013
      Starting rebalance, KeepNodes = ['ns_1@172.23.105.15','ns_1@172.23.105.72',
      'ns_1@172.23.105.74','ns_1@172.23.105.14'], EjectNodes = []
      ns_orchestrator004 ns_1@172.23.105.15 07:23:08 - Tue Nov 19, 2013

      The main point of this bug is that a node can be added to (can join) a new cluster even while it is still being rebalanced out of the other cluster.
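      One way to observe the inconsistent state is to list each node's clusterMembership on cluster2 while the add is accepted on cluster1; the credentials and the python one-liner are illustrative assumptions:

      # hypothetical credentials; prints (otpNode, clusterMembership) pairs
      curl -s -u Administrator:password http://172.23.105.15:8091/pools/default | \
        python -c 'import sys, json; print([(n["otpNode"], n["clusterMembership"]) for n in json.load(sys.stdin)["nodes"]])'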

      Attachments


        Activity

          People

            Assignee: Aliaksey Artamonau (Inactive)
            Reporter: Andrei Baranouski (andreibaranouski)
            Votes: 0
            Watchers: 5

