Details
Description
Setup
1. Set up an 18-node cluster with 2 buckets: bucket1, bucket2
2. Enable auto-failover
3. Add a new node 126
4. Rebalance
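Steps 2 to 4 can also be driven through the cluster's REST interface instead of the web console. The following is only a sketch of that, not what was actually run for this report: it builds the three form-encoded POST requests without sending them. The credentials and the 30-second auto-failover timeout are placeholders, and the helper names `build_request` and `repro_requests` are made up for illustration. Port 8091 is taken from the attached diag file name.

```python
import base64
import urllib.parse
import urllib.request


def build_request(base_url, path, params, user, password):
    """Build (but do not send) a form-encoded POST with HTTP basic auth."""
    data = urllib.parse.urlencode(params).encode("ascii")
    req = urllib.request.Request(base_url + path, data=data, method="POST")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    return req


def repro_requests(base_url, known_nodes, new_host, user, password, timeout=30):
    """Return the requests for: enable auto-failover, add node, rebalance.

    known_nodes holds the otpNode names already in the cluster,
    e.g. 'ns_1@10.3.2.104'. timeout=30 is an assumed value.
    """
    new_otp_node = "ns_1@" + new_host
    return [
        # Step 2: enable auto-failover
        build_request(base_url, "/settings/autoFailover",
                      {"enabled": "true", "timeout": timeout}, user, password),
        # Step 3: add the new node
        build_request(base_url, "/controller/addNode",
                      {"hostname": new_host, "user": user, "password": password},
                      user, password),
        # Step 4: rebalance, keeping every node and ejecting none
        build_request(base_url, "/controller/rebalance",
                      {"knownNodes": ",".join(known_nodes + [new_otp_node]),
                       "ejectedNodes": ""},
                      user, password),
    ]
```

To actually send a request, pass it to `urllib.request.urlopen(req)` in order, waiting for the add-node call to succeed before starting the rebalance.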
Output
1. Rebalance works fine, but these log messages appear:
Could not automatically failover node 'ns_1@10.3.121.126' because I think rebalance is running auto_failover000 ns_1@10.3.2.104 19:32:12 - Sun Jun 17, 2012
Bucket "bucket1" loaded on node 'ns_1@10.3.121.126' in 0 seconds. ns_memcached001 ns_1@10.3.121.126 19:32:04 - Sun Jun 17, 2012
Started rebalancing bucket bucket2 ns_rebalancer000 ns_1@10.3.2.104 19:31:36 - Sun Jun 17, 2012
Starting rebalance, KeepNodes = ['ns_1@10.3.2.85','ns_1@10.3.2.86',
'ns_1@10.3.2.87','ns_1@10.3.2.88',
'ns_1@10.3.2.89','ns_1@10.3.2.104',
'ns_1@10.3.2.105','ns_1@10.3.2.106',
'ns_1@10.3.2.108','ns_1@10.3.2.109',
'ns_1@10.3.2.110','ns_1@10.3.2.111',
'ns_1@10.3.2.112','ns_1@10.3.2.113',
'ns_1@10.3.2.114','ns_1@10.3.2.115',
'ns_1@10.3.121.126'], EjectNodes = []
Attached are the web-logs and logs from master node-104.
https://s3.amazonaws.com/bugdb/jira/web-log-largeCluster/ns-diag-20120618095246.txt
https://s3.amazonaws.com/bugdb/jira/web-log-largeCluster/10.3.2.104-8091-diag.txt.gz
Other related conversation
I have enabled auto-failover on the large cluster, and every time I rebalance in a node I get this error message: "Could not automatically failover node 'ns_1@10.3.121.126' because I think rebalance is running".
Node 126 is newly added and a rebalance was issued. Is this message displayed because the node is not yet ready to join the cluster?
The rebalance works fine, but I do not understand why auto-failover is attempted here. Any idea?
No. According to the logs, bucket1 was loaded at 19:32:04. Maybe there are some other buckets that are still not ready on this node. May I have the logs?
Attachments
For Gerrit Dashboard: MB-5602

# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
17372,1 | MB-5602: consider buckets' servers list when computing down nodes | branch-181 | ns_server | MERGED | +2 | +1 |
17563,1 | Merge commit '0e6b2f70276f271d08bf1fe46c4b8da528c67c66' into master | master | ns_server | MERGED | +2 | +1 |
17573,1 | Merge remote branch 'origin/branch-181' into branch-18 | branch-18 | ns_server | MERGED | +2 | +1 |
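
The subject of change 17372 suggests the fix: when deciding which nodes look down, a bucket should count against a node only if that node is in the bucket's servers list. The sketch below is one reading of that subject line, written in Python for illustration. It is not the ns_server code (which is Erlang), and the function name and data shapes are hypothetical.

```python
def nodes_down_for_failover(cluster_nodes, ready_buckets_by_node, bucket_servers):
    """Return the nodes that look down to the auto-failover check.

    cluster_nodes         -- list of node names in the cluster
    ready_buckets_by_node -- {node: set of buckets it reports as ready};
                             a node missing from the dict sent no status at all
    bucket_servers        -- {bucket: list of nodes in that bucket's servers list}

    All three shapes are assumptions made for this illustration.
    """
    down = []
    for node in cluster_nodes:
        ready = ready_buckets_by_node.get(node)
        if ready is None:
            # No status from the node at all: treat it as down.
            down.append(node)
            continue
        # Only buckets whose servers list already contains this node are
        # expected to be ready on it.
        expected = {bucket for bucket, servers in bucket_servers.items()
                    if node in servers}
        if not expected <= set(ready):
            down.append(node)
    return down
```

With the situation in this report (bucket1 loaded on node 126, bucket2 still being rebalanced and so not yet listing 126 as a server), such a check would not flag the new node, whereas a check that expects every bucket on every node would.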