Details
Bug
Resolution: Cannot Reproduce
Major
5.0.0
Untriaged
No
Description
1. Create a 3-node cluster with 1 bucket.
2. Enable autofailover with a 5-second timeout.
3. SSH into any one of the nodes and restart the network: service network stop && sleep 5 && service network start
4. Wait for autofailover to kick in and fail over the node.
5. Recreate the cluster with the same autofailover timeout enabled and restart the network on another node.
6. We expect the server to be failed over, but instead we see the following in the UI logs:
[2017-01-21 01:30:40,139] - [rest_client:2700] ERROR -
{u'node': u'ns_1@172.23.98.79', u'code': 0, u'text': u"IP address seems to have changed. Unable to listen on 'ns_1@172.23.98.79'. (Underlaying POSIX error code: 'eaddrnotavail')", u'shortText': u'message', u'serverTime': u'2017-01-21T01:29:26.276Z', u'module': u'menelaus_web_alerts_srv', u'tstamp': 1484990966276, u'type': u'info'}
[2017-01-21 01:30:40,139] - [rest_client:2700] ERROR -
{u'node': u'ns_1@172.23.98.79', u'code': 0, u'text': u"IP address seems to have changed. Unable to listen on 'ns_1@172.23.98.79'. (Underlaying POSIX error code: 'eaddrnotavail') (repeated 8 times)", u'shortText': u'message', u'serverTime': u'2017-01-21T01:29:24.134Z', u'module': u'menelaus_web_alerts_srv', u'tstamp': 1484990964134, u'type': u'info'}
You can reproduce the same using automated tests too:
Clone testrunner from https://github.com/bharath-gp/testrunner.git and check out the autofailovertests branch. Create an ini file with at least 4 servers in it (examples are in the b/resources folder of testrunner).
Run the following tests one after the other:
./testrunner -i <ini file here> -t failover.AutoFailoverTests.AutoFailoverTests.test_autofailover,timeout=5,num_node_failures=2,pause_between_failover_action=35,failover_action=restart_network,failover_action=restart_network,nodes_init=3
./testrunner -i <ini file here> -t failover.AutoFailoverTests.AutoFailoverTests.test_autofailover,timeout=5,num_node_failures=1,failover_orchestrator=True,failover_action=restart_network,nodes_init=3
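When checking a run for this failure, it may help to pull the alert entries out of the log rather than reading them by eye. The entries quoted in the description look like Python dict reprs, so the sketch below parses them with ast.literal_eval and filters for the "IP address seems to have changed" message. The helper names (parse_alerts, ip_change_alerts, posix_error_code) are hypothetical and are not part of testrunner; SAMPLE is one entry copied from the log excerpt in the description. It assumes each entry sits on one line and contains no nested braces.

```python
import ast
import re

# One alert entry copied verbatim from the log excerpt in the description.
SAMPLE = """{u'node': u'ns_1@172.23.98.79', u'code': 0, u'text': u"IP address seems to have changed. Unable to listen on 'ns_1@172.23.98.79'. (Underlaying POSIX error code: 'eaddrnotavail')", u'shortText': u'message', u'serverTime': u'2017-01-21T01:29:26.276Z', u'module': u'menelaus_web_alerts_srv', u'tstamp': 1484990966276, u'type': u'info'}"""

# Assumption: entries contain no nested braces, so a non-greedy match is enough.
ENTRY_RE = re.compile(r"\{.*?\}")
POSIX_RE = re.compile(r"POSIX error code: '(\w+)'")


def parse_alerts(log_text):
    """Return every dict-shaped entry found in log_text."""
    alerts = []
    for match in ENTRY_RE.finditer(log_text):
        try:
            entry = ast.literal_eval(match.group(0))
        except (ValueError, SyntaxError):
            continue  # not a parseable literal; skip it
        if isinstance(entry, dict):
            alerts.append(entry)
    return alerts


def ip_change_alerts(alerts):
    """Keep only the alerts reporting that the node's IP address changed."""
    marker = "IP address seems to have changed"
    return [a for a in alerts if marker in a.get("text", "")]


def posix_error_code(alert):
    """Extract the POSIX error code from the alert text, or None if absent."""
    found = POSIX_RE.search(alert.get("text", ""))
    return found.group(1) if found else None


if __name__ == "__main__":
    for alert in ip_change_alerts(parse_alerts(SAMPLE)):
        print(alert["node"], alert["serverTime"], posix_error_code(alert))
```

Feeding the full test log to parse_alerts instead of SAMPLE would list each affected node along with the reported error code, which makes it easier to tell whether the node was failed over or only raised this alert.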