Description
http://qa.hq.northscale.net/job/centos-64-2.0-failover-tests/566/consoleFull
./testrunner -i resources/jenkins/centos-64-5node-failover.ini get-logs=True,GROUP=BAT,get-cbcollect-info=True -t failovertests.FailoverTests.test_failover_firewall,replica=2,items=20000,GROUP=BAT
version=2.0.0-1982-rel
Steps:
1. Enable the firewall on node 10.1.3.117 in the cluster.
2. Wait up to 5 minutes for the node to become unhealthy in the /nodeStatuses API.
On failure, the test dumps the output of "iptables --list", the UI logs (which show the node went down), and /nodeStatuses.
I saw the same behavior on versions 2.0.0 and 2.0.1, but filed this bug only after adding extra logging to the test to pinpoint the problem.
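For reference, the wait logic the test performs (visible in the wait_for_node_status lines below) can be sketched roughly as follows. The function and parameter names here are hypothetical, not the actual testrunner code; the 5-second poll interval and 5-minute timeout come from the log:

```python
import time

def wait_for_node_status(get_status, node, expected, timeout_s=300, poll_s=5):
    """Poll a status source until `node` reports `expected`, or time out.

    `get_status` stands in for a GET on /nodeStatuses and must return a
    dict mapping otpNode names to status strings ('healthy'/'unhealthy').
    Returns True if the expected status was observed within `timeout_s`.
    """
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        statuses = get_status()
        if statuses.get(node) == expected:
            return True
        time.sleep(poll_s)  # the test logs "sleep for 5 seconds before reading the node.status again"
    return False
```

In the run below this loop times out: the node never leaves 'healthy', so the test reports status_reached : False after ~5 minutes.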
2013-03-04 05:40:20 | INFO | MainProcess | MainThread | [remote_util.enable_firewall] enabled firewall on ip:10.1.3.117 port:8091 ssh_username:root
2013-03-04 05:40:20 | INFO | MainProcess | MainThread | [remote_util.execute_command_raw] running command.raw /sbin/iptables --list
2013-03-04 05:40:20 | INFO | MainProcess | MainThread | [remote_util.execute_command_raw] command executed successfully
2013-03-04 05:40:20 | INFO | MainProcess | MainThread | [remote_util.log_command_output] Chain INPUT (policy ACCEPT)
2013-03-04 05:40:20 | INFO | MainProcess | MainThread | [remote_util.log_command_output] target prot opt source destination
2013-03-04 05:40:20 | INFO | MainProcess | MainThread | [remote_util.log_command_output] REJECT tcp -- anywhere anywhere tcp dpts:cadlock2:60000 reject-with icmp-port-unreachable
2013-03-04 05:40:20 | INFO | MainProcess | MainThread | [remote_util.log_command_output]
2013-03-04 05:40:20 | INFO | MainProcess | MainThread | [remote_util.log_command_output] Chain FORWARD (policy ACCEPT)
2013-03-04 05:40:20 | INFO | MainProcess | MainThread | [remote_util.log_command_output] target prot opt source destination
2013-03-04 05:40:20 | INFO | MainProcess | MainThread | [remote_util.log_command_output]
2013-03-04 05:40:20 | INFO | MainProcess | MainThread | [remote_util.log_command_output] Chain OUTPUT (policy ACCEPT)
2013-03-04 05:40:20 | INFO | MainProcess | MainThread | [remote_util.log_command_output] target prot opt source destination
2013-03-04 05:40:20 | INFO | MainProcess | MainThread | [remote_util.log_command_output]
2013-03-04 05:40:20 | INFO | MainProcess | MainThread | [remote_util.log_command_output] Chain RH-Firewall-1-INPUT (0 references)
2013-03-04 05:40:20 | INFO | MainProcess | MainThread | [remote_util.log_command_output] target prot opt source destination
2013-03-04 05:40:20 | INFO | MainProcess | MainThread | [rest_client.wait_for_node_status] node ns_1@10.1.3.117 status : healthy
2013-03-04 05:40:20 | INFO | MainProcess | MainThread | [rest_client.wait_for_node_status] sleep for 5 seconds before reading the node.status again
....
2013-03-04 05:45:17 | INFO | MainProcess | MainThread | [rest_client.wait_for_node_status] sleep for 5 seconds before reading the node.status again
2013-03-04 05:45:22 | INFO | MainProcess | MainThread | [rest_client.wait_for_node_status] node ns_1@10.1.3.117 status_reached : False
2013-03-04 05:45:22 | INFO | MainProcess | MainThread | [remote_util.__init__] connecting to 10.1.3.117 with username : root password : couchbase ssh_key:
2013-03-04 05:45:22 | INFO | MainProcess | MainThread | [remote_util.__init__] Connected
2013-03-04 05:45:22 | INFO | MainProcess | MainThread | [remote_util.execute_command_raw] running command.raw sudo cat /proc/cpuinfo
2013-03-04 05:45:22 | INFO | MainProcess | MainThread | [remote_util.execute_command_raw] command executed successfully
2013-03-04 05:45:22 | INFO | MainProcess | MainThread | [remote_util.execute_command_raw] running command.raw sudo df -Th
2013-03-04 05:45:22 | INFO | MainProcess | MainThread | [remote_util.execute_command_raw] command executed successfully
2013-03-04 05:45:22 | INFO | MainProcess | MainThread | [remote_util.execute_command_raw] running command.raw sudo cat /proc/meminfo
2013-03-04 05:45:22 | INFO | MainProcess | MainThread | [remote_util.execute_command_raw] command executed successfully
2013-03-04 05:45:22 | INFO | MainProcess | MainThread | [remote_util.execute_command_raw] running command.raw hostname
2013-03-04 05:45:22 | INFO | MainProcess | MainThread | [remote_util.execute_command_raw] command executed successfully
2013-03-04 05:45:22 | INFO | MainProcess | MainThread | [remote_util.execute_command_raw] running command.raw /sbin/iptables --list
2013-03-04 05:45:22 | INFO | MainProcess | MainThread | [remote_util.execute_command_raw] command executed successfully
2013-03-04 05:45:22 | INFO | MainProcess | MainThread | [remote_util.log_command_output] Chain INPUT (policy ACCEPT)
2013-03-04 05:45:22 | INFO | MainProcess | MainThread | [remote_util.log_command_output] target prot opt source destination
2013-03-04 05:45:22 | INFO | MainProcess | MainThread | [remote_util.log_command_output] REJECT tcp -- anywhere anywhere tcp dpts:cadlock2:60000 reject-with icmp-port-unreachable
2013-03-04 05:45:22 | INFO | MainProcess | MainThread | [remote_util.log_command_output]
2013-03-04 05:45:22 | INFO | MainProcess | MainThread | [remote_util.log_command_output] Chain FORWARD (policy ACCEPT)
2013-03-04 05:45:22 | INFO | MainProcess | MainThread | [remote_util.log_command_output] target prot opt source destination
2013-03-04 05:45:22 | INFO | MainProcess | MainThread | [remote_util.log_command_output]
2013-03-04 05:45:22 | INFO | MainProcess | MainThread | [remote_util.log_command_output] Chain OUTPUT (policy ACCEPT)
2013-03-04 05:45:22 | INFO | MainProcess | MainThread | [remote_util.log_command_output] target prot opt source destination
2013-03-04 05:45:22 | INFO | MainProcess | MainThread | [remote_util.log_command_output]
2013-03-04 05:45:22 | INFO | MainProcess | MainThread | [remote_util.log_command_output] Chain RH-Firewall-1-INPUT (0 references)
2013-03-04 05:45:22 | INFO | MainProcess | MainThread | [remote_util.log_command_output] target prot opt source destination
2013-03-04 05:45:23 | ERROR | MainProcess | MainThread | [failovertests.common_test_body]
2013-03-04 05:45:23 | ERROR | MainProcess | MainThread | [failovertests.common_test_body]
{u'node': u'ns_1@10.1.3.118', u'code': 5, u'text': u"Node 'ns_1@10.1.3.118' saw that node 'ns_1@10.1.3.117' went down.", u'shortText': u'node down', u'module': u'ns_node_disco', u'tstamp': 1362404482733, u'type': u'warning'}
2013-03-04 05:45:23 | ERROR | MainProcess | MainThread | [failovertests.common_test_body]
{u'node': u'ns_1@10.1.3.117', u'code': 5, u'text': u"Node 'ns_1@10.1.3.117' saw that node 'ns_1@10.1.3.116' went down.", u'shortText': u'node down', u'module': u'ns_node_disco', u'tstamp': 1362404482729, u'type': u'warning'}
2013-03-04 05:45:23 | ERROR | MainProcess | MainThread | [failovertests.common_test_body]
{u'node': u'ns_1@10.1.3.117', u'code': 5, u'text': u"Node 'ns_1@10.1.3.117' saw that node 'ns_1@10.1.3.115' went down.", u'shortText': u'node down', u'module': u'ns_node_disco', u'tstamp': 1362404482729, u'type': u'warning'}
2013-03-04 05:45:23 | ERROR | MainProcess | MainThread | [failovertests.common_test_body]
{u'node': u'ns_1@10.1.3.117', u'code': 5, u'text': u"Node 'ns_1@10.1.3.117' saw that node 'ns_1@10.1.3.118' went down.", u'shortText': u'node down', u'module': u'ns_node_disco', u'tstamp': 1362404482728, u'type': u'warning'}
2013-03-04 05:45:23 | ERROR | MainProcess | MainThread | [failovertests.common_test_body]
{u'node': u'ns_1@10.1.3.116', u'code': 5, u'text': u"Node 'ns_1@10.1.3.116' saw that node 'ns_1@10.1.3.117' went down.", u'shortText': u'node down', u'module': u'ns_node_disco', u'tstamp': 1362404482110, u'type': u'warning'}
2013-03-04 05:45:23 | ERROR | MainProcess | MainThread | [failovertests.common_test_body]
{u'node': u'ns_1@10.1.3.114', u'code': 1, u'text': u'Rebalance completed successfully.\n', u'shortText': u'message', u'module': u'ns_orchestrator', u'tstamp': 1362404388568, u'type': u'info'}
2013-03-04 05:45:23 | ERROR | MainProcess | MainThread | [failovertests.common_test_body]
{u'node': u'ns_1@10.1.3.114', u'code': 0, u'text': u'Bucket "default" rebalance does not seem to be swap rebalance', u'shortText': u'message', u'module': u'ns_vbucket_mover', u'tstamp': 1362404080697, u'type': u'info'}
2013-03-04 05:45:23 | ERROR | MainProcess | MainThread | [failovertests.common_test_body]
{u'node': u'ns_1@10.1.3.118', u'code': 1, u'text': u'Bucket "default" loaded on node \'ns_1@10.1.3.118\' in 0 seconds.', u'shortText': u'message', u'module': u'ns_memcached', u'tstamp': 1362404080100, u'type': u'info'}
2013-03-04 05:45:23 | ERROR | MainProcess | MainThread | [failovertests.common_test_body]
{u'node': u'ns_1@10.1.3.117', u'code': 1, u'text': u'Bucket "default" loaded on node \'ns_1@10.1.3.117\' in 0 seconds.', u'shortText': u'message', u'module': u'ns_memcached', u'tstamp': 1362404079864, u'type': u'info'}
2013-03-04 05:45:23 | INFO | MainProcess | MainThread | [failovertests.common_test_body] nodeStatuses: {u'10.1.3.114:8091': {u'status': u'healthy', u'replication': 1.0, u'otpNode': u'ns_1@10.1.3.114'}, u'10.1.3.117:8091': {u'status': u'healthy', u'replication': 0.0, u'otpNode': u'ns_1@10.1.3.117'}, u'10.1.3.115:8091': {u'status': u'healthy', u'replication': 1.0, u'otpNode': u'ns_1@10.1.3.115'}, u'10.1.3.118:8091': {u'status': u'healthy', u'replication': 0.0, u'otpNode': u'ns_1@10.1.3.118'}, u'10.1.3.116:8091': {u'status': u'healthy', u'replication': 0.0, u'otpNode': u'ns_1@10.1.3.116'}}
FAIL
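The nodeStatuses payload above is the core of the failure: every node, including the firewalled 10.1.3.117, still reports 'healthy', although three nodes show replication 0.0. A minimal check over a copy of that data makes the inconsistency explicit (suspect_nodes is an illustrative helper, not part of the test framework):

```python
# Status data copied from the nodeStatuses log line above (otpNode keys dropped).
node_statuses = {
    '10.1.3.114:8091': {'status': 'healthy', 'replication': 1.0},
    '10.1.3.117:8091': {'status': 'healthy', 'replication': 0.0},
    '10.1.3.115:8091': {'status': 'healthy', 'replication': 1.0},
    '10.1.3.118:8091': {'status': 'healthy', 'replication': 0.0},
    '10.1.3.116:8091': {'status': 'healthy', 'replication': 0.0},
}

def suspect_nodes(statuses):
    # Nodes that claim to be healthy but have no replication coverage.
    return sorted(node for node, s in statuses.items()
                  if s['status'] == 'healthy' and s['replication'] == 0.0)

print(suspect_nodes(node_statuses))
```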
No logs are attached at the moment, due to:
IMPOSSIBLE TO GRAB CBCOLLECT FROM 10.1.3.118
Traceback (most recent call last):
  File "testrunner", line 362, in <module>
    xunit.write("{0}/report-{1}".format(logs_folder, str_time))
  File "lib/xunit.py", line 102, in write
    report_xml_file.close()
IOError: [Errno 28] No space left on device
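Note that the report write failed with ENOSPC (Errno 28) on the Jenkins slave itself, not because of the product under test. A minimal pre-check sketch, assuming Python 3's shutil.disk_usage, could let the harness fail earlier with a clearer message (has_free_space is a hypothetical helper, not existing testrunner code):

```python
import shutil

def has_free_space(path, required_bytes):
    """Return True if the filesystem holding `path` has at least
    `required_bytes` free; the IOError above means it did not."""
    return shutil.disk_usage(path).free >= required_bytes
```

Checking has_free_space(logs_folder, some_margin) before writing the xunit report would turn the late close()-time IOError into an explicit early failure.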