Details
Bug
Resolution: Duplicate
Major
2.0-developer-preview-4
Security Level: Public
None
6 CentOS VMs, 4 CPUs / 4 GB memory each
Running py-view.conf from Jenkins against build 553
Description
In a cluster of 6 nodes and 100k documents, 5 nodes are ejected while 10k documents are being deleted. At the same time, the Python client queries the view to trigger re-indexing in the cluster and waits for all changes to be written to disk (that is, until ep_queue_size == 0).
However, the stats API returned an error when 'ep_queue_size' was queried. This caused the test to exit and somehow left 1 node in an unknown state:
ERROR http://10.1.2.31:8091/nodes/self error 404 reason: unknown "Node is unknown to this cluster."
If I attempt to add the node back to the cluster, the client reports:
2012-01-24 18:25:39,492 - root - INFO - adding remote node : 10.1.2.31 to this cluster @ : 10.1.2.30
2012-01-24 18:26:09,558 - root - ERROR - http://10.1.2.30:8091/controller/addNode error 400 reason: unknown ["Prepare join failed. Timeout connecting to \"10.1.2.31\" on port 8091. This could be due to an incorrect host/port combination or a firewall in place between the servers."]
2012-01-24 18:26:09,558 - root - ERROR - add_node error : ["Prepare join failed. Timeout connecting to \"10.1.2.31\" on port 8091. This could be due to an incorrect host/port combination or a firewall in place between the servers."]
Once a node enters this state, it sometimes blocks all remaining tests in the view run list. Attaching logs from the rebalance test case and cluster diags.
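
For reference, the wait step described above (poll until ep_queue_size == 0) could be written so that a single failed stats query does not abort the whole test. This is only a minimal sketch, not the actual test harness code: `fetch_stats`, `StatsError`, and the `max_errors` limit are hypothetical names introduced for illustration, and `fetch_stats` is assumed to return a dict of stats for one node.

```python
import time


class StatsError(Exception):
    """Hypothetical error raised by the stats fetcher when a query fails."""


def wait_for_queue_drain(fetch_stats, timeout=60.0, interval=1.0, max_errors=3,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_stats() until ep_queue_size reaches 0.

    Returns True once the queue is drained and False if the timeout expires.
    A StatsError is tolerated until max_errors failures occur in a row,
    at which point it is re-raised to the caller.
    """
    deadline = clock() + timeout
    consecutive_errors = 0
    while True:
        try:
            stats = fetch_stats()
            consecutive_errors = 0  # a successful query resets the counter
            # Stats values may arrive as strings, so normalize to int.
            if int(stats["ep_queue_size"]) == 0:
                return True
        except StatsError:
            consecutive_errors += 1
            if consecutive_errors >= max_errors:
                raise
        if clock() >= deadline:
            return False
        sleep(interval)
```

With a loop like this, a transient stats error during the ejection would be retried, and a persistent one would still surface as a test failure rather than being hidden. It does not address why the node ends up unknown to the cluster.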