Details
Bug
Resolution: Cannot Reproduce
Major
3.0.2
Security Level: Public
Untriaged
Unknown
Description
One node (52-17-12-151) repeatedly suffers net_tick_timeout with multiple different nodes.
However, after concluding that a node has gone down (due to net_tick_timeout), it almost immediately sees that node again and reports that it "came up". For example:
[user:warn,2015-03-16T16:34:34.880,ns_1@ec2-52-17-12-151.eu-west-1.compute.amazonaws.com:ns_node_disco<0.4999.0>:ns_node_disco:handle_info:175]Node 'ns_1@ec2-52-17-12-151.eu-west-1.compute.amazonaws.com' saw that node 'ns_1@ec2-52-17-15-202.eu-west-1.compute.amazonaws.com' went down. Details: [{nodedown_reason, net_tick_timeout}]
[user:info,2015-03-16T16:34:34.887,ns_1@ec2-52-17-12-151.eu-west-1.compute.amazonaws.com:ns_node_disco<0.4999.0>:ns_node_disco:handle_info:169]Node 'ns_1@ec2-52-17-12-151.eu-west-1.compute.amazonaws.com' saw that node 'ns_1@ec2-52-17-15-202.eu-west-1.compute.amazonaws.com' came up. Tags: []
This occurs repeatedly, and only on this node. The other nodes (e.g. 52-17-15-202) are up and running, and report the following:
[user:warn,2015-03-16T16:34:34.896,ns_1@ec2-52-17-15-202.eu-west-1.compute.amazonaws.com:ns_node_disco<0.5208.0>:ns_node_disco:handle_info:175]Node 'ns_1@ec2-52-17-15-202.eu-west-1.compute.amazonaws.com' saw that node 'ns_1@ec2-52-17-12-151.eu-west-1.compute.amazonaws.com' went down. Details: [{nodedown_reason, connection_closed}]
After the node is ejected from the cluster - no more net_tick_timeouts are observed.
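The "went down" followed almost immediately by "came up" pattern quoted above can be picked out of the logs mechanically. The following is a minimal sketch in Python, assuming log lines in exactly the format pasted in this ticket; the function name find_flaps and the 1-second window are hypothetical choices for illustration, not part of ns_server.

```python
import re
from datetime import datetime, timedelta

# Matches the header of an ns_node_disco event as quoted in this ticket.
# Only the header line is needed, so a "Details:" list that wraps onto
# the next line does not matter here.
EVENT_RE = re.compile(
    r"\[user:\w+,(?P<ts>[\dT:.\-]+),(?P<observer>[^:]+):ns_node_disco"
    r".*?saw that node '(?P<peer>[^']+)' (?P<event>went down|came up)"
)


def find_flaps(lines, window=timedelta(seconds=1)):
    """Return (observer, peer, gap) for each 'went down' that is followed
    by a 'came up' for the same observer/peer pair within `window`.

    The 1-second default is an assumption, not a documented threshold.
    """
    last_down = {}  # (observer, peer) -> timestamp of the last "went down"
    flaps = []
    for line in lines:
        m = EVENT_RE.search(line)
        if not m:
            continue  # continuation lines and unrelated log entries
        ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%f")
        key = (m["observer"], m["peer"])
        if m["event"] == "went down":
            last_down[key] = ts
        elif key in last_down:
            gap = ts - last_down.pop(key)
            if gap <= window:
                flaps.append((m["observer"], m["peer"], gap))
    return flaps
```

Fed the two lines from 52-17-12-151 quoted above (16:34:34.880 and 16:34:34.887), this reports a single flap with a gap of 7 ms.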
UPDATE
=======
Loaded the game-sim sample and created the default bucket (then deleted the game-sim sample). The system was then left quiet (i.e. no ops), and it is now starting to get net_tick_timeouts, this time from 52-17-15-193.
[user:warn,2015-03-16T17:32:55.877,ns_1@ec2-52-17-15-193.eu-west-1.compute.amazonaws.com:ns_node_disco<0.5644.0>:ns_node_disco:handle_info:175]Node 'ns_1@ec2-52-17-15-193.eu-west-1.compute.amazonaws.com' saw that node 'ns_1@10.0.0.43' went down. Details: [{nodedown_reason,net_tick_timeout}]
The other node (the orchestrator) is up and running and logs the corresponding message:
[user:warn,2015-03-16T17:32:55.816,ns_1@10.0.0.43:ns_node_disco<0.4153.0>:ns_node_disco:handle_info:175]Node 'ns_1@10.0.0.43' saw that node 'ns_1@ec2-52-17-15-193.eu-west-1.compute.amazonaws.com' went down. Details: [{nodedown_reason, connection_closed}]
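In both occurrences the reasons are asymmetric: one side reports net_tick_timeout while its peer reports connection_closed. A rough way to check whether that holds across a whole log excerpt is to tally the nodedown_reason per observing node. This is a sketch only, assuming the log format pasted in this ticket; tally_down_reasons is a hypothetical helper name, not an ns_server or cbcollect_info tool.

```python
import re
from collections import Counter

# Matches a "went down" event and captures the observing node and the
# nodedown_reason. \s* lets the Details list wrap onto the next line,
# as it does in some of the excerpts pasted in this ticket.
DOWN_RE = re.compile(
    r"\[user:\w+,[^,]+,(?P<observer>[^:]+):ns_node_disco[^\n]*?"
    r"went down\. Details: \[\s*\{nodedown_reason,\s*(?P<reason>\w+)\s*\}"
)


def tally_down_reasons(text):
    """Count (observer, reason) pairs over a whole log excerpt."""
    return Counter(
        (m["observer"], m["reason"]) for m in DOWN_RE.finditer(text)
    )
```

A node that is the source of the problem would be expected to accumulate net_tick_timeout entries against several peers, while healthy peers mostly record connection_closed against that one node.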
The issue always appears under module codes ns_node_disco005 and ns_node_disco004.
Uploaded the logs, see https://s3.amazonaws.com/cb-customers/owend-couchbase/collectinfo-2015-03-16T180110-ns_1%4010.0.0.43.zip
During the collection process, additional net_tick_timeouts were seen from 52-17-15-193.