I found what happened. Thanks again, Conor, for reporting it.
Something interesting happened. Both nodes have the same initial Erlang cookie, and that causes them to communicate too early in the join process. The node that is joining gets the config from the node being joined, sees a config conflict, and picks the 'wrong' version. The result is a nodes_wanted list containing only the newly joined node, which causes the original cluster node to leave the cluster.
This is very interesting; we haven't seen it before. Did you clone the VM? It looks like this is possible in EC2 via custom images, and the logs more or less confirm it: there is a time jump of 8 days before the last start of the node.
If not, this could be due to the nodes being launched at the same time combined with the low clock resolution of EC2. The initial cookie is generated by an RNG, and the RNG is seeded from the clock. Erlang itself has microsecond clock precision, but the underlying kernel (and, in the case of Xen, the underlying hypervisor or Dom0 kernel) does not necessarily support that. That seems very unlikely, though, so I bet on cloning.
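To illustrate the clock-seeding scenario, here is a small Python sketch. It is not ns_server's Erlang code: the `toy_cookie_gen` function, the 16-letter cookie format, the timestamps and the 10 ms "coarse" resolution are all hypothetical. It only shows that two PRNGs seeded from the same truncated clock value produce the same "random" cookie.

```python
import random
import string

def toy_cookie_gen(start_time_us: int, clock_resolution_us: int) -> str:
    """Hypothetical cookie generator that seeds a fresh PRNG from the clock.

    The timestamp is truncated to the resolution the platform really
    provides, so two starts within the same tick get the same seed.
    """
    observed_us = start_time_us - (start_time_us % clock_resolution_us)
    rng = random.Random(observed_us)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(16))

# Two hypothetical nodes booted 3 ms apart.
node_a_start = 1_300_000_000_000_000
node_b_start = node_a_start + 3_000

# True microsecond clock: the seeds differ, so the cookies (almost surely) differ.
fine_a = toy_cookie_gen(node_a_start, clock_resolution_us=1)
fine_b = toy_cookie_gen(node_b_start, clock_resolution_us=1)

# Coarse 10 ms clock: both starts fall in the same tick, so the seeds and
# therefore the cookies are identical.
coarse_a = toy_cookie_gen(node_a_start, clock_resolution_us=10_000)
coarse_b = toy_cookie_gen(node_b_start, clock_resolution_us=10_000)

print(fine_a == fine_b, coarse_a == coarse_b)
```

Seeding from an OS entropy source (in Python, the `secrets` module) instead of the clock would avoid this particular collision. It would presumably not help with cloned VMs, where the already-generated cookie is copied along with the image.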
To fix this issue correctly, I need you to confirm whether or not you cloned your VM.
Meanwhile, the following command can be used to re-initialize a node's cookie (don't do this on nodes that are already joined to a cluster):
O --post-data='NewCookie = ns_node_disco:cookie_gen(), ns_config:set(otp, [
]).' --user=Administrator --password=asdasd http://lh:9000/diag/eval
Replace the password with your admin password, and host:port with your REST host:port (8091 is the default port). Doing this on any one of the nodes prior to joining will likely fix your problem.
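If wget is not available, the same kind of request can be built from Python's standard library. This is a minimal sketch under stated assumptions: `build_diag_eval_request` is a hypothetical helper, and the caller must supply the Erlang expression itself (the same text passed as `--post-data` in the command above). Only the `/diag/eval` path, the `Administrator` user and port 8091 come from this comment.

```python
import base64
import urllib.request

def build_diag_eval_request(host: str, port: int, user: str, password: str,
                            erlang_expr: str) -> urllib.request.Request:
    """Build (but do not send) a POST of an Erlang expression to /diag/eval.

    Hypothetical helper: it mirrors the wget call by sending the expression
    as the request body with HTTP Basic authentication.
    """
    url = f"http://{host}:{port}/diag/eval"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=erlang_expr.encode(),
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )
```

To actually send it, pass the request to `urllib.request.urlopen(req)` and read the response body. As with the wget version, run it only on nodes that are not yet joined to a cluster.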