Description
Recently we saw a case where nodes left a cluster uncleanly, new nodes were renamed to have the same names as the nodes that had left, and the config subsequently got corrupted. We believe the cause was a config exchange with one of the nodes that left the cluster improperly, but we can't confirm this because we don't have logs from all the nodes.
We do have some protection against this kind of thing when the departing node actually receives the leave instruction. If it does not, and nodes are renamed or bounced at an unfortunate time, we are vulnerable to the type of config corruption that was seen.
This ticket tracks making the config exchange more robust, even in the face of poorly timed node renames or power cycles when this kind of maneuver is performed.
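
One possible direction, shown only as a rough sketch and not as the actual design: identify a peer by more than its name, so that a stale node that never received the leave instruction cannot be mistaken for the new node that now carries its name. Everything below is hypothetical, including the `incarnation` ID, the `Membership` class, and `accept_config`; none of these names come from the existing code.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class PeerIdentity:
    name: str
    incarnation: str  # hypothetical unique ID assigned each time a node joins


@dataclass
class Membership:
    # node name -> incarnation ID of the node currently admitted under that name
    admitted: Dict[str, str] = field(default_factory=dict)

    def admit(self, peer: PeerIdentity) -> None:
        # A new node taking over a name replaces the previous incarnation.
        self.admitted[peer.name] = peer.incarnation

    def may_exchange_config(self, peer: PeerIdentity) -> bool:
        # A matching name alone is not enough; the incarnation must match too.
        return self.admitted.get(peer.name) == peer.incarnation


def accept_config(membership: Membership, peer: PeerIdentity,
                  current_config: dict, offered_config: dict) -> dict:
    """Return the config to keep after an exchange attempt with `peer`."""
    if not membership.may_exchange_config(peer):
        # Stale or unknown peer: ignore its config and keep ours.
        return current_config
    return offered_config
```

With a check like this, a node that left uncleanly and comes back under its old name would be rejected as long as a newer incarnation has been admitted under that name, regardless of when the rename or power cycle happened.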