Details
Improvement
Resolution: Incomplete
Major
None
Security Level: Public
None
Description
Currently, failed nodes stay in the cluster until the next rebalance. Because Erlang access control is "all or nothing" (any node that is still connected has full access to every other node), this leaves us open to all sorts of Byzantine failures. It also means we would have to add complexity to prevent, say, the orchestrator from running on the failed node. It would be simpler and far less bug-prone to kick the node out of the cluster and either implement the "add back" functionality in a different way or remove it completely.
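
To make the trade-off concrete, here is a minimal toy model in Python. It is only a sketch: the `Cluster` class and all of its method names are hypothetical and do not correspond to the actual cluster manager code, and it does not model Erlang distribution or access control at all. It only illustrates why keeping failed nodes in the membership forces every consumer of the membership list (orchestrator selection is the example here) to carry its own "is this node failed?" check, whereas ejecting the node removes that burden.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy membership model (hypothetical; not the real cluster manager)."""
    members: set = field(default_factory=set)
    failed: set = field(default_factory=set)

    def fail_keep(self, node):
        # Behaviour described in the ticket: the node is marked failed
        # but stays a member until the next rebalance.
        self.failed.add(node)

    def fail_eject(self, node):
        # Proposed behaviour: the node leaves the membership on failure.
        self.members.discard(node)
        self.failed.discard(node)

    def orchestrator_naive(self):
        # Picks the lowest-named member and never looks at failure state.
        return min(self.members)

    def orchestrator_guarded(self):
        # The extra filtering every membership consumer needs
        # when failed nodes are kept in the cluster.
        return min(self.members - self.failed)


kept = Cluster(members={"n1", "n2", "n3"})
kept.fail_keep("n1")
print(kept.orchestrator_naive())    # the failed node is still eligible
print(kept.orchestrator_guarded())  # correct only because of the extra check

ejected = Cluster(members={"n1", "n2", "n3"})
ejected.fail_eject("n1")
print(ejected.orchestrator_naive())  # no extra check needed
```

In the "keep" variant, correctness depends on every code path remembering the guard. In the "eject" variant, the unguarded selection can no longer pick the failed node.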