Details
Description
If some nodes are unreachable at the moment the OOTB CA is regenerated, those nodes never receive the new CA and will not be able to reconnect to the cluster once connectivity to them is restored. This can be treated as a misconfiguration rather than a bug, but it still looks like a trap to me.
Ideally we should guarantee that all the nodes receive the new CA in this case. I see two options here:
(a) Use chronicle capabilities: we need a special chronicle write that guarantees the update is applied either on all nodes or on none. My understanding is that chronicle does not currently have such a capability (this needs to be rechecked), and I don't know how easy it would be to add.
(b) Check that all nodes are present before rotating the OOTB CA cert. This obviously does not guarantee that the new cert is written to all nodes (a node can disappear after the check but before the configuration write). It would still protect the customer in the vast majority of cases, so it probably makes sense to do it if the chronicle option is too hard to implement.
It is also debatable whether the customer should be able to force-rotate the OOTB cert when some nodes are missing. It seems they should be able to, at least for backward-compatibility reasons.
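To make option (b) and the force override concrete, here is a minimal sketch in Python. It is illustrative only and not the actual implementation: the function name rotate_ootb_ca, the force flag, the MissingNodesError exception, and the regenerate callback are all hypothetical names introduced for this example. It also does nothing to close the race between the check and the configuration write described in (b).

```python
class MissingNodesError(Exception):
    """Raised when rotation is refused because some nodes are unreachable."""

    def __init__(self, missing):
        super().__init__(f"nodes unreachable: {sorted(missing)}")
        self.missing = set(missing)


def rotate_ootb_ca(expected_nodes, reachable_nodes, regenerate, force=False):
    """Hypothetical pre-check for option (b).

    expected_nodes:  names of all nodes that are members of the cluster
    reachable_nodes: names of nodes currently reachable
    regenerate:      callback that performs the actual CA regeneration
    force:           rotate even if some nodes are missing

    Returns the set of nodes that were missing at the time of the check.
    A node can still disappear between the check and regenerate().
    """
    missing = set(expected_nodes) - set(reachable_nodes)
    if missing and not force:
        # Refuse by default so that nobody is locked out by accident.
        raise MissingNodesError(missing)
    regenerate()
    return missing
```

With this shape, a plain call fails fast and names the missing nodes, while force=True keeps the current behaviour and returns the nodes that will need the new CA delivered manually.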