Description
Related to K8S-479.
The initial issue is that the K8S operator thinks a rebalance succeeded when it didn't, which we could remedy fairly easily by checking the event streams. At present we only watch the task status and check that what we intended to be added/ejected actually was. Any better ideas than the event stream would be appreciated.
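To make the check described above concrete, here is a rough Go sketch. Everything in it is hypothetical: `RebalanceOutcome`, its `TaskFailed` flag and `VerifyRebalance` are illustrative names, not the operator's actual code, and `TaskFailed` just stands in for whatever failure signal we settle on (the event stream or something better).

```go
package main

import "fmt"

// RebalanceOutcome is a hypothetical summary of what the operator saw once
// the rebalance task stopped running. It is not an existing operator type.
type RebalanceOutcome struct {
	TaskFailed bool     // assumed failure signal, e.g. derived from the event stream
	Members    []string // hostnames that are cluster members after the task ended
}

// VerifyRebalance returns an error if a failure was signalled, if any node in
// added is not a member, or if any node in ejected still is one.
func VerifyRebalance(out RebalanceOutcome, added, ejected []string) error {
	if out.TaskFailed {
		return fmt.Errorf("rebalance reported failure")
	}
	present := make(map[string]bool, len(out.Members))
	for _, m := range out.Members {
		present[m] = true
	}
	for _, n := range added {
		if !present[n] {
			return fmt.Errorf("node %s was not added", n)
		}
	}
	for _, n := range ejected {
		if present[n] {
			return fmt.Errorf("node %s was not ejected", n)
		}
	}
	return nil
}
```

The membership comparison alone is roughly what we do today; the explicit failure flag is the part that is missing.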
I'm also thinking ahead here: QE are fairly intolerant of nondeterminism, so us making a random number of rebalance attempts will upset them. Currently our client only talks to the admin service, so polling the eventing API to infer from the stats whether a rebalance will work is quite a lot of work for us.
One idea is for ns_server to communicate whether a rebalance would be inhibited, so we can just bail out of our reconcile loop. Mike Wiederhold also suggested that ns_server could just wait for eventing to become stable and then kick off the rebalance. Or, of course, make eventing work during a rebalance.
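For the first idea, the operator side could look something like the sketch below. This is only an illustration under assumptions: ns_server does not expose a "rebalance inhibited" query as far as this ticket is concerned, so `RebalanceInhibited`, `StartRebalance`, `Cluster` and `ErrRebalanceInhibited` are all made-up names.

```go
package main

import "errors"

// ErrRebalanceInhibited is a hypothetical sentinel telling the caller to
// requeue the reconcile instead of attempting a rebalance.
var ErrRebalanceInhibited = errors.New("rebalance inhibited")

// Cluster is a hypothetical handle on the admin service. Neither function
// maps to an existing ns_server endpoint; both are assumptions.
type Cluster struct {
	RebalanceInhibited func() (bool, error) // assumed: would a rebalance be blocked right now?
	StartRebalance     func() error         // assumed: kick off the rebalance
}

// Reconcile makes at most one rebalance attempt and bails out early if the
// server says a rebalance would be inhibited.
func Reconcile(c Cluster) error {
	inhibited, err := c.RebalanceInhibited()
	if err != nil {
		return err
	}
	if inhibited {
		return ErrRebalanceInhibited
	}
	return c.StartRebalance()
}
```

With something like this, each pass through the loop makes at most one rebalance attempt, which should be easier for QE to reason about than retrying until it sticks.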