We have a number of places and a number of "done-ness" conditions that are not possible to observe with today's API.
Reasons for that are varied. The biggest is that changes often have to be applied on all nodes of the cluster, and waiting for all the nodes to apply changes might cause unbounded delays.
Most widely known problems are:
- bucket deletion. We return success immediately and the actual bucket is deleted asynchronously. And it's impossible to create a new bucket with the same name until all reachable nodes (note the emphasis on "reachable"; that's a pointer to another problem) have completed deleting the old bucket instance
- bucket flush (via REST API if that matters). It is actually implemented in a synchronous and cluster-wide fashion. But if any of the nodes is slow to complete failover, then clients may receive an "in progress" response without any convenient means to observe its completion (short of polling for some key and observing tmperrors)
- when a node is ejected from the cluster it restarts its REST API service, causing temporary unavailability. I'm not sure it's worth fixing, but at least we need an official and documented answer for this (e.g. "poll it Luke!").
- there's a known issue with cluster join requests, which might time out from the client's perspective (we return 500 I think), especially if there are lots of concurrent addNode requests to the same node. But then they "silently" complete (because internally requests are queued to the ns_cluster service).
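Until something better exists, the only way to observe any of the above is polling. A minimal sketch of such a client-side loop, assuming the caller supplies a `probe` callable that checks the condition of interest (e.g. "bucket with this name no longer exists"). The helper name `poll_until` and all its parameters are hypothetical and not part of any existing API:

```python
import time
from typing import Callable


def poll_until(probe: Callable[[], bool],
               timeout: float = 30.0,
               interval: float = 0.5,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call probe() repeatedly until it returns True or `timeout` seconds pass.

    Returns True if the probe succeeded, False if we gave up.
    `clock` and `sleep` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
```

The obvious weakness is that the caller has to pick a timeout and still learns nothing definite when it expires.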
This is considered done when there's a request flag that enables the following behavior:
*) if a REST API request completes with 200 then its effect is "done" on all cluster nodes. "Done" for bucket ops (create/flush) should mean "available for ops". We'll decide separately whether it'll include readiness of moxi.
Various deletions should ideally also be monitored for completion, so folks/scripts can be sure that whatever resources the deleted thing consumed are now freed. But let's leave that out of scope for now. We can add a separate request flag for this later. Note however that for testing we'll likely have some ways of monitoring deletions anyway.
Note that any node being unavailable will prevent a 200 unless the request only applies to specific node(s) that are all available. Yes, it will even affect trivial requests like a change of compaction settings.
But also note that it doesn't mean we'll enforce strict (aka linear) consistency of all REST API requests (we might but probably won't). For example, trying to change compaction settings to different values on two different nodes at the same time will result in all cluster nodes eventually converging to the same settings. But in this case API requests are not required to wait for full convergence.
We might as a bonus provide something like: "request was applied to all nodes but some of the nodes actually decided to accept different version". Another possibility is to have another request option for full linear consistency.
Durability of config settings is another area without a strict promise.
So in effect 200 means "we've applied it on all the nodes, but what happen(s/ed) after that is unknown".
*) if REST API request returns non-200 response, then it's not done and will not be done.
- if a REST API request returns 202 then there's a standard way to monitor completion or failure of the request. It will likely be in the form of some url path to use in polling for completion. NOTE: I need to think a bit more about requests that might get lost due to node unavailability and then suddenly get "found". The limits of what we're going to support here are still to be finalized.
- exceptions to this list (like the above-mentioned node ejection) should be few, all of them with a documented and good reason, and all of them with a documented way to observe completion of the change.
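From the client's side, the rules above could be handled roughly as follows. This is only a sketch under assumptions: the completion url is taken from a `Location` header here, which is a guess (the proposal only says "some url path"), and `classify`, `wait_for` and the caller-supplied `fetch` are hypothetical names:

```python
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, headers)


def classify(status: int, headers: Mapping[str, str]) -> Tuple[str, Optional[str]]:
    """Map a response to ("done" | "pending" | "failed", url to poll or None)."""
    if status == 200:
        return ("done", None)
    if status == 202:
        # ASSUMPTION: the completion url arrives in a Location header.
        return ("pending", headers.get("Location"))
    return ("failed", None)


def wait_for(status: int, headers: Mapping[str, str],
             fetch: Callable[[str], Response], max_polls: int = 10) -> str:
    """Follow the completion url until the request is done, failed, or we give up."""
    state, url = classify(status, headers)
    polls = 0
    while state == "pending" and url is not None and polls < max_polls:
        status, headers = fetch(url)
        state, next_url = classify(status, headers)
        url = next_url or url  # keep polling the old url if no new one is given
        polls += 1
    return state
```

A result of "pending" from `wait_for` means the caller ran out of polls (or got no url to poll), not that the request failed.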
I will also likely support a client-specified timeout or deadline to specify exactly when waiting for 200 should stop and 202 should be returned instead. Also I'll likely support a form of long polling for completion for urls returned from 202. Something like "wait for completion of this task but don't wait longer than 10 seconds. If it succeeds before the deadline, give me 200, and if it's still incomplete when 10 secs have passed, give me 202".
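The waiting rule itself is tiny. A sketch of the server-side decision, assuming task completion is signalled through a `threading.Event` (the function name `wait_with_deadline` and the use of an event are illustrative assumptions, not the actual implementation):

```python
import threading


def wait_with_deadline(done: threading.Event, timeout_s: float) -> int:
    """Block until the task completes or timeout_s seconds pass.

    Returns 200 if the task finished in time, 202 if it is still in progress.
    """
    # Event.wait() returns True if the event was set, False on timeout.
    return 200 if done.wait(timeout_s) else 202
```

For example, a task that finishes after 50 ms yields 200 under a 10 second deadline, and the same call returns 202 if the deadline expires first:

```python
task_done = threading.Event()
threading.Timer(0.05, task_done.set).start()  # simulate a task finishing later
print(wait_with_deadline(task_done, 10.0))
```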