As observed in customer scenarios with multiple buckets, if ns_server attempts to delete Bucket "Del" while Bucket "Roll" is performing a Rollback, the deletion of "Del" can be blocked (and ultimately timed out by ns_server). From the user's point of view, this can result in failed rebalances.
- All buckets share the same thread pools.
- To delete a bucket, all Tasks associated with that Bucket must be cancelled. That means either (a) letting a Task finish its current execution if it is currently running, or (b) if it is not currently running, setting its State to DEAD and (briefly) waking it up on its associated thread pool to perform cleanup housekeeping.
If a long-running task from "Roll" (e.g. Rollback) is running on a given thread (e.g. Writer), then the cancellation of Del's Writer Tasks must wait until the Rollback task has finished. While there are normally multiple Writer threads, Rollback typically occurs on many vBuckets simultaneously, so all the Writer threads can be consumed by Rollback tasks, meaning that no Del tasks can be scheduled to allow bucket deletion to continue.
Unfortunately, Rollback is the highest-priority task (not unreasonably). This means that even when the currently running Rollback task finishes, any other Rollback tasks that are ready to run will be scheduled before the Tasks from Del which require cancellation. Only when no Rollback tasks are ready will the Tasks from Del be scheduled.
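The starvation described above can be reproduced with a toy model. The following is a minimal sketch, not the actual ExecutorPool code: the function name `simulate`, the priority values, the task durations and the simplification that every task is ready at time 0 are all assumptions made for illustration.

```python
import heapq

def simulate(tasks, num_threads):
    """Non-preemptive priority scheduling on one shared thread pool.

    tasks: list of (priority, name, duration_secs); a lower priority
    value is scheduled first. All tasks are assumed ready at time 0.
    Returns a dict mapping task name -> start time in seconds.
    """
    # The sequence number keeps ordering stable among equal priorities.
    ready = [(prio, seq, name, dur)
             for seq, (prio, name, dur) in enumerate(tasks)]
    heapq.heapify(ready)
    # Each thread is modelled by the time at which it next becomes free.
    threads = [0] * num_threads
    heapq.heapify(threads)
    start = {}
    while ready:
        free_at = heapq.heappop(threads)            # next thread to free up
        _, _, name, dur = heapq.heappop(ready)      # highest-priority ready task
        start[name] = free_at
        heapq.heappush(threads, free_at + dur)      # runs to completion
    return start

# Hypothetical workload: six 60s Rollback tasks for "Roll" (priority 0)
# and one brief cancellation run for "Del" (priority 5), on two Writer threads.
tasks = [(0, f"Roll:rollback_vb{i}", 60) for i in range(6)]
tasks.append((5, "Del:cancel", 1))
start = simulate(tasks, num_threads=2)
print(start["Del:cancel"])  # 180
```

In this model the one-second cancellation run for Del cannot start until every ready Rollback task has been scheduled, so its wait grows with the number of queued Rollback tasks rather than with the length of any single one.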
In one particular customer environment, the Rollback of "Roll" took over 12 minutes, which meant that the deletion of Del took 11 minutes (the Rollback started ~1 minute before the delete). This was longer than ns_server was willing to wait, so the bucket deletion was cancelled and the rebalance aborted.
(Full details in associated CBSE).
There are a few possible ways to try to improve this; however, none of them is both easy and totally effective:
- One approach would be to allow the cancellation run of a task to run at highest priority. In this customer case that would mean the ent_comms_tracking tasks would get a chance to run once the current Rollback tasks finish (we have no way to pre-empt other buckets' tasks). That would speed up the deletion somewhat, but it would still be waiting perhaps 60s, given that all Writer threads were consumed by Rollback at the point the bucket delete came in.
- Arguably a better approach would be to pre-empt the currently running Rollback tasks and allow the brief cancellation task run to occur. That is difficult at the moment as we only have co-operative scheduling, so all tasks with this possible issue (Rollback, Compaction?) would need to be modified to yield back to the scheduler (and let other tasks run). By itself that would probably be insufficient, as Rollback, being the highest priority, would just run again.
- Alternatively, if cancellation did not require even the brief time on the background thread, then the Flusher tasks here wouldn't even be blocked by the slower Flusher. Conceptually that is possible, but it likely requires a significant restructure of the ExecutorPool.
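To make the trade-off of the first option concrete, here is a rough sketch of how boosting the cancellation run changes its start time in a toy scheduler. It is an illustration under stated assumptions only: `cancel_start_time`, the three priority constants and the 60s Rollback duration are hypothetical and do not correspond to real ExecutorPool names or measured values.

```python
import heapq

# Hypothetical priority values; a lower value is scheduled first.
ROLLBACK_PRIORITY = 0
DEFAULT_CANCEL_PRIORITY = 5
BOOSTED_CANCEL_PRIORITY = -1

def cancel_start_time(busy_until, queued_rollbacks, rollback_secs, boost_cancel):
    """Return the time at which the cancellation run gets a thread.

    busy_until: per-thread time at which the Rollback already running
    on that thread finishes (running tasks cannot be pre-empted).
    queued_rollbacks: number of further Rollback tasks ready to run.
    """
    cancel_prio = (BOOSTED_CANCEL_PRIORITY if boost_cancel
                   else DEFAULT_CANCEL_PRIORITY)
    ready = [(ROLLBACK_PRIORITY, i, "rollback", rollback_secs)
             for i in range(queued_rollbacks)]
    ready.append((cancel_prio, queued_rollbacks, "cancel", 1))
    heapq.heapify(ready)
    threads = list(busy_until)
    heapq.heapify(threads)
    while ready:
        free_at = heapq.heappop(threads)         # next thread to free up
        _, _, name, dur = heapq.heappop(ready)   # highest-priority ready task
        if name == "cancel":
            return free_at
        heapq.heappush(threads, free_at + dur)

# Both Writer threads are busy with Rollback until t=60s; four more
# Rollback tasks are queued when the bucket delete arrives.
print(cancel_start_time([60, 60], 4, 60, boost_cancel=False))  # 180
print(cancel_start_time([60, 60], 4, 60, boost_cancel=True))   # 60
```

With the boost, the cancellation run no longer waits behind queued Rollback tasks, but it still has to wait for a currently running Rollback to finish, since nothing in this model (or in a co-operative scheduler) can pre-empt it.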