Details
Bug
Resolution: Fixed
Blocker
2.0
Security Level: Public
Description
SUBJ.
In many diags we're seeing occasional timeouts here and there. Sometimes, and perhaps most of the time, they don't affect correct operation of the product. After all, Erlang is famous for its fault resilience.
But sometimes they cause rebalance to fail. E.g. see MB-7166, where mb_master, which supervised ns_orchestrator, which in turn supervised the rebalance, died due to a timeout. Under Erlang's normal error-handling behavior that caused its restart. But part of the restart was shutting down its child processes, obviously including the rebalancer.
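To make the failure chain concrete, here is a toy model in Python (not ns_server code, and not Erlang/OTP itself). The `Proc` class and `simulate_timeout` function are hypothetical names invented for this illustration. It only assumes what is described above: when the top process dies and is restarted, everything beneath it is shut down first, so in-flight rebalance work is lost.

```python
class Proc:
    """Toy stand-in for a supervised process (hypothetical, for illustration)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.alive = True

    def stop(self, log):
        # Stop the whole subtree: children first, then this process.
        for child in reversed(self.children):
            child.stop(log)
        if self.alive:
            self.alive = False
            log.append(f"{self.name} stopped")

    def crash_and_restart(self, reason, log):
        # The process dies, its children are shut down, then it comes back.
        # The toy does not restart the children; the point is only that
        # whatever they were doing at the time is interrupted.
        log.append(f"{self.name} died: {reason}")
        self.alive = False
        for child in reversed(self.children):
            child.stop(log)
        self.alive = True
        log.append(f"{self.name} restarted")


def simulate_timeout():
    # Chain from the description: mb_master -> ns_orchestrator -> rebalancer.
    rebalancer = Proc("rebalancer")
    orchestrator = Proc("ns_orchestrator", [rebalancer])
    master = Proc("mb_master", [orchestrator])
    log = []
    master.crash_and_restart("timeout", log)
    return log, rebalancer.alive


if __name__ == "__main__":
    events, rebalancer_alive = simulate_timeout()
    for event in events:
        print(event)
    print("rebalancer alive:", rebalancer_alive)
```

In this sketch a single timeout at the top of the chain is enough to stop the rebalancer, even though the rebalancer itself did nothing wrong.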
In my personal experience this is quite easy to hit on physical hardware with spinning disks. But apparently we're now getting it on Xen with SSDs, as well as potentially (MB-7152) on physical hardware with SSDs.