Description
As seen in MB-19610:
The CBAuth "stale" error was caused by slow server restarts.
If more than 5 seconds elapse between a component starting and the menelaus barrier being lifted, here's what happens:
the component starts and sends a revrpc request
the request hangs
after 5 seconds the component gets a "stale" error and panics
the component is restarted and sends another revrpc request
and so on
Since we pay no attention to the health of the socket while we wait for the barrier, requests from multiple instances of the component stack up waiting,
which produces multiple "stale" messages.