Description
As seen in MB-19610, the CBAuth "stale" error was caused by slow server restarts.
If more than 5 seconds pass between a component starting and the menelaus barrier being lifted, here's what happens:
1. The component starts and sends a revrpc request.
2. The request hangs.
3. After 5 seconds the component gets a "stale" error and panics.
4. The component is restarted and sends another revrpc request.
5. And so on, for as long as the barrier stays in place.

Since we do not monitor the health of the socket while we wait for the barrier, requests from multiple instances of the component stack up waiting, which produces multiple "stale" messages.
For Gerrit Dashboard: MB-19656

# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
64195,1 | MB-19656 allow RPCCONNECT requests before the barrier is lifted | sherlock | ns_server | ABANDONED | 0 | +1 |
64257,3 | MB-19656 increase cbauth initialization timeout | sherlock | cbauth | MERGED | +2 | +1 |
64422,1 | Merge remote-tracking branch 'gerrit/sherlock' | master | cbauth | MERGED | +2 | +1 |
66706,2 | MB-20480: Bring cbauth fix for MB-19656 into watson | master | manifest | MERGED | +2 | +1 |