Details
Improvement
Resolution: Unresolved
Major
Security Level: Public
Description
In testing, we've found that the client can end up blocked until the continuous operation timeout comes along and kills a dead connection.
Right now, in practice, this "network cable falls out" failure mode is worse than it needs to be because the handleIO() method on the evented IO loop blocks while waiting for something to do. It's not 100% clear why, but it's believed to be related to all of the caller threads eventually blocking on "int selected = selector.select(delay);" in that method. Since our continuous timeout threshold is 1000, the worst case is 1000 * timeout before we dump the connection, which could be several minutes. Ideally, we'd push this IO down a layer so that data a new caller wants to send to a particular node isn't caught up in everything else going on. There may be a simpler fix, though, since these operations should all be non-blocking and selector.select() should pretty much always come back with something.
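To make the two points above concrete, here is a minimal standalone sketch. It does not reproduce the bug or use the client's code. It only shows the worst-case arithmetic and how java.nio's Selector.select(delay) behaves when nothing is ready. The class and method names (SelectDelaySketch, worstCaseMillis, timeSelect) are hypothetical, and the 250 ms operation timeout is an assumed value for illustration, not the client's actual setting.

```java
import java.io.IOException;
import java.nio.channels.Selector;

class SelectDelaySketch {

    // Worst-case time before a dead connection is dropped, following the
    // estimate in the description: threshold * per-operation timeout.
    static long worstCaseMillis(int continuousTimeoutThreshold, long opTimeoutMillis) {
        return continuousTimeoutThreshold * opTimeoutMillis;
    }

    // Calls select(delay) and returns how long the call blocked, in ms.
    static long timeSelect(Selector selector, long delayMillis) throws IOException {
        long start = System.nanoTime();
        selector.select(delayMillis);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        // Assumed 250 ms timeout: 1000 * 250 ms = 250000 ms, a bit over 4 minutes.
        System.out.println("worst case ms: " + worstCaseMillis(1000, 250));

        try (Selector selector = Selector.open()) {
            // No channel is ready, so select() waits out roughly the full delay.
            long blocked = timeSelect(selector, 300);
            System.out.println("select(300) blocked for about " + blocked + " ms");

            // Another thread calls wakeup(), so select() returns early.
            Thread waker = new Thread(() -> {
                try {
                    Thread.sleep(50);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                selector.wakeup();
            });
            waker.start();
            long woken = timeSelect(selector, 5000);
            waker.join();
            System.out.println("select(5000) with wakeup() returned after about " + woken + " ms");
        }
    }
}
```

The second half shows why a thread parked in select(delay) is not necessarily stuck for the whole delay: Selector.wakeup() from another thread makes the call return immediately. Whether that is the right lever for the simpler fix mentioned above is an open question.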