Details
Bug
Resolution: Fixed
Critical
2.0-developer-preview-4
Security Level: Public
None
Description
This is a follow-up issue to MB-5367 (synchronous select_bucket). The problem is critical because it produces different incorrect behavior with and without synchronous select_bucket. For tracking purposes, I list both here.
1. Async select_bucket
Say a notify_update message has been sent to mccouch, but mc-engine times out waiting for the response. It resets the connection and keeps waiting for the response in waitForReadable. I guess the reasoning was that the socket connection had been re-established. However, the response handler for the notify_update request was deleted as part of resetConnection, and an async select_bucket was sent as part of the reset as well. Back in waitForReadable, what does mc-engine receive? Not the response to notify_update, but the response to select_bucket. In fact, a response to the original notify_update will never come, because the old socket connection has been reset. In this case it at least returns from the wait and continues. However, the end result is wrong: back in couch-kvstore, it could abort the system because the callback would have neither success nor etmpfail.
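The async case can be replayed with a small model. This is only a sketch: `HandlerTable`, `Outcome`, the opaque values, and `replayAsyncScenario` are hypothetical names made up for illustration and are not the real mc-engine types. It assumes responses are matched to handlers by an opaque id, and shows that once the reset has deleted the notify_update handler, the only response that arrives (for select_bucket) leaves the original callback with neither success nor failure.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Hypothetical callback state: Pending means "neither success nor tmpfail".
enum class Outcome { Pending, Success, TmpFail };

// Hypothetical stand-in for the table of pending response handlers.
struct HandlerTable {
    std::map<uint32_t, std::function<void(bool)>> handlers;

    void add(uint32_t opaque, std::function<void(bool)> h) {
        handlers[opaque] = std::move(h);
    }

    // Models resetConnection: every pending handler is dropped.
    void resetConnection() { handlers.clear(); }

    // Runs the handler for this opaque, if one is still registered.
    bool dispatch(uint32_t opaque, bool ok) {
        auto it = handlers.find(opaque);
        if (it == handlers.end()) return false;
        it->second(ok);
        handlers.erase(it);
        return true;
    }
};

Outcome replayAsyncScenario() {
    HandlerTable table;
    Outcome outcome = Outcome::Pending;

    // notify_update is sent with opaque 1; its handler would set the outcome.
    table.add(1, [&outcome](bool ok) {
        outcome = ok ? Outcome::Success : Outcome::TmpFail;
    });

    // Timeout: the connection is reset and the handler for opaque 1 is gone.
    table.resetConnection();

    // The reset also sends an async select_bucket (opaque 2).
    table.add(2, [](bool) {});

    // The only response that ever arrives belongs to select_bucket.
    table.dispatch(2, true);

    // The wait returns, but the notify_update callback was never invoked.
    return outcome;
}
```

Under these assumptions `replayAsyncScenario()` returns `Outcome::Pending`, which corresponds to the state that couch-kvstore cannot handle.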
2. Synchronous select_bucket
With MB-5367, select_bucket by itself is not recursive. It keeps re-sending the request until it succeeds. Back in waitForReadable, no response will come back, because select_bucket has its own wait-and-resend logic. mc-engine then simply gets stuck waiting for a response that never arrives.
In short, the mc-engine synchronous wait code was incorrect, because no response will ever come back after a connection has been reset. Instead, notify_update, delVBucket, and flush should all re-send their requests after resetting the connection.
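A minimal sketch of the re-send-after-reset idea, not the actual fix that was committed. `FakeConnection`, `Status`, `sendSync`, and the `maxAttempts` bound are all hypothetical; the fake peer simply echoes the command name as its response, and a timeout is modeled as an empty inbox. The point is that the request is sent again on the new connection instead of waiting for a response to the old one.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Hypothetical stand-in for the mc-engine <-> mccouch socket.
struct FakeConnection {
    int failuresLeft;               // number of sends whose response is lost
    int sends;
    int resets;
    std::deque<std::string> inbox;  // responses waiting to be read

    explicit FakeConnection(int failures)
        : failuresLeft(failures), sends(0), resets(0) {}

    void send(const std::string &cmd) {
        ++sends;
        if (failuresLeft > 0) {
            --failuresLeft;         // response never arrives
            return;
        }
        inbox.push_back(cmd);       // fake peer echoes the command name
    }

    // Returns false on "timeout" (nothing to read).
    bool waitForReadable(std::string &out) {
        if (inbox.empty()) return false;
        out = inbox.front();
        inbox.pop_front();
        return true;
    }

    // A reset discards anything tied to the old connection.
    void reset() {
        ++resets;
        inbox.clear();
    }
};

enum class Status { Success, TmpFail };

// Send a request and wait for its response; on timeout, reset the
// connection and re-send rather than waiting on the dead connection.
Status sendSync(FakeConnection &conn, const std::string &cmd, int maxAttempts) {
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        conn.send(cmd);
        std::string response;
        if (conn.waitForReadable(response) && response == cmd) {
            return Status::Success;
        }
        conn.reset();
    }
    // Always report a definite result so the caller's callback is never
    // left without success or failure.
    return Status::TmpFail;
}
```

With a connection that loses the first two responses, `sendSync(conn, "notify_update", 5)` succeeds on the third send after two resets. Bounding the attempts is a design choice of this sketch; the report itself only states that the request must be re-sent after the reset.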