Description
A bug was recently discovered during upgrade testing of Sync Gateway. It manifests as operations failing with 'not connected to a bucket' after an upgraded instance is rebalanced in. The SDK logs and a Wireshark capture (both available on the linked issue) indicate that the server returns success for authentication but sporadically does not perform automatic bucket selection, causing all operations to return errors. (Note that the username and bucket name must be the same, as this is an upgrade scenario.)
This is non-trivial to work around in the SDK. The immediately obvious solution would be to signal the connection to be rebuilt when this error is returned; however, this would cause all operations that have already been dispatched to fail immediately. It would also expose clients to a number of infinite connection rebuild loops.
It should also be noted that while this issue is currently being discussed in the context of the mobile upgrade scenario failure, it could affect any of our legacy (i.e., not 5.0-ready) clients during customer upgrade scenarios, as they have no awareness of SELECT_BUCKET at all.
Issue Links
- blocks
  - GOCBC-235 gocb errors during server upgrade (Resolved)
  - JCBC-1136 Upgrade from prespock to spock with 2 buckets in prespock cluster fails (Closed)
- relates to
  - MB-13156 Add support for blocking client traffic until server is properly initialized (Resolved)
  - MB-12088 Memcached should return an uninitiated error code (Closed)
  - MB-18199 Only interfaces marked as "management" should be opened by default (Closed)