I've produced a partial reproduction showing how the behaviour differs between a single-document fetch and a multi-document fetch.
I blocked connections between memcached and cbq-engine using the tcpkill tool, to simulate an operation attempted on a 'dead' connection.
When performing a single-document fetch, the query failed immediately with the error:
Error performing bulk get operation - cause: read tcp 10.112.183.102:47222->10.112.183.101:11210: read: connection reset by peer
When performing a multi-document fetch, by contrast, the query kept retrying until the connection eventually timed out.
This is what I had expected to see.
I then upgraded to a 5.5.4 build, which includes this fix, and the following error was returned:
"Error performing bulk get operation - cause: unable to complete action after 2 attemps"
The number of attempts appears to be defined by the following code: maxTries := len(b.Nodes()) * 2. Can someone confirm whether this guarantees that the operation will succeed, i.e. that it cannot hit a dead connection on both attempts?
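
To make the question concrete, here is a minimal sketch of a retry loop bounded by nodes * 2. It is not the actual client code: pool, get, doWithRetry and errDeadConn are hypothetical names, and handing out pooled connections round-robin is an assumption made purely for illustration. Under those assumptions, a single node gives maxTries = 2, and a pool holding two or more dead connections can still exhaust both attempts.

```go
package main

import (
	"errors"
	"fmt"
)

// errDeadConn is a hypothetical sentinel standing in for
// "read: connection reset by peer".
var errDeadConn = errors.New("connection reset by peer")

// pool is a hypothetical stand-in for a per-node connection pool.
// Each entry records whether that pooled connection is still alive.
type pool struct {
	conns []bool // true = live, false = dead; must be non-empty
	next  int
}

// get hands out pooled connections round-robin (an assumption) and
// fails when the chosen connection is dead.
func (p *pool) get() error {
	alive := p.conns[p.next%len(p.conns)]
	p.next++
	if !alive {
		return errDeadConn
	}
	return nil
}

// doWithRetry attempts the operation at most nodeCount*2 times,
// mirroring the maxTries expression quoted above. It returns the
// number of attempts made and the final error, if any.
func doWithRetry(p *pool, nodeCount int) (int, error) {
	maxTries := nodeCount * 2
	var lastErr error
	for attempt := 1; attempt <= maxTries; attempt++ {
		if lastErr = p.get(); lastErr == nil {
			return attempt, nil
		}
	}
	return maxTries, fmt.Errorf("unable to complete action after %d attempts (last error: %v)", maxTries, lastErr)
}
```

In this sketch, calling doWithRetry(&pool{conns: []bool{false, true}}, 1) succeeds on the second attempt, whereas doWithRetry(&pool{conns: []bool{false, false, true}}, 1) gives up after two attempts even though a live connection exists. Whether the real fix avoids this depends on whether a failed connection is discarded or replaced before the next attempt, which is the behaviour I'd like confirmed.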