One of the test cases for backup QE is to:
1) Start performing a backup
2) Kill the ns_server erlang process
3) Wait for 'cbbackupmgr' to exit
4) Resume streaming data from the cluster using the '--resume' flag.
At the moment we are seeing cbbackupmgr hang once the erlang process has been killed. When we turn on debug logging for gocbcore (see 'backup-with-debug-logging-1.log'), we can see that it got the socket read error and tore down the connection; however, this error isn't returned to cbbackupmgr. gocbcore then continues polling the cluster config using CCCP, each time failing with an EOF error.
Since it's not possible to continue streaming a snapshot after the connection has been lost (a new stream must be opened, at which point you are sent a new snapshot marker), when gocbcore detects a socket read/write error it should propagate that error up to the application.
I've provided a couple of different Go files which can be used to reproduce this issue; you may need to modify the hostname/credentials at the top of each file for them to work correctly.
1) Create a single node cluster
2) Create a single bucket called 'default'
3) Load a decent amount of data into the bucket (>= 1000000 documents)
4) Run 'v9.go'
4a) You should see messages about streams closing with 'n' mutations
4b) Kill the 'beam' process (on Linux 'killall -9 beam.smp' will work)
5) You should see the client hang with nothing else happening
If you perform the same set of steps with the 'v7.1.17.go' file, you will see that the stream end callback is correctly run with a non-nil error and the client then correctly exits. This is behavior which 'cbbackupmgr' relies on: if there is an EOF error because the connection was closed, the stream end callback should be run, notifying us that there was an issue so that we can exit cleanly and inform the user (at which point they can try to resume the backup).
MB-40107 cbbackupmgr may hang when shutting down the connection to the cluster