Couchbase Go SDK / GOCBC-929

gocbcore v9 DCP is failing to propagate EOF/socket read failure errors up to the application


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: 2.1.3
    • Affects Version/s: 2.0.0
    • Component/s: library
    • 1

    Description

      What's the issue?
      One of the test cases for backup QE is to:
      1) Start performing a backup
      2) Kill the ns_server erlang process
      3) Wait for 'cbbackupmgr' to exit
      4) Resume streaming data from the cluster using the '--resume' flag.

      At the moment we are seeing cbbackupmgr hang once the erlang process has been killed. With debug logging enabled for gocbcore (see 'backup-with-debug-logging-1.log'), we can see that it received the socket read error and tore down the connection; however, this error isn't returned to cbbackupmgr. gocbcore then continues polling the cluster config using CCCP, failing each time with an EOF error.

      What do we expect to see?
      When gocbcore detects a socket read/write error, that error should be propagated up to the application. It isn't possible to continue streaming a snapshot after the connection has been lost: a new stream must be opened, at which point a new snapshot marker is sent.

      Steps to reproduce
      I've provided a couple of Go files that can be used to reproduce this issue. You may need to modify the hostname/credentials at the top of each file for it to work correctly.

      1) Create a single node cluster
      2) Create a single bucket called 'default'
      3) Load a large amount of data into the bucket (>= 1000000 documents)
      4) Run 'v9.go'
      4a) You should see messages about streams closing with 'n' mutations
      4b) Kill the 'beam' process (on Linux 'killall -9 beam.smp' will work)
      5) The client hangs and nothing else happens

      If you perform the same steps with the 'v7.1.17.go' file, you will see that the stream end callback is correctly run with a non-nil error and the client then exits correctly. 'cbbackupmgr' relies on this behavior: for example, if there is an EOF error because the connection was closed, the stream end callback should be run to notify us of the issue so that we can cleanly exit and inform the user (at which point they can try to resume the backup).

      Attachments

        1. v9.go (5 kB)
        2. v7.1.17.go (4 kB)
        3. backup-with-debug-logging-1.log (97 kB)


            People

              charles.dixon Charles Dixon
              james.lee James Lee
              Votes: 0
              Watchers: 1
