One of the test cases for backup QE is to:
1) Start performing a backup
2) Kill the ns_server erlang process
3) Wait for 'cbbackupmgr' to exit
4) Resume streaming data from the cluster using the '--resume' flag.
At the moment we are seeing cbbackupmgr hang once the erlang process has been killed. When we turn on debug logging for gocbcore (see 'backup-with-debug-logging-1.log'), we can see that it got the socket read error and tore down the connection; however, this error isn't returned to cbbackupmgr. gocbcore then continues polling the cluster config using CCCP, each time failing with an EOF error.
Since it's not possible to continue streaming a snapshot after the connection has been lost (a new stream must be opened, at which point you are sent a new snapshot marker), when gocbcore detects a socket read/write error it should propagate that error up to the application.
I've provided a couple of different Go files which can be used to reproduce this issue; you may need to modify the hostname/credentials at the top of each file for them to work correctly.
1) Create a single node cluster
2) Create a single bucket called 'default'
3) Load a decent amount of data into the bucket (>= 1000000 documents)
4) Run 'v9.go'
4a) You should see messages about streams closing with 'n' mutations
4b) Kill the 'beam' process (on Linux 'killall -9 beam.smp' will work)
5) You should see the client hang with nothing else happening
If you perform the same set of steps with the 'v7.1.17.go' file, you will see that the stream end callback is correctly run with a non-nil error and the client then correctly exits. This is behavior which 'cbbackupmgr' relies on: if there is an EOF error because the connection was closed, the stream end callback should be run, notifying us that there was an issue so that we can exit cleanly and inform the user (at which point they can try to resume the backup).
MB-40107 cbbackupmgr may hang when shutting down the connection to the cluster