Details

Type: Improvement
Resolution: Won't Do
Priority: Major
Fix Version/s: None
Affects Version/s: 7.1.0
Component/s: couchbase-bucket
Labels:
None

Story Points:
1

Description

What is the issue?
As far as I know, most DCP clients try to restart a DCP connection that was closed unexpectedly, for example because of a failover. This is not the case for cbbackupmgr and the Backup service.

Currently, the Backup service does not support recovering or continuing a backup that failed because one of the nodes it was streaming data from failed over. This results in end-user-facing errors like this:

"error": "exit status 1: failed to execute cluster operations: failed to execute bucket operation for bucket 'bucket6': failed to transfer bucket data for bucket 'bucket6': failed to transfer key value data: failed to transfer key value data: EOF"

This error is not very informative, but it is probably the best high-level error we can produce, because EOF is the only error we get from the Data service. This is what we have in the cbbackupmgr logs:

2021-12-16T02:02:29.575-08:00 WARN: (DCP) (bucket6) (vb 261) Stream closed due to unexpected error 'EOF' | {"uuid":214011845766233,"snap_start":0,"snap_end":16242,"last_seqno":8760,"retries":0} -- couchbase.(*DCPAsyncWorker).End() at dcp_async_worker.go:538

2021-12-16T02:02:29.575-08:00 WARN: (DCP) (bucket6) (vb 261) Received an unexpected error whilst streaming, beginning teardown: EOF -- couchbase.(*DCPAsyncWorker).handleDCPError() at dcp_async_worker.go:615
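The first warning carries the stream state at the time of the close as a JSON payload. As a rough sketch of how that state could be inspected (for example, to see whether the stream stopped partway through a snapshot), the Go code below decodes it. The field names come from the log line; the `StreamState` type and both function names are hypothetical and are not cbbackupmgr's own.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// StreamState mirrors the JSON fields seen in the log line (hypothetical type).
type StreamState struct {
	UUID      uint64 `json:"uuid"`
	SnapStart uint64 `json:"snap_start"`
	SnapEnd   uint64 `json:"snap_end"`
	LastSeqno uint64 `json:"last_seqno"`
	Retries   int    `json:"retries"`
}

// ParseStreamState extracts the JSON object that follows " | " in a log line.
func ParseStreamState(line string) (StreamState, error) {
	var st StreamState
	_, rest, ok := strings.Cut(line, " | ")
	if !ok {
		return st, fmt.Errorf("no stream state found in log line")
	}
	// The payload ends where the " -- " caller suffix begins.
	payload, _, _ := strings.Cut(rest, " -- ")
	err := json.Unmarshal([]byte(payload), &st)
	return st, err
}

// MidSnapshot reports whether the stream closed before reaching snap_end.
func (s StreamState) MidSnapshot() bool { return s.LastSeqno < s.SnapEnd }

func main() {
	line := `2021-12-16T02:02:29.575-08:00 WARN: (DCP) (bucket6) (vb 261) Stream closed due to unexpected error 'EOF' | {"uuid":214011845766233,"snap_start":0,"snap_end":16242,"last_seqno":8760,"retries":0} -- couchbase.(*DCPAsyncWorker).End() at dcp_async_worker.go:538`
	st, err := ParseStreamState(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("last_seqno=%d mid_snapshot=%v\n", st.LastSeqno, st.MidSnapshot())
}
```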

What is the suggested improvement?
Add a way to get a more informative error from the Data service when a DCP stream is closed. It does not need to be extremely specific (I understand that the Data service might not know that the stream was closed because of a failover, as this is not even reflected in memcached.log), but any error we could use to infer a possible set of reasons for a backup failure and convey them to the end user would be very welcome.

As a side note, this could also benefit other services, which could, for example, use different timeout strategies depending on the reason the stream was closed.

Attachments

Issue Links

relates to

MB-50135 [System Test][CBM] backup task failed with error - failed to transfer key value data: failed to transfer key value data: EOF

Closed

Activity

People

Assignee: Maksimiljans Januska

Reporter: Maksimiljans Januska

Votes: 0

Watchers: 4

Dates

Created: 20/Dec/21 10:24 AM

Updated: 05/Jan/22 7:48 AM

Resolved: 05/Jan/22 5:57 AM

Gerrit Reviews

There are no open Gerrit changes

Data service should return a more informative error for closed backup DCP streams
