Description
There appears to be an issue with gocbcore's DCP buffer acknowledgement that causes collection-aware DCP streams to hang (see MB-37775 for more information).
Expected results:
cbbackupmgr should be able to stream all the data from the cluster.
Actual results:
cbbackupmgr hangs waiting for more data from DCP (gocbcore). We have an activity monitor that logs when a DCP stream has not received any data for a given amount of time (1 minute by default).
Additional testing:
I've done some small-scale testing, backing up a single 1-byte item with a buffer ack size of 1 byte, and I can reproduce this issue 100% of the time. Using cbstats to view the stats for the hung stream, we can see:
eq_dcpq:cbbackupmgr_2020-02-04T12:28:23Z_75928_0:supports_ack: true
eq_dcpq:cbbackupmgr_2020-02-04T12:28:23Z_75928_0:stream_1_state: dead
eq_dcpq:cbbackupmgr_2020-02-04T12:28:23Z_75928_0:total_acked_bytes: 0
eq_dcpq:cbbackupmgr_2020-02-04T12:28:23Z_75928_0:unacked_bytes: 28
Steps to reproduce:
1) Build the latest version of Couchbase-Server using the TLM.
2) Create a bucket and load some data (or use a sample bucket).
3) Reduce the ack threshold from '8 * 1024 * 1024' to a much smaller value (the issue reproduces with 8MB, but it is easier to reproduce with a smaller value).
4) Perform a backup using cbbackupmgr built by the TLM.
5) cbbackupmgr should hang waiting for more data from DCP.
For Gerrit Dashboard: GOCBC-774
# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
121722,8 | GOCBC-774: Fix DCP buffer acknowledgement amount | master | gocbcore | MERGED | +2 | +1 |
121997,2 | Update gocbcore's SHA for cheshire-cat, master builds | master | manifest | MERGED | +2 | +1 |
138470,4 | Update agent_diag.go | master | gocbcore | ABANDONED | -1 | 0 |