Details
Bug
Resolution: Fixed
Critical
7.1.0
Untriaged
1
No
KV 2022-Feb
Description
What's the issue?
Whilst testing PiTR-related performance, we hit a buffer acknowledgement issue which resulted in our backups failing with an EOF error (i.e. the remote connection being closed).
Looking at the logs from that time, we see the following:
2022-02-15T17:56:41.311687+00:00 WARNING (default) DCP (Producer) eq_dcpq:cbbackupmgr_2022-02-15T17:56:39Z_85864_0 - Attempting to release 10485794 bytes which is greater than bytesOutstanding:10485706
This indicates that either backup or KV is not accounting correctly for buffer acknowledgement.
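To illustrate how a warning like the one above can arise, here is a minimal sketch of the accounting, assuming a simplified producer-side counter. The class and method names (BufferLogSketch, recordSent, release, outstanding) are hypothetical and are not taken from the KV-engine source.

```cpp
#include <cstddef>

// Hypothetical, simplified model of buffer-acknowledgement accounting.
// Illustrative only; this is not the KV-engine implementation.
class BufferLogSketch {
public:
    // Producer side: add the size it believes it put on the wire.
    void recordSent(std::size_t bytes) {
        bytesOutstanding += bytes;
    }

    // Consumer side: acknowledge the bytes it actually received.
    // Returns false (the "greater than bytesOutstanding" case) when the
    // acknowledgement exceeds what the producer accounted for; the
    // counter is left unchanged in that case.
    bool release(std::size_t bytes) {
        if (bytes > bytesOutstanding) {
            return false;
        }
        bytesOutstanding -= bytes;
        return true;
    }

    std::size_t outstanding() const {
        return bytesOutstanding;
    }

private:
    std::size_t bytesOutstanding = 0;
};
```

If the producer records fewer bytes for a message than the consumer later acknowledges, release() eventually fails, which is the situation the log line describes.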
What's the fix?
PiTR added some new information to the snapshot marker (a timestamp); it appears this isn't being accounted for when incrementing outstanding bytes on the KV side.
getMessageSize()
uint32_t SnapshotMarker::getMessageSize() const {
    auto rv = baseMsgBytes;
    if (highCompletedSeqno || maxVisibleSeqno) {
        rv += sizeof(cb::mcbp::request::DcpSnapshotMarkerV2xPayload) +
              sizeof(cb::mcbp::request::DcpSnapshotMarkerV2_0Value);
    } else {
        rv += sizeof(cb::mcbp::request::DcpSnapshotMarkerV1Payload);
    }
    rv += (getStreamId() ? sizeof(cb::mcbp::DcpStreamIdFrameInfo) : 0);
    return rv;
}
Looking at the above function, we can see that we're not accounting for the V2.1 format.
Increased Size
static_assert(sizeof(DcpSnapshotMarkerV2_0Value) == 36,
              "Unexpected struct size");
...
static_assert(sizeof(DcpSnapshotMarkerV2_1Value) == 44,
              "Unexpected struct size");
The fix is to return the correct size for the V2.1 format.
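As a rough sketch of what that could look like, the standalone example below adds a V2.1 branch to the size calculation. Only the 36 and 44 byte value sizes come from the static_asserts quoted in this ticket; the other constants, the MarkerSketch type, and the use of a timestamp field to select V2.1 are assumptions made for illustration and may not match the actual patch.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Sizes taken from the static_asserts quoted in this ticket.
constexpr std::size_t kV2_0ValueSize = 36;
constexpr std::size_t kV2_1ValueSize = 44;

// Assumed values, for illustration only (not confirmed by the ticket).
constexpr std::size_t kBaseMsgBytes = 24;
constexpr std::size_t kV1PayloadSize = 20;
constexpr std::size_t kV2xPayloadSize = 1;
constexpr std::size_t kStreamIdFrameInfoSize = 3;

// Hypothetical stand-in for the fields that influence the encoded size.
struct MarkerSketch {
    std::optional<std::uint64_t> highCompletedSeqno;
    std::optional<std::uint64_t> maxVisibleSeqno;
    std::optional<std::uint64_t> timestamp; // assumed PiTR field
    bool hasStreamId = false;
};

// Size calculation with an explicit branch for the V2.1 format.
std::size_t messageSize(const MarkerSketch& m) {
    std::size_t rv = kBaseMsgBytes;
    if (m.timestamp) {
        // V2.1: version payload plus the larger value struct.
        rv += kV2xPayloadSize + kV2_1ValueSize;
    } else if (m.highCompletedSeqno || m.maxVisibleSeqno) {
        // V2.0: version payload plus the original value struct.
        rv += kV2xPayloadSize + kV2_0ValueSize;
    } else {
        rv += kV1PayloadSize;
    }
    rv += m.hasStreamId ? kStreamIdFrameInfoSize : 0;
    return rv;
}

// Bytes under-counted per V2.1 marker when it is sized as V2.0.
constexpr std::size_t kPerMarkerGap = kV2_1ValueSize - kV2_0ValueSize;
static_assert(kPerMarkerGap == 8, "V2.1 is 8 bytes larger than V2.0");

// Gap between the two byte counts in the logged warning.
constexpr std::size_t kLoggedGap = 10485794 - 10485706;
static_assert(kLoggedGap == 88, "Gap taken from the log line");
static_assert(kLoggedGap % kPerMarkerGap == 0, "Gap is a multiple of 8");
```

The 88 byte gap in the log is a multiple of the 8 byte difference between the two formats, which would be consistent with several V2.1 markers having been sized as V2.0, although the ticket does not confirm that directly.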