Description
What is the problem?
When doing a point-in-time (PiTR) backup with the latest master of both server and backup, the backup never completes. This appears to be because we don't count system events (e.g. collections being created) when bumping the highseqno of the sink. It only happens if the last seqno increase in a snapshot is due to a system event.
Technical detail
We see in the logs that we are repeatedly streaming the same snapshot:
> tail -f ~/bk/logs/backup-0.log | grep "vb 941"
2023-10-03T08:32:01.683+00:00 (DCP) (default) (vb 941) Creating DCP stream | {"uuid":0,"start_seqno":0,"end_seqno":3,"snap_start":0,"snap_end":0,"retries":0}
2023-10-03T08:32:01.766+00:00 (DCP) (default) (vb 941) Stream closed because all items were streamed | {"uuid":74448152038113,"snap_start":0,"snap_end":3,"last_seqno":0,"retries":0}
2023-10-03T08:32:01.766+00:00 (DCP) (default) (vb 941) PiTR Streaming next snapshot
2023-10-03T08:32:01.766+00:00 (DCP) (default) (vb 941) Creating DCP stream | {"uuid":74448152038113,"start_seqno":0,"end_seqno":3,"snap_start":0,"snap_end":3,"retries":0}
2023-10-03T08:32:01.914+00:00 (DCP) (default) (vb 941) Stream closed because all items were streamed | {"uuid":74448152038113,"snap_start":0,"snap_end":3,"last_seqno":0,"retries":0}
2023-10-03T08:32:01.916+00:00 (DCP) (default) (vb 941) PiTR Streaming next snapshot
As we can see, the stream ends without last_seqno being bumped. A pcap shows that these seqnos belong to system events (i.e. creation of scopes/collections), and couch_dbdump confirms it:
> ../install/bin/couch_dbdump --vbucket ../ns_server/data/n_0/data/default/941.couch.1
Dumping "../ns_server/data/n_0/data/default/941.couch.1":
Doc seq: 1
id: (system-event-key:scope:0x8)_scope
rev: 1
content_meta: 0x83
size (on disk): 48
cas: 1696318773518008320, expiry: 0, flags: 16777216, datatype: 0x00 (raw)
size: 40
data: (snappy)
Doc seq: 2
id: (system-event-key:collection:0x9)_collection
rev: 1
content_meta: 0x83
size (on disk): 57
cas: 1696318773518204928, expiry: 0, flags: 0, datatype: 0x00 (raw)
size: 52
data: (snappy)
Doc seq: 3
id: (system-event-key:collection:0x8)_collection
rev: 1
content_meta: 0x83
size (on disk): 60
cas: 1696318773518336000, expiry: 0, flags: 0, datatype: 0x00 (raw)
size: 64
data: (snappy)

Total docs: 3
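Looking at the code, we can see that we open a new stream if we are in PiTR mode and the highseqno of the sink does not match that of the source. This works around the fact that DCP will not stream all the snapshots in a range for PiTR (MB-46854). Since system events do not appear to bump the sink's highseqno, a snapshot that ends in a system event leaves the two values permanently different, so the same snapshot is streamed again and again.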
Is this a regression?
No. 7.6 just makes this more likely because the first snapshot is likely to contain some or all of the system events that create the _system scope.
Reproduction
- Create a bucket on a 7.6 cluster with PiTR enabled
- Create multiple scopes/collections
- Load fewer than 1024 documents
- Try to create a backup
Issue Links
- relates to MB-46854 PiTR: Stream all snapshots (Closed)
For Gerrit Dashboard: MB-58914
# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
198382,2 | MB-58914 Handle system events in PiTR restreaming | master | backup | MERGED | +2 | +1 |