Description
What is the problem?
If the .version file is missing for any backup in a cloud backup archive, cbbackupmgr collect-logs (tested on version 7.6.0-2176 (8eeaccdf)) fails with:
Collecting logs from s3://backups/archive failed: could not open backup version file: backup version file at '/data/staging/archive/cb-example-2024-07-19T15_35_35/2024-07-19T15_35_35.880656758Z/.version' not found
From my testing, multiple other cbbackupmgr commands (quite possibly all) also fail with the same error. Cases where this causes problems include:
cbbackupmgr config fails, so you can't create a new backup repository in the same archive.
cbbackupmgr remove fails, so you can't delete the repository containing the corrupt backup.
cbbackupmgr info fails. This causes problems for the operator-backup image, which calls this command to list repositories before performing a backup, and exits early when the command fails.
What is the solution?
This was addressed in MB-59890, where two fixes were suggested:
1. Always upload the version file to cloud storage
2. Be tolerant to missing version files
We opted for option 1, essentially to try to ensure that this situation can't happen. However, we've encountered at least one customer situation where it did anyway (we think due to a partially failed repository deletion). I think we need to consider option 2 for at least some cbbackupmgr commands (see above): one corrupt backup in an archive shouldn't make the entire archive completely unusable.
As the collect-logs command is typically run when troubleshooting, where archive corruption is a possibility, it should definitely be resilient to this and return as much useful information as possible even if a broken backup exists in the archive.
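To illustrate what option 2 could look like, here is a rough sketch in Python (not cbbackupmgr's actual code, which I haven't checked). It assumes the staged layout seen in the error above, i.e. a repository directory containing one directory per backup, each with a .version file. The function name read_backup_versions and the placeholder file contents are made up for the example. The idea is to record a backup with a missing .version file as broken and carry on, rather than failing the whole command.

```python
import tempfile
from pathlib import Path


def read_backup_versions(repo_dir):
    """Scan every backup directory under repo_dir (hypothetical helper).

    Returns (versions, broken):
      versions maps backup directory name -> contents of its .version file
      broken lists backups whose .version file is missing
    A missing file is recorded and skipped instead of aborting the scan.
    """
    versions, broken = {}, []
    backups = sorted(p for p in Path(repo_dir).iterdir() if p.is_dir())
    for backup in backups:
        try:
            versions[backup.name] = (backup / ".version").read_text().strip()
        except FileNotFoundError:
            # Tolerate the corrupt backup: note it and keep going.
            broken.append(backup.name)
    return versions, broken


if __name__ == "__main__":
    # Build a throwaway repository with one healthy and one broken backup.
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp) / "cb-example"
        (repo / "backup-ok").mkdir(parents=True)
        (repo / "backup-ok" / ".version").write_text("placeholder\n")
        (repo / "backup-broken").mkdir()  # no .version file here

        versions, broken = read_backup_versions(repo)
        print("readable backups:", versions)
        print("skipped (missing .version):", broken)
```

With something along these lines, collect-logs and info could still report on the healthy backups and flag the broken one as a warning, and remove could still delete the repository.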