  1. Couchbase Server
  2. MB-62816

[CBM] All cbbackupmgr commands fail if version file missing from a single cloud backup

Details

    • Bug
    • Resolution: Unresolved
    • Major
    • Morpheus
    • 7.6.0
    • tools
    • None
    • Untriaged
    • 0
    • Unknown

    Description

      What is the problem?
      If the .version file is missing for any backup in a cloud backup archive, cbbackupmgr collect-logs (tested on version 7.6.0-2176 (8eeaccdf)) fails with:

      Collecting logs from s3://backups/archive failed: could not open backup version file: backup version file at '/data/staging/archive/cb-example-2024-07-19T15_35_35/2024-07-19T15_35_35.880656758Z/.version' not found 

      From my testing, multiple other cbbackupmgr commands (quite possibly all) also fail with the same error. Cases where this causes real problems include:

      cbbackupmgr config fails, so you can't create a new backup repository in the same archive.

      cbbackupmgr remove fails, so you can't delete the repository containing the corrupt backup.

      cbbackupmgr info fails. This causes problems for the operator-backup image, which calls this command to list repositories before performing a backup and exits early when the command fails.
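
      To find which backups are affected, a script along the following lines could scan a local copy of the archive (for example the staging directory named in the error). This is only a sketch: it assumes the layout <archive>/<repository>/<backup>/.version implied by the path in the error message and treats every subdirectory of a repository as a backup. The function name is hypothetical and is not part of cbbackupmgr.

```python
from pathlib import Path

VERSION_FILE = ".version"  # file name taken from the error message above


def find_backups_missing_version(archive_root):
    """Return backup directories under archive_root that lack a .version file.

    Assumption: the archive is laid out as
    <archive>/<repository>/<backup>/.version, and every non-hidden
    subdirectory of a repository is a backup.
    """
    missing = []
    for repo in sorted(Path(archive_root).iterdir()):
        if not repo.is_dir() or repo.name.startswith("."):
            continue
        for backup in sorted(repo.iterdir()):
            if not backup.is_dir() or backup.name.startswith("."):
                continue
            if not (backup / VERSION_FILE).is_file():
                missing.append(backup)
    return missing


if __name__ == "__main__":
    import sys

    # Usage: python find_missing_version.py /data/staging/archive
    for path in find_backups_missing_version(sys.argv[1]):
        print(f"missing {VERSION_FILE}: {path}")
```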

      What is the solution?
      This was addressed in MB-59890, where two fixes were suggested:
      1. Always upload the version file to cloud storage
      2. Be tolerant to missing version files

      We opted for option 1, essentially to try to ensure that this situation cannot happen, but we have encountered at least one customer situation where it happened anyway (we think due to a partially failed repository deletion). I think we need to consider option 2 for at least some cbbackupmgr commands (see above): one corrupt backup in an archive shouldn't make the entire archive completely unusable.

      Because collect-logs is typically run in a troubleshooting situation, where archive corruption is a possibility, it should definitely be resilient to this and return as much useful information as possible even if a broken backup exists in the archive.
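
      To illustrate what option 2 could look like, here is a minimal sketch of tolerant enumeration: a missing version file is recorded as a problem for that one backup instead of aborting the whole operation. It is not cbbackupmgr's implementation. All names are hypothetical, the contents of .version are treated as opaque text, and every subdirectory of a repository is assumed to be a backup.

```python
from pathlib import Path


class MissingVersionError(Exception):
    """Raised when a backup directory has no .version file."""


def read_version(backup_dir):
    """Strict read: fails if the version file is absent."""
    path = Path(backup_dir) / ".version"
    try:
        # The file contents are treated as opaque text in this sketch.
        return path.read_text().strip()
    except FileNotFoundError as err:
        raise MissingVersionError(
            f"backup version file at '{path}' not found"
        ) from err


def summarize_repository(repo_dir):
    """Tolerant enumeration: return (versions, problems).

    versions maps backup name to version text for readable backups;
    problems lists one message per backup that could not be read.
    """
    versions, problems = {}, []
    for backup in sorted(Path(repo_dir).iterdir()):
        if not backup.is_dir():
            continue
        try:
            versions[backup.name] = read_version(backup)
        except MissingVersionError as err:
            # Record the broken backup and carry on with the rest.
            problems.append(str(err))
    return versions, problems
```

      A caller such as a log collector could then report the healthy backups and list the problems separately, rather than failing on the first broken backup.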

            People

              owend Daniel Owen
              jack.bakes Jack Bakes