Couchbase Server / MB-48338

[CBM] Correctly handle the creation of the '.restore' directory in existing archives

Details

    • Type: Bug
    • Status: Closed
    • Priority: Test Blocker
    • Resolution: Fixed
    • Affects Version/s: Neo
    • Fix Version/s: Neo
    • Component/s: tools
    • Triage: Untriaged
    • 1
    • Yes

    Description

      Our daily analytics runs are failing because of this error.

      Warning: --host is deprecated, use -c/--cluster

      Error restoring cluster: Backup repository is corrupted 'ranges' directory is not in the correct format

      The issue happened in 6.6.4, 7.0.2, and 7.1.

      6.6.4-9911: http://perf.jenkins.couchbase.com/job/triton_analytics/1727/

      7.0.2-6646: http://perf.jenkins.couchbase.com/job/triton_analytics/1731/ 

      7.1.0-1246: http://perf.jenkins.couchbase.com/job/triton_analytics/1729/

       

      The runs listed below are the last good runs for each release. Note that the last good runs (jobs 1725/1726) used 7.1.0-1242. The issue started in job 1727 with build 6.6.4-9911.

      6.6.4-9910: http://perf.jenkins.couchbase.com/job/triton_analytics/1721/

      7.0.2-6639: http://perf.jenkins.couchbase.com/job/triton_analytics/1723/

      7.1.0-1242: http://perf.jenkins.couchbase.com/job/triton_analytics/1725/ 

       

      I am not able to collect logs because of the same issue.

      cbbackupmgr collect-logs -a /data/analytics/backups -o /tmp

      Collecting logs from /data/analytics/backups failed: Backup repository is corrupted 'ranges' directory is not in the correct format

      Attachments

        1. backup_oceanus.tar.gz
          13.50 MB
        2. backup.tar.gz
          7.86 MB
        3. ls_oceanus.txt
          216 kB
        4. ls.txt
          56 kB

          Activity

            Bo-Chun Wang added a comment (edited)

            James Lee

            I removed the '.restore' directory and started a run with build 7.0.2-6643. The run is in progress, and it is able to restore data without hitting the corruption error.
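
The workaround described in this comment (removing a leftover '.restore' directory before re-running) could be scripted along these lines. This is a minimal sketch and not part of cbbackupmgr: the function names are hypothetical, and because the ticket does not say where inside the archive the '.restore' directory lives, the sketch simply searches the whole archive tree. It defaults to a dry run so that nothing is deleted by accident.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def find_restore_dirs(archive: Path) -> list[Path]:
    """Return every directory named '.restore' beneath the archive root."""
    return sorted(p for p in archive.rglob(".restore") if p.is_dir())


def remove_restore_dirs(archive: Path, dry_run: bool = True) -> list[Path]:
    """List leftover '.restore' directories; delete them unless dry_run is set."""
    found = find_restore_dirs(archive)
    for path in found:
        # A nested match may already be gone if its parent was removed first.
        if not dry_run and path.exists():
            shutil.rmtree(path)
    return found
```

For example, `remove_restore_dirs(Path("/data/analytics/backups"))` would only report candidates under the archive path used in the description; passing `dry_run=False` would delete them.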

            James Lee added a comment

            Bo-Chun Wang, perfect, sounds good; thank you.

            In that case, I'm going to update the affects versions/fix versions to remove everything but 'Neo', since this isn't technically a bug in those releases; it's an issue with us altering existing archives in newer versions. I'll speak to Patrick tomorrow about how we'd best like to handle this.

            Thanks,
            James

            James Lee added a comment

            I spoke to Dan and Patrick about this after our stand-up today; we're happy that the existing behaviour doesn't need to be changed*. We don't expect this to be a standard use case. We expect users to move forward in versions whilst avoiding mixing and matching versions of 'cbbackupmgr' against the same archive, i.e. we actively support reading archives created with older versions (but not using lots of different versions against the same archive). Once the final patch has been merged, we'd only expect to see this issue if there's a failure in restore, resulting in the '.restore' directory not being cleaned up (which is the expected behaviour for restore).

            With that being said, I think we should avoid using unstable versions of 'cbbackupmgr' in these tests, because they aren't here to validate the behaviour of 'cbbackupmgr'. I'm sure this would go some way towards avoiding having tests blocked by unstable versions of 'cbbackupmgr' in the future.

            Thanks,
            James

            *Please note that there's one outstanding patch left to be merged which ensures the cleanup of the '.restore' directory after the completion of the restore.

            James Lee added a comment

            Marking as resolved; the '.restore' directory will now only be left behind in the event of a failure.
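
The resolved behaviour (the '.restore' directory is cleaned up after a successful restore and left in place only when the restore fails) follows a common pattern that can be illustrated as below. This is only a sketch of that pattern under stated assumptions, not cbbackupmgr's actual implementation; `staging_dir` and its parameters are hypothetical names introduced for illustration.

```python
import shutil
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def staging_dir(parent: Path, name: str = ".restore"):
    """Create a staging directory and remove it only if the body succeeds."""
    path = parent / name
    path.mkdir(parents=True, exist_ok=False)
    yield path
    # Reached only when the with-body did not raise; on failure the
    # exception propagates from the yield and the directory is left behind.
    shutil.rmtree(path)
```

Under this pattern, a later run that finds the directory already present knows a previous attempt failed, which matches the only case in which the ticket expects the directory to remain.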


            Bo-Chun Wang added a comment

            I am closing this issue. I have verified the fix on build 7.1.0-1288. Moreover, we can now do runs with different releases together without hitting the issue.

            7.1.0-1288: http://perf.jenkins.couchbase.com/job/oceanus/6946/

            7.0.2-6643: http://perf.jenkins.couchbase.com/job/oceanus/6947/

            People

              bo-chun.wang Bo-Chun Wang
              bo-chun.wang Bo-Chun Wang