  1. Couchbase Server
  2. MB-60629

[Backup] Restore fails on locked GCP bucket due to constant retries on 403 response


Details

    • Untriaged
    • 0
    • Unknown

    Description

      Problem

      In CBSE-16251, a customer moved a cloud backup archive from a normal GCP bucket to a locked GCP bucket (which allows creating new objects, but prohibits modifying or deleting pre-existing ones), and then tried to restore from that bucket. Instead of failing, which would be the expected behaviour, the restore hung.

      All the data and services were successfully restored, but at the end of the restore the sync archive step failed, as it requires updating the log file, which can't be done in a locked bucket.

      GCP correctly returns a 403 when cbbackupmgr tries to modify/delete an object in a locked bucket. The problem occurs because cbbackupmgr always retries on 403s (here and here) when using a GCP client, since it considers them intermittent.

      Fix

      Regardless of what error code is returned, we shouldn't retry indefinitely. Ideally we should be able to configure the maximum number of retries on a per-operation level, but this is not possible. The GCP docs recommend using context timeouts instead, which would require changes to the tools-common cloud package, and backup is already a version behind.


          People

            safian.ali Safian Ali