  1. Couchbase Kubernetes
  2. K8S-1796

[Backup] Failure trying to remove stale lockfile on S3 bucket.

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Fixed
    • 2.1.0
    • 2.1.0
    • None
    • 1

    Description

      Job: http://qa.sc.couchbase.com/job/roo-test/102/console

      Image: dmacouch/operator-backup:6.6.0

      couchbase server: couchbase/server:6.6.0, server upgrade: couchbase/server:6.6.0

      S3 Bucket: s3://caobackups3

      Tests Failed: TestBackupRetentionS3 , TestUpdateBackupStatusS3

      The job runs 4 tests in parallel.

      cbbackupmgr Error:

      2020-11-23 09:19:14,522 - root - ERROR - cbbackupmgr error during config action
      2020-11-23 09:19:14,522 - root - ERROR - error code: 1, b"Backup repository creation failed: failure trying to remove stale lockfile: the process '54' running on 'full-incremental-full-1606123080-hl9rv' already holds the lock\n"
      2020-11-23 09:19:20,393 - root - INFO - raising backup fail event
      (403)
      Reason: Forbidden
      HTTP response headers: HTTPHeaderDict({'Audit-Id': 'c63addd4-b189-45d2-8482-813dbb29fd09', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'Date': 'Mon, 23 Nov 2020 09:19:20 GMT', 'Content-Length': '411'})
      HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"couchbasebackups.couchbase.com \"cbbackup\" is forbidden: User \"system:serviceaccount:test-vdc5f:couchbase-backup\" cannot get resource \"couchbasebackups\" in API group \"couchbase.com\" in the namespace \"default\"","reason":"Forbidden","details":{"name":"cbbackup","group":"couchbase.com","kind":"couchbasebackups"},"code":403}
      Traceback (most recent call last):
        File "/opt/couchbase/bin/backup_script", line 1220, in config_repo
          result = cbbackupmgr(args_list)
        File "/opt/couchbase/bin/backup_script", line 80, in cbbackupmgr
          return str(subprocess.check_output(["cbbackupmgr"] + cmd).decode("utf-8")).strip()
        File "/opt/couchbase/lib/python/runtime/lib/python3.7/subprocess.py", line 395, in check_output
          **kwargs).stdout
        File "/opt/couchbase/lib/python/runtime/lib/python3.7/subprocess.py", line 487, in run
          output=stdout, stderr=stderr)
      subprocess.CalledProcessError: Command '['cbbackupmgr', 'config', '--archive', 's3://caobackups3/archive', '--repo', 'test-couchbase-xndsq-2020-11-23T09_19_13', '--obj-staging-dir', '/data/staging', '--obj-region', 'us-west-2', '--obj-access-key-id', '[REDACTED]', '--obj-secret-access-key', '[REDACTED]']' returned non-zero exit status 1.

      During handling of the above exception, another exception occurred:

      Traceback (most recent call last):
        File "/opt/couchbase/bin/backup_script", line 1234, in <module>
          main()
        File "/opt/couchbase/bin/backup_script", line 153, in main
          config_repo(args, archive, timestamp)
        File "/opt/couchbase/bin/backup_script", line 1230, in config_repo
          % grepexc.output))
        File "/opt/couchbase/bin/backup_script", line 707, in exit_script
          raise_failed_event(get_obj_ref())
        File "/opt/couchbase/bin/backup_script", line 808, in get_obj_ref
          cbbackup = get_backup(BACKUP)
        File "/opt/couchbase/bin/backup_script", line 317, in get_backup
          NAMESPACE, CBBACKUP_PLURAL, name)
        File "/opt/couchbase/lib/python/runtime/lib/python3.7/site-packages/kubernetes/client/api/custom_objects_api.py", line 954, in get_namespaced_custom_object
          (data) = self.get_namespaced_custom_object_with_http_info(group, version, namespace, plural, name, **kwargs)  # noqa: E501
        File "/opt/couchbase/lib/python/runtime/lib/python3.7/site-packages/kubernetes/client/api/custom_objects_api.py", line 1057, in get_namespaced_custom_object_with_http_info
          collection_formats=collection_formats)
        File "/opt/couchbase/lib/python/runtime/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 345, in call_api
          _preload_content, _request_timeout)
        File "/opt/couchbase/lib/python/runtime/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 176, in __call_api
          _request_timeout=_request_timeout)
        File "/opt/couchbase/lib/python/runtime/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 366, in request
          headers=headers)
        File "/opt/couchbase/lib/python/runtime/lib/python3.7/site-packages/kubernetes/client/rest.py", line 241, in GET
          query_params=query_params)
        File "/opt/couchbase/lib/python/runtime/lib/python3.7/site-packages/kubernetes/client/rest.py", line 231, in request
          raise ApiException(http_resp=r)
      kubernetes.client.rest.ApiException: (403)
      Reason: Forbidden
      HTTP response headers: HTTPHeaderDict({'Audit-Id': '93f140b6-94a0-4aeb-bbb4-62d157a2f136', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'Date': 'Mon, 23 Nov 2020 09:19:20 GMT', 'Content-Length': '411'})
      HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"couchbasebackups.couchbase.com \"cbbackup\" is forbidden: User \"system:serviceaccount:test-vdc5f:couchbase-backup\" cannot get resource \"couchbasebackups\" in API group \"couchbase.com\" in the namespace \"default\"","reason":"Forbidden","details":{"name":"cbbackup","group":"couchbase.com","kind":"couchbasebackups"},"code":403} 
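
The 403 above names a service account in `test-vdc5f` but a request against the namespace `default`. One possible reading is that the script queried a namespace other than the one its pod runs in. The sketch below shows how a pod can read its own namespace from the standard service account mount. The function name `current_namespace` and the fallback value are assumptions made for illustration; they are not taken from `backup_script`.

```python
from pathlib import Path

# Standard location of the pod's namespace inside a Kubernetes container.
SA_NAMESPACE_FILE = Path("/var/run/secrets/kubernetes.io/serviceaccount/namespace")


def current_namespace(path: Path = SA_NAMESPACE_FILE, fallback: str = "default") -> str:
    """Return the namespace the pod runs in, or `fallback` when the file is unavailable.

    Hypothetical helper: the real script may resolve its namespace differently.
    """
    try:
        namespace = path.read_text(encoding="utf-8").strip()
    except OSError:
        # Not running in a pod, or the service account token is not mounted.
        return fallback
    return namespace or fallback
```

Passing the returned value to the custom objects call, instead of a fixed namespace, would keep the lookup inside the namespace where the service account has its role binding, assuming that is where the `couchbasebackups` resource lives.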

      cbopinfo logs of both failures attached. 
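
The traceback shows a second exception (the 403 from the Kubernetes API) raised while the script was reporting the first one (the cbbackupmgr failure), which makes the root cause harder to spot. A minimal sketch of one way to keep the original error visible follows. All names here (`run_tool`, `report_failure`, `config_repo`, `notify`) are illustrative assumptions, not the actual `backup_script` code, and merging stderr into the captured output is a choice made for this sketch.

```python
import logging
import subprocess
import sys

log = logging.getLogger("backup_sketch")


def run_tool(cmd):
    """Run a command and return its stripped output (stderr merged in for this sketch)."""
    return subprocess.check_output(cmd, stderr=subprocess.STDOUT).decode("utf-8").strip()


def report_failure(message, notify):
    """Best-effort notification: a failing `notify` must not replace the original error."""
    try:
        notify(message)
    except Exception as exc:  # e.g. an API 403 while raising the failure event
        log.error("could not raise failure event: %s", exc)


def config_repo(cmd, notify):
    """Return (exit_code, output); on failure, log the tool's message and report it once."""
    try:
        return 0, run_tool(cmd)
    except subprocess.CalledProcessError as err:
        output = err.output.decode("utf-8", errors="replace").strip()
        # Log the tool's output only, not the argument list, which may carry credentials.
        log.error("command failed with exit status %d: %s", err.returncode, output)
        report_failure(output, notify)
        return err.returncode, output
```

With this shape, a forbidden API call during reporting is logged as a secondary problem, and the caller still receives the exit status and message from the tool itself.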

        Activity

          arunkumar Arunkumar Senthilnathan (Inactive) added a comment - Daniel Ma, we are waiting on this bug to declare an RC. Please take a look.

          pvarley Patrick Varley added a comment - One thing to be aware of: this error might be expected if multiple cbbackupmgr processes are running at the same time against the same location (S3 bucket). James Lee noticed that the description mentions "4 tests in parallel", which might be triggering this.
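
To tell this kind of lock contention apart from other cbbackupmgr failures, a test harness could parse the lock holder out of the error text. The sketch below matches only the message wording quoted in the description; the names `LOCK_HELD`, `LockHolder`, and `parse_lock_holder` are hypothetical, and the wording may differ in other cbbackupmgr versions.

```python
import re
from typing import NamedTuple, Optional

# Pattern taken from the error message quoted in this ticket's description.
LOCK_HELD = re.compile(
    r"the process '(?P<pid>\d+)' running on '(?P<host>[^']+)' already holds the lock"
)


class LockHolder(NamedTuple):
    pid: int
    host: str


def parse_lock_holder(output: str) -> Optional[LockHolder]:
    """Return the process and host holding the lock, or None for any other output."""
    match = LOCK_HELD.search(output)
    if match is None:
        return None
    return LockHolder(int(match.group("pid")), match.group("host"))
```

Applied to the message in the description, this yields pid 54 on host `full-incremental-full-1606123080-hl9rv`. A harness could use the host name to check whether the lock holder is another test's backup pod that is still running.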

          prateek.kumar Prateek Kumar (Inactive) added a comment - All backup tests pass when triggered serially.

          http://qa.sc.couchbase.com/job/roo-test/103/consoleFull

          Image: registry.gitlab.com/cb-vanilla/operator-backup:6.6.0-100

          couchbase server: couchbase/server:6.6.0, server upgrade: couchbase/server:6.6.0

          S3 Bucket: s3://caobackups3

          prateek.kumar Prateek Kumar (Inactive) added a comment - Tested with new backup image.

          daniel.ma Daniel Ma (Inactive) added a comment - To reaffirm Patrick's comment: as discussed with Prateek, running cbbackupmgr in parallel was causing the issue.

          People

            daniel.ma Daniel Ma (Inactive)
            prateek.kumar Prateek Kumar (Inactive)