  Couchbase Kubernetes / K8S-1716

[Backup] parsing error in get_repos method


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.1.0
    • Component/s: None
    • Sprint: 41: 2.2, Bugs., 45: Portworks, docs, cleanup
    • Story Points: 2

    Description

      Test : TestFullIncremental

      Job: http://qa.sc.couchbase.com/job/simon-test/306/console

      Error:

      07:08:08     --- FAIL: TestOperator/TestFullIncremental (473.29s)
      07:08:08         crd_util.go:27: creating couchbase cluster: test-couchbase-gk98v
      07:08:08         util.go:1356: time out waiting for backup event BackupStarted, Backup `full-incremental` started
      07:08:08         util.go:1357: goroutine 579 [running]:
      07:08:08             runtime/debug.Stack(0x1c0c7dd, 0x0, 0x0)
      07:08:08             	/jenkins/workspace/simon-test/go/src/runtime/debug/stack.go:24 +0xab
      07:08:08             github.com/couchbase/couchbase-operator/test/e2e/e2eutil.Die(0xc00071c200, 0x258e980, 0xc0002cc360)
      07:08:08             	/jenkins/workspace/simon-test/test/e2e/e2eutil/util.go:1352 +0x34
      07:08:08             github.com/couchbase/couchbase-operator/test/e2e/e2eutil.MustWaitForBackupEvent(0xc00071c200, 0xc000436c60, 0xc000475200, 0xc00035c500, 0x45d964b800)
      07:08:08             	/jenkins/workspace/simon-test/test/e2e/e2eutil/wait_util.go:613 +0xad
      07:08:08             github.com/couchbase/couchbase-operator/test/e2e.TestFullIncremental(0xc00071c200)
      07:08:08             	/jenkins/workspace/simon-test/test/e2e/backup_test.go:109 +0x4d4
      07:08:08             testing.tRunner(0xc00071c200, 0x231edf8)
      07:08:08             	/jenkins/workspace/simon-test/go/src/testing/testing.go:909 +0x19a
      07:08:08             created by testing.(*T).Run
      07:08:08             	/jenkins/workspace/simon-test/go/src/testing/testing.go:960 +0x652
      07:08:08             
      07:08:08 FAIL
      07:08:08 time="2020-10-15T07:08:08-07:00" level=info msg="Test Summary"
      07:08:08 time="2020-10-15T07:08:08-07:00" level=info msg="   1: TestOperator/TestFullIncremental ✗"
      07:08:08 time="2020-10-15T07:08:08-07:00" level=info msg="Suite Summary"
      07:08:08 time="2020-10-15T07:08:08-07:00" level=info msg=" ✗ Failures: 1 (100.00%)"
      07:08:08 FAIL	github.com/couchbase/couchbase-operator/test/e2e	530.805s 

      Image: registry.gitlab.com/cb-vanilla/operator-backup:6.5.1-113

      Server Version: 6.6.0

      S3 Bucket Name: s3://caobackups3

      S3 Region: us-west-2

      Attachments


        Activity

          Lazy parsing of the cbbackupmgr info result (removing the first 2 and last 3 characters) meant the script errored out here.
          Fix: add a try/except around the JSON parsing,
          make sure the cbbackupmgr output is converted to a str, and clean up escape characters, etc.

          daniel.ma Daniel Ma (Inactive) added a comment
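
          A minimal sketch of the defensive parsing described in the comment above; the helper name, the exact cbbackupmgr flags, and the shape of the returned JSON are assumptions for illustration, not the actual backup_script code:

            import json
            import logging
            import subprocess

            def get_repos_safely(archive):
                """Hypothetical replacement for the fixed-offset string slicing:
                decode, strip and validate the cbbackupmgr output before json.loads."""
                # Assumed invocation; the real backup_script builds its own command line.
                proc = subprocess.run(
                    ["cbbackupmgr", "info", "--archive", archive, "--json"],
                    capture_output=True,
                )

                result = proc.stdout
                # cbbackupmgr output arrives as bytes; decode to a clean str instead
                # of calling str() on the bytes object and slicing off the b'' wrapper.
                if isinstance(result, bytes):
                    result = result.decode("utf-8", errors="replace")
                result = result.strip()

                if not result:
                    logging.error("cbbackupmgr info produced no output: %s", proc.stderr)
                    return []

                try:
                    parsed = json.loads(result)
                except json.JSONDecodeError as err:
                    logging.error("unparsable cbbackupmgr info output %r: %s", result, err)
                    return []

                # The top-level "repos" key is an assumption about the info JSON layout.
                return parsed.get("repos", [])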

          Reopening the ticket: the S3 backup tests are still failing with the latest image: registry.gitlab.com/cb-vanilla/operator-backup:6.5.1-114

          Job: http://qa.sc.couchbase.com/job/simon-test/345/console

          Server Version: 6.6.0

          Normal backup tests pass with no errors.

          Coming to S3: for each failed S3 test, the backup pods had the following stack trace in the cbbackupmgr log:

          Traceback (most recent call last):
            File "/opt/couchbase/bin/backup_script", line 1216, in <module>
              main()
            File "/opt/couchbase/bin/backup_script", line 162, in main
              k8s_setup(args, archive)
            File "/opt/couchbase/bin/backup_script", line 472, in k8s_setup
              add_tuple_to_list("backups", list_backups(args, archive), tuple_list)
            File "/opt/couchbase/bin/backup_script", line 291, in list_backups
              repos = get_repos(args, archive)
            File "/opt/couchbase/bin/backup_script", line 833, in get_repos
              json_result = json.loads(result)
            File "/opt/couchbase/lib/python/runtime/lib/python3.7/json/__init__.py", line 348, in loads
              return _default_decoder.decode(s)
            File "/opt/couchbase/lib/python/runtime/lib/python3.7/json/decoder.py", line 337, in decode
              obj, end = self.raw_decode(s, idx=_w(s, 0).end())
            File "/opt/couchbase/lib/python/runtime/lib/python3.7/json/decoder.py", line 355, in raw_decode
              raise JSONDecodeError("Expecting value", s, err.value) from None
          json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) 

          which indicates that the backup repo had the value 'None' because it was never created, as seen in the same log file:

          2020-10-27 15:35:58,468 - root - INFO - Namespace(backup_ret='720.00', cacert=None, cluster='test-couchbase-qjrsf', config='true', end=None, log_ret='168.00', mode='backup', repo=None, s3_access_key_id='******', s3_bucket='s3://caobackups3', s3_region='us-west-2', s3_secret_access_key='*******', start=None, strategy='full_only', verbosity='INFO')
          2020-10-27 15:35:58,468 - root - INFO - start logRetention check
          2020-10-27 15:35:58,468 - root - INFO - removed 0 logs in /data/scriptlogs/full_only
          2020-10-27 15:35:58,468 - root - INFO - mode: BACKUP
          2020-10-27 15:35:58,469 - root - INFO - Perform CONFIG: new Repo to be created
          2020-10-27 15:35:58,469 - root - INFO - Strategy: full_only
          2020-10-27 15:35:58,469 - root - INFO - Perform FULL BACKUP
          2020-10-27 15:35:58,469 - root - INFO - config true, config needs to be performed
          2020-10-27 15:35:58,469 - root - INFO - attempting to create repo test-couchbase-qjrsf-2020-10-27T15_35_58 in location s3://caobackups3/archive
          2020-10-27 15:35:58,480 - root - INFO - Unknown flag: --obj-staging-dir

          where s3_access_key_id and s3_secret_access_key have the correct values.

          Logged into a VM with Couchbase installed and checked the cbbackupmgr man page: the config command does require the --obj-staging-dir flag.

          cbopinfo of all failed S3 tests attached in TestOperator.zip.

          prateek.kumar Prateek Kumar added a comment
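
          For reference, a sketch of what the repo-creation call looks like once the staging directory is supplied. The helper name, the staging path and the --obj-region flag are assumptions for illustration; only --obj-staging-dir is taken from the log above, and the "Unknown flag" error suggests the image's cbbackupmgr build does not understand cloud flags at all:

            import subprocess

            def create_s3_repo(repo, bucket, region, staging_dir="/data/staging"):
                """Hypothetical reconstruction of the repo-creation step for an S3 archive."""
                cmd = [
                    "cbbackupmgr", "config",
                    "--archive", f"{bucket}/archive",   # e.g. s3://caobackups3/archive
                    "--repo", repo,
                    # Cloud archives need a local scratch area; a cbbackupmgr build
                    # without S3 support rejects this with "Unknown flag: --obj-staging-dir".
                    "--obj-staging-dir", staging_dir,
                    "--obj-region", region,
                ]
                proc = subprocess.run(cmd, capture_output=True, text=True)
                if proc.returncode != 0:
                    raise RuntimeError(f"cbbackupmgr config failed: {proc.stdout or proc.stderr}")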

          Prateek Kumar, this backup image 6.5.1-114 seems to be using a version of cbbackupmgr that doesn't have S3 capabilities, hence the "Unknown flag: --obj-staging-dir" error?

          That's the only way I can explain the recurrence of the error in every single S3 test. We then subsequently try to get_repos and list them from S3, so I believe the JSON error is because the response is empty.

          daniel.ma Daniel Ma (Inactive) added a comment
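
          The empty-response theory is easy to reproduce locally: feeding json.loads an empty string, or the string 'None', raises exactly the JSONDecodeError seen in the backup pod logs.

            import json

            # Reproducing the failure mode: with no usable cbbackupmgr output,
            # json.loads raises the same error seen in the backup pod logs.
            for bad in ("", "None"):
                try:
                    json.loads(bad)
                except json.JSONDecodeError as err:
                    print(f"{bad!r}: {err}")
            # '': Expecting value: line 1 column 1 (char 0)
            # 'None': Expecting value: line 1 column 1 (char 0)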

          Prateek Kumar, can you please update this with the latest status?

          arunkumar Arunkumar Senthilnathan added a comment

          Latest run: http://qa.sc.couchbase.com/job/roo-test/97/console

          All backup test cases pass except one (TestBackupAndRestoreS3).

          The YAML of the backup pods has the following output:

          output: |-
              Backed up to 2020-11-10T18_27_47.437420134Z
              Copied all data in 35.163142727s (Avg. 2.03KB/Sec)
              1 buckets, 100 items / 71.22KB
              Backup successfully completed
              Backed up bucket "default" succeeded
              Mutations backed up: 100, Mutations failed to backup: 0
              Deletions backed up: 0, Deletions failed to backup: 0
              Skipped due to purge number or conflict resolution: Mutations: 0 Deletions: 0 

          The cbbackupmgr log file of the restore pods has the following error:

           2020-11-10 18:29:39,454 - root - ERROR - error output: b"Error restoring cluster: Backup test-couchbase-zlnls-2020-11-10T18_27_18 doesn't contain any backups\n"
          2020-11-10 18:29:39,454 - root - ERROR - cbbackupmgr error during restore action
          2020-11-10 18:29:45,016 - root - ERROR - Exiting Script. Failed with output: cbbackupmgr error during restore action: b"Error restoring cluster: Backup test-couchbase-zlnls-2020-11-10T18_27_18 doesn't contain any backups\n"

          Stack Trace:

          util.go:1358: goroutine 848 [running]:
                       runtime/debug.Stack(0x1cb20d5, 0x2613480, 0xc000f42de0)
                       	/jenkins/workspace/roo-test/go/src/runtime/debug/stack.go:24 +0xab
                       github.com/couchbase/couchbase-operator/test/e2e/e2eutil.Die(0xc00091e800, 0x2610cc0, 0xc0004eb040)
                       	/jenkins/workspace/roo-test/test/e2e/e2eutil/util.go:1353 +0x34
                       github.com/couchbase/couchbase-operator/test/e2e/e2eutil.MustVerifyDocCountInBucket(0xc00091e800, 0xc0005c80c0, 0xc0001a2500, 0x22f2a4c, 0x7, 0x64, 0x45d964b800)
                       	/jenkins/workspace/roo-test/test/e2e/e2eutil/xdcr_util.go:137 +0xb5
                       github.com/couchbase/couchbase-operator/test/e2e.testBackupAndRestore(0xc00091e800, 0xc0004a4f01)
                       	/jenkins/workspace/roo-test/test/e2e/backup_test.go:642 +0xac7
                       github.com/couchbase/couchbase-operator/test/e2e.TestBackupAndRestoreS3(0xc00091e800)
                       	/jenkins/workspace/roo-test/test/e2e/backup_test.go:668 +0x3e
                       testing.tRunner(0xc00091e800, 0x23acc48)
                       	/jenkins/workspace/roo-test/go/src/testing/testing.go:909 +0x19a
                       created by testing.(*T).Run
                      	/jenkins/workspace/roo-test/go/src/testing/testing.go:960 +0x652 

          Image: dmacouch/operator-backup:6.6.0

          cbopinfo (TestBackupAndRestore.zip) of the test case attached.

          prateek.kumar Prateek Kumar added a comment

          The TestBackupAndRestore error is related to K8S-1726; with that fix, plus running with the K8S-1643 changes, the test passes reliably.

          daniel.ma Daniel Ma (Inactive) added a comment - edited

          For any new backup errors please open a new ticket instead of re-opening this one, as this ticket ended up being related to multiple other issues.

          https://issues.couchbase.com/browse/K8S-1716?focusedCommentId=444143&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-444143
          Backup script code has been fixed, as mentioned in the comment above.

          daniel.ma Daniel Ma (Inactive) added a comment

          People

            Assignee: daniel.ma Daniel Ma (Inactive)
            Reporter: prateek.kumar Prateek Kumar
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved:
