Couchbase Kubernetes
K8S-2185

Restore Job is not scheduled when service flags are used.


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version: 2.2.0
    • Fix Version: 2.2.0
    • Component: operator-backup
    • Labels: None
    • Sprint: 20: PE/Docs/Cleanup
    • Story Points: 1

    Description

Restore is implemented with service flags such as --disable-data, --disable-eventing, etc.

      The test does not schedule the CouchbaseBackupRestore resource when these flags are used.
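For reference, a restore that skips services might look like the sketch below. The field names under spec.services are an assumption inferred from the --disable-data/--disable-eventing flags mentioned above, and the metadata/backup/repo values are placeholders; check the CouchbaseBackupRestore CRD schema for the operator version in use.

```yaml
# Hypothetical sketch of a CouchbaseBackupRestore that skips the data and
# eventing services; spec.services field names are assumed, not verified.
apiVersion: couchbase.com/v2
kind: CouchbaseBackupRestore
metadata:
  name: my-restore                        # placeholder name
spec:
  backup: my-backup                       # placeholder CouchbaseBackup name
  repo: cb-example-2021-01-01T00_00_00    # placeholder repository name
  services:
    data: false                           # corresponds to --disable-data
    eventing: false                       # corresponds to --disable-eventing
```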

      Job : http://qa.sc.couchbase.com/view/Cloud/job/k8s-cbop-gke-pipeline/164/console

      Backup Image: 1.1.0-100

The test case here uses --disable-data, so we expect no items to be restored. The test case passes, but only because the restore resource was never created, so the expectation and the observation match by coincidence.

      However, the restore resource was not scheduled (checked while the test case was running), so this is a false positive.

      The logs do not show any information about this, since the test case passes.
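The false positive arises because the assertion "zero items restored" is satisfied both when the restore runs with --disable-data and when the restore never runs at all. A minimal sketch of the stronger check the test needs (helper and parameter names are hypothetical, not from the actual test suite):

```python
def validate_restore(restore_scheduled, items_restored, expected_items):
    """Verify a disable-data restore: the item count alone cannot
    distinguish 'restored nothing by design' from 'never scheduled',
    so the test must also assert the restore resource actually ran."""
    if not restore_scheduled:
        # Without this guard, expected_items == 0 passes even when the
        # CouchbaseBackupRestore resource was silently never created.
        raise AssertionError("restore resource was never scheduled")
    assert items_restored == expected_items, (
        f"expected {expected_items} items, got {items_restored}")
    return True
```

With this guard, a run where the restore was never scheduled fails loudly instead of passing by coincidence.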

      Gerrit patch: http://review.couchbase.org/c/couchbase-operator/+/152328 

This patch can be applied and run with the image; as seen in the patch, no major changes were made to the test.

      Also, there should be a proper message when the restore is not scheduled, instead of silently skipping over it.


        Activity

prateek.kumar Prateek Kumar added a comment -

          Job: http://qa.sc.couchbase.com/view/Cloud/job/k8s-cbop-gke-pipeline/168/console

          Test case passes, but no restore pod is seen in the logs.
simon.murray Simon Murray added a comment -

          From the logs, what I can tell is that the resource was created (the create call happens right before the log message). What is meant to happen is that the restore completes successfully and then deletes the restore/job/pod. Oddly, there is no log message for this, so I cannot prove it; my guess is that it is working as designed and this is testware.

prateek.kumar Prateek Kumar added a comment -

          The tests we have now create the restore resource, and then we validate the result of that restore by verifying the number of docs in the bucket.

          We don't delete the restore pod/job, so if they are being deleted, I don't think that is by design of the tests.

          I'll double-check in the backup script, but wouldn't we want this restore resource to be reflected in the logs rather than deleted?
simon.murray Simon Murray added a comment -

          The operator deletes the restore (so it doesn't run it again, and this acts as garbage collection). I've added logging so we can see this happening.

          We want logs when it fails (logging passes is useless spam), so if you think this is failing while the operator sees the restore as succeeding, then we have a problem.
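Since the operator deletes the restore resource on success, a test cannot simply check for the resource's existence at the end of the run; the appearance and subsequent disappearance of the resource is itself the success signal. One hedged way to structure that wait, sketched in Python with a stubbed lister standing in for a Kubernetes API call (all names hypothetical):

```python
import time

def wait_for_restore(list_restores, timeout=300, interval=1.0):
    """Return True once the restore resource has appeared and then
    disappeared (the operator's garbage collection is the success
    signal), or False on timeout. `list_restores` is a stand-in for
    a call returning the names of current restore resources."""
    seen = False
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        names = list_restores()
        if names:
            seen = True      # the restore was actually scheduled
        elif seen:
            return True      # scheduled, then garbage-collected
        time.sleep(interval)
    return False
```

This pattern would have caught the bug in the original report: if the restore is never scheduled, `seen` stays False and the wait times out instead of the test passing silently.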
simon.murray Simon Murray added a comment -

          Re-run with the latest and let us know what you see!

prateek.kumar Prateek Kumar added a comment -

          We don't see tests failing when the restore job/pod has been scheduled successfully.

People

            Assignee: prateek.kumar Prateek Kumar
            Reporter: prateek.kumar Prateek Kumar
            Votes: 0
            Watchers: 2
