Couchbase Kubernetes / K8S-3609

Hibernation fails to bring back any pod, with the operator error "error extracting image verion"

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: 2.8.0
    • Affects Version/s: 2.7.0
    • Component/s: operator
    • Labels: None
    • Initial Cluster version : 7.2.2-6401
      Upgrade Cluster version : 7.2.3-6705
      Downgrade Cluster version : 7.2.2-6401
      Kubernetes Version : v1.30.0
      CAO and operator : 2.7.0 built locally
      Environment : Kind cluster
    • 17 -Timetrap
    • 1

    Description

      Cluster Setup

      • Kind cluster locally run on Mac
      • 5 nodes with all services
      • 1 bucket
      • Initial Cluster version : 7.2.2-6401
      • Upgrade Cluster version : 7.2.3-6705
      • Downgrade Cluster version : 7.2.2-6401
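The version strings above have the shape `major.minor.patch-build`. As a minimal sketch (the `BuildVersion` type and `parse_build_version` helper are hypothetical names, not part of the operator), they can be parsed and compared to confirm that the wake-up image is older than the image the cluster was being upgraded to:

```python
import re
from typing import NamedTuple


class BuildVersion(NamedTuple):
    """Hypothetical container for a 'major.minor.patch-build' string."""
    major: int
    minor: int
    patch: int
    build: int


# Assumes exactly the format used in this report, e.g. "7.2.2-6401".
_PATTERN = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)$")


def parse_build_version(text: str) -> BuildVersion:
    """Parse a version string, raising ValueError if it does not match."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return BuildVersion(*(int(part) for part in match.groups()))


upgrade_target = parse_build_version("7.2.3-6705")
wake_up_image = parse_build_version("7.2.2-6401")

# NamedTuple instances compare field by field, like plain tuples.
print("wake-up is a downgrade:", wake_up_image < upgrade_target)
```

Tuple ordering is enough here because every field is numeric; tags with suffixes or registry prefixes would need a more careful parser.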

      Steps taken in the scenario

      • Created a cluster.
      • Created 1 bucket.
      • Issued an upgrade from 7.2.2-6401 to 7.2.3-6705 using swap rebalance.
      • The swap rebalance of cb-example-0001 with cb-example-0005 completed.
      • Hibernated the cluster.
      • Woke the cluster up with 7.2.2 as the image instead of 7.2.3.
      • The cluster never recovered.
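The hibernate and wake-up steps above can be sketched as two merge patches against the CouchbaseCluster resource. This is an illustration only: it assumes the `spec.hibernate` and `spec.image` fields were the ones changed, that the image was switched in the same update that woke the cluster, and it uses a placeholder image reference. The manifest actually used is the linked couchbase-cluster.yaml.

```python
import json


def hibernate_patch() -> dict:
    """Merge patch that asks the operator to hibernate the cluster."""
    return {"spec": {"hibernate": True}}


def wake_patch(image: str) -> dict:
    """Merge patch that wakes the cluster and sets the server image."""
    return {"spec": {"hibernate": False, "image": image}}


# Placeholder reference; the real registry and repository are not given
# in this report.
OLD_IMAGE = "registry.example.invalid/couchbase/server:7.2.2-6401"

print(json.dumps(hibernate_patch()))
print(json.dumps(wake_patch(OLD_IMAGE)))
```

Each printed document could be applied with something like `kubectl patch couchbasecluster cb-example --type merge -p '<json>'`, assuming the cluster name and default namespace seen in the logs.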

      The operator goes into a loop, repeatedly logging the following:

      {"level":"debug","ts":"2024-08-07T10:45:31Z","logger":"api","msg":"http","cluster":"default/cb-example","method":"GET","url":"http://cb-example-0000.cb-example.default.svc:8091/pools/default","status":"200 OK","time_ms":4.649042}
      {"level":"error","ts":"2024-08-07T10:45:31Z","logger":"cluster","msg":"Failed to update members","cluster":"default/cb-example","error":"error extracting image verion","stacktrace":"github.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).runReconcile\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/cluster.go:523\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).Update\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/cluster.go:608\ngithub.com/couchbase/couchbase-operator/pkg/controller.(*CouchbaseClusterReconciler).Reconcile\n\tgithub.com/couchbase/couchbase-operator/pkg/controller/controller.go:90\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\tsigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:227"}
      {"level":"error","ts":"2024-08-07T10:45:31Z","logger":"cluster","msg":"Failed to rotate expired certificates","cluster":"default/cb-example","error":"TLS invalid: Attempted to check if certifiates are expired but TLS was never initialized","stacktrace":"github.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).runReconcile\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/cluster.go:548\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).Update\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/cluster.go:608\ngithub.com/couchbase/couchbase-operator/pkg/controller.(*CouchbaseClusterReconciler).Reconcile\n\tgithub.com/couchbase/couchbase-operator/pkg/controller/controller.go:90\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\tsigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:227"} 
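Because the operator emits one JSON object per line, the repeating errors can be summarized with a short script. The sketch below is a hypothetical helper, not part of any Couchbase tooling; its sample input is an abbreviated copy of the lines above with the `stacktrace`, `url`, and timing fields dropped.

```python
import json
from collections import Counter


def summarize_errors(lines):
    """Count (msg, error) pairs among error-level JSON log lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON log line
        if isinstance(entry, dict) and entry.get("level") == "error":
            counts[(entry.get("msg"), entry.get("error"))] += 1
    return counts


# Abbreviated copies of the log lines in this report.
sample = [
    '{"level":"debug","ts":"2024-08-07T10:45:31Z","logger":"api",'
    '"msg":"http","cluster":"default/cb-example","status":"200 OK"}',
    '{"level":"error","ts":"2024-08-07T10:45:31Z","logger":"cluster",'
    '"msg":"Failed to update members","cluster":"default/cb-example",'
    '"error":"error extracting image verion"}',
    '{"level":"error","ts":"2024-08-07T10:45:31Z","logger":"cluster",'
    '"msg":"Failed to rotate expired certificates",'
    '"cluster":"default/cb-example",'
    '"error":"TLS invalid: Attempted to check if certifiates are expired '
    'but TLS was never initialized"}',
]

for (msg, err), count in summarize_errors(sample).items():
    print(f"{count}x {msg}: {err}")
```

Run against the full operator log, a high count for a single pair would make the reconcile loop easy to spot.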


      Operator logs : https://cb-engineering.s3.amazonaws.com/K8S-3609/collectinfo-2024-08-07T104800-ns_1%40cb-example-0000.cb-example.default.svc.zip

      Cluster logs : https://cb-engineering.s3.amazonaws.com/K8S-3609/cbopinfo-20240807T161714+0530.tar.gz

      Couchbase deployment : https://cb-engineering.s3.amazonaws.com/K8S-3609/couchbase-cluster.yaml


      The cao tool and operator images were built locally from this commit:

      commit f752305ba8574b4464efb7abb009a52a5560fc1b (HEAD -> 2.7.x, origin/2.7.x)
      Author: Yusuf Ramzan <yusuf.ramzan@couchbase.com>
      Date:   Mon Aug 5 14:50:00 2024 +0100

          K8S-3598 Fixed not all nodes ready for rebalance

          Change-Id: I78e2c3fc76ad8d848e86dac836469e75cfc92683
          Reviewed-on: https://review.couchbase.org/c/couchbase-operator/+/213742
          Tested-by: Build Bot <build@couchbase.com>
          Reviewed-by: <usamah.jassat@couchbase.com>

          People

            ben.mottershead Ben Mottershead
            raghav.sk Raghav S K