Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.3.0
    • Component/s: testing
    • Labels: None

    Attachments


      Activity

        simon.murray Simon Murray added a comment -

        Coming from 2.x to 2.3....

        • When using server groups, all PVCs will have their failure-domain.beta.kubernetes.io/zone annotation updated to topology.kubernetes.io/zone.  After the upgrade, you should be able to kill pods and they will be rescheduled in the correct AZ.
        • All volume-backed pods will acquire a pvc.couchbase.com/image annotation.  Killing a pod during an upgrade will use the image specified by this annotation for recovery, not the upgrade version from the CRD.
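        A minimal way to spot-check the first point (a sketch: the annotation values below are assumed from the verification dumps later in this ticket; on a live cluster they would come from something like `kubectl get pvc <name> -o jsonpath='{.metadata.annotations}'`):

```shell
# Offline sketch against a captured PVC annotation list: after the upgrade
# the old beta zone key must be gone and the new topology key present.
annotations='topology.kubernetes.io/zone=us-east1-b pvc.couchbase.com/image=couchbase/server:6.6.2'
migrated=no
if echo "$annotations" | grep -q 'topology.kubernetes.io/zone' &&
   ! echo "$annotations" | grep -q 'failure-domain.beta.kubernetes.io/zone'; then
  migrated=yes
fi
echo "zone annotation migrated: $migrated"
```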

        prateek.kumar Prateek Kumar (Inactive) added a comment -

        Coming from 2.2.0 to 2.3.0:

        • Create the DAC and Operator pod with 2.2.0. Also create a 3-node CB cluster running 6.6.2, defined as:

          ---
          apiVersion: couchbase.com/v2
          kind: CouchbaseCluster
          metadata:
            name: cb-example
          spec:
            image: couchbase/server:6.6.2
            security:
              adminSecret: cb-example-auth
            serverGroups:
              - us-east1-b
              - us-east1-c
              - us-east1-d
            buckets:
              managed: true
            servers:
            - size: 3
              name: kv
              services:
              - data
              - index
              - query
              - analytics
              - eventing
              - search
              volumeMounts:
                default: couchbase
              serverGroups:
              - us-east1-b
              - us-east1-c
            volumeClaimTemplates:
            - metadata:
                name: couchbase
              spec:
                storageClassName: standard-rwo
                resources:
                  requests:
                    storage: 5Gi
                  limits:
                    storage: 10Gi 
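        One property of this manifest worth checking before applying it (a hypothetical offline sanity check; the group lists are taken from the YAML above): the per-server-class serverGroups should be a subset of the cluster-level spec.serverGroups, otherwise pods of that class cannot be placed in a known group.

```shell
# Values copied from the manifest above (cluster-level vs. kv-class groups).
cluster_groups="us-east1-b us-east1-c us-east1-d"
kv_groups="us-east1-b us-east1-c"
ok=yes
for g in $kv_groups; do
  case " $cluster_groups " in
    *" $g "*) ;;   # group is declared at the cluster level: fine
    *) ok=no ;;    # group missing at the cluster level: misconfiguration
  esac
done
echo "serverGroups consistent: $ok"
```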

        • Status of the 2.2.0 pods: 

          NAME                                            READY   STATUS    RESTARTS   AGE
          cb-example-0000                                 1/1     Running   0          3m38s
          cb-example-0001                                 1/1     Running   0          2m37s
          cb-example-0002                                 1/1     Running   0          92s
          couchbase-operator-84958c67bb-r6mbf             1/1     Running   0          4m30s
          couchbase-operator-admission-7877745899-wvwmh   1/1     Running   0          4m38s 

        • pods/cb-example-0000 annotations: 

          Annotations:  operator.couchbase.com/version: 2.2.0
                        pod.couchbase.com/spec:
                          {"volumes":[{"name":"cb-example-0000-default-01","persistentVolumeClaim":{"claimName":"cb-example-0000-default-01"}}],"containers":[{"name...
                        server.couchbase.com/version: 6.6.2 

        • pods/cb-example-0000 NodeSelector: 

          Node-Selectors:              failure-domain.beta.kubernetes.io/zone=us-east1-b 

        • pvc/cb-example-0000-default-01 annotations: 

          Annotations:   failure-domain.beta.kubernetes.io/zone: us-east1-b
                         operator.couchbase.com/version: 2.2.0
                         path: /opt/couchbase/var/lib/couchbase
                         pv.beta.kubernetes.io/gid: 1000
                         pv.kubernetes.io/bind-completed: yes
                         pv.kubernetes.io/bound-by-controller: yes
                         pvc.couchbase.com/spec: {"resources":{"limits":{"storage":"10Gi"},"requests":{"storage":"5Gi"}},"storageClassName":"standard-rwo"}
                         server.couchbase.com/version: 6.6.2
                         serverConfig: kv
                         volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
                         volume.kubernetes.io/selected-node: gke-cluster-1-default-pool-45831c5b-whwz 
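        As a sketch of the pre-upgrade baseline (annotation keys assumed from the dump above; a live-cluster equivalent would be `kubectl get pvc cb-example-0000-default-01 -o jsonpath='{.metadata.annotations}'`): a 2.2.0 PVC still uses the deprecated beta zone key and carries no pvc.couchbase.com/image annotation yet.

```shell
# Offline check against the 2.2.0 PVC annotation keys listed above.
pre_keys='failure-domain.beta.kubernetes.io/zone operator.couchbase.com/version pvc.couchbase.com/spec server.couchbase.com/version'
legacy=no
if echo "$pre_keys" | grep -q 'failure-domain.beta.kubernetes.io/zone' &&
   ! echo "$pre_keys" | grep -q 'pvc.couchbase.com/image'; then
  legacy=yes
fi
echo "pre-upgrade PVC uses legacy keys: $legacy"
```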

        • Upgrade the Operator to 2.3.0 following the standard procedure. All running CB pods are automatically replaced by equivalent pods with the new annotations.
        • Status of the pods post-upgrade: 

          NAME                                            READY   STATUS    RESTARTS   AGE
          cb-example-0003                                 1/1     Running   0          6m27s
          cb-example-0004                                 1/1     Running   0          4m34s
          cb-example-0005                                 1/1     Running   0          2m30s
          couchbase-operator-5b4cb9f599-gcbp7             1/1     Running   0          7m37s
          couchbase-operator-admission-65469748f6-r2k8j   1/1     Running   0          7m58s 

        • pods/cb-example-0003 annotations: 

          Annotations:  operator.couchbase.com/version: 2.3.0
                        pod.couchbase.com/initialized: true
                        pod.couchbase.com/spec:
                          {"volumes":[{"name":"cb-example-0003-default-00","persistentVolumeClaim":{"claimName":"cb-example-0003-default-00"}}],"containers":[{"name...
                        prometheus.io/path: /metrics
                        prometheus.io/port: 9091
                        prometheus.io/scrape: false
                        server.couchbase.com/version: 6.6.2 

        • pods/cb-example-0003 NodeSelector: 

          Node-Selectors:              topology.kubernetes.io/zone=us-east1-b 

        • pvc/cb-example-0003-default-01 annotations: 

          Annotations:   operator.couchbase.com/version: 2.3.0
                         path: /opt/couchbase/var/lib/couchbase
                         pv.beta.kubernetes.io/gid: 1000
                         pv.kubernetes.io/bind-completed: yes
                         pv.kubernetes.io/bound-by-controller: yes
                         pvc.couchbase.com/image: couchbase/server:6.6.2
                         pvc.couchbase.com/spec: {"resources":{"limits":{"storage":"10Gi"},"requests":{"storage":"5Gi"}},"storageClassName":"standard-rwo"}
                         server.couchbase.com/version: 6.6.2
                         serverConfig: kv
                         topology.kubernetes.io/zone: us-east1-b
                         volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
                         volume.kubernetes.io/selected-node: gke-cluster-1-default-pool-45831c5b-whwz 
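        The key invariant in the dump above is that the recovery image recorded on the PVC matches the running server version. A minimal offline sketch (values assumed from the annotations above; on a live cluster they would come from kubectl jsonpath queries):

```shell
# pvc.couchbase.com/image vs. server.couchbase.com/version, from the dump above.
image='couchbase/server:6.6.2'
server_version='6.6.2'
recorded="${image##*:}"   # strip everything up to the last ':' to get the tag
if [ "$recorded" = "$server_version" ]; then
  echo "recovery image matches server version"
fi
```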

        • Now upgrade the server version from 6.6.2 to 7.0.3.
        • Status of the pods post server upgrade: 

          NAME                                            READY   STATUS    RESTARTS   AGE
          cb-example-0006                                 1/1     Running   0          9m44s
          cb-example-0007                                 1/1     Running   0          7m18s
          cb-example-0008                                 1/1     Running   0          5m5s
          couchbase-operator-5b4cb9f599-gcbp7             1/1     Running   0          19m
          couchbase-operator-admission-65469748f6-r2k8j   1/1     Running   0          19m 

        • pvc/cb-example-0006-default-01 annotations: 

          Annotations:   operator.couchbase.com/version: 2.3.0
                         path: /opt/couchbase/var/lib/couchbase
                         pv.beta.kubernetes.io/gid: 1000
                         pv.kubernetes.io/bind-completed: yes
                         pv.kubernetes.io/bound-by-controller: yes
                         pvc.couchbase.com/image: couchbase/server:7.0.3
                         pvc.couchbase.com/spec: {"resources":{"limits":{"storage":"10Gi"},"requests":{"storage":"5Gi"}},"storageClassName":"standard-rwo"}
                         server.couchbase.com/version: 7.0.3
                         serverConfig: kv
                         topology.kubernetes.io/zone: us-east1-b
                         volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
                         volume.kubernetes.io/selected-node: gke-cluster-1-default-pool-45831c5b-p8n1 

        • Kill all 3 CB pods: 

          Prateeks-MacBook-Pro:Downloads prateekkumar$ kubectl delete pods cb-example-0006 cb-example-0007 cb-example-0008
          pod "cb-example-0006" deleted
          pod "cb-example-0007" deleted
          pod "cb-example-0008" deleted 

        • Pod cb-example-0006 comes up instantly in the Warmup state while 0007 and 0008 are Down:

          {"level":"info","ts":1648040462.1173003,"logger":"cluster","msg":"Node status","cluster":"default/cb-example","name":"cb-example-0006","version":"7.0.3","class":"kv","managed":true,"status":"Warmup"}
          {"level":"info","ts":1648040462.1173072,"logger":"cluster","msg":"Node status","cluster":"default/cb-example","name":"cb-example-0007","version":"7.0.3","class":"kv","managed":true,"status":"Down"}
          {"level":"info","ts":1648040462.1173117,"logger":"cluster","msg":"Node status","cluster":"default/cb-example","name":"cb-example-0008","version":"7.0.3","class":"kv","managed":true,"status":"Down"} 

        • The autofailover timeout is 2m by default, so the Operator logs that it is waiting for pod failover:

          {"level":"info","ts":1648040468.1909335,"logger":"cluster","msg":"Reconciliation failed","cluster":"default/cb-example","error":"reconcile was blocked from running: waiting for pod failover","stack":"github.com/couchbase/couchbase-operator/pkg/cluster.(*ReconcileMachine).handleDownNodes\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/nodereconcile.go:423\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*ReconcileMachine).exec\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/nodereconcile.go:307\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).reconcileMembers\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/reconcile.go:260\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).reconcile\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/reconcile.go:172\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).runReconcile\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/cluster.go:481\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).Update\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/cluster.go:524\ngithub.com/couchbase/couchbase-operator/pkg/controller.(*CouchbaseClusterReconciler).Reconcile\n\tgithub.com/couchbase/couchbase-operator/pkg/controller/controller.go:90\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\tsigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/controller-runtime@v0.11.0/pkg/internal/controller/controller.go:227"} 

        • After 2m, all pods come up in the same server group with the same annotations as before.
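        What pins a recovered pod back into its original AZ is that the pod's node selector is derived from its PVC's zone annotation. A sketch of that check (zone values assumed from the dumps above; live-cluster values would come from `kubectl get pod/pvc -o jsonpath=...`):

```shell
# Pod Node-Selector zone vs. PVC topology.kubernetes.io/zone annotation.
pod_zone='us-east1-b'   # from the pod's topology.kubernetes.io/zone selector
pvc_zone='us-east1-b'   # from the PVC's topology.kubernetes.io/zone annotation
if [ "$pod_zone" = "$pvc_zone" ]; then
  echo "pod rescheduled in correct AZ"
fi
```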

        Verified:

        • Upgrade from 2.2.0 to 2.3.0.
        • Upgrade from 6.6.2 to 7.0.3 after the Operator upgrade.
        • The failure-domain.beta.kubernetes.io/zone PVC annotation and the Pod NodeSelector are updated to topology.kubernetes.io/zone after the Operator upgrade.
        • All pods were rescheduled in the correct AZ after they were killed.
        • All PVCs acquire the pvc.couchbase.com/image annotation after the Operator upgrade.

        People

          prateek.kumar Prateek Kumar (Inactive)
          arunkumar Arunkumar Senthilnathan (Inactive)