Couchbase Kubernetes / K8S-1900

Upgrade errors with Helm 2.1 when TLS enabled


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 2.1.0
    • Labels: helm, kubernetes
    • Sprint: 18: PE/Tasks/Docs
    • Story Points: 1

    Description

      The Helm Chart upgrade from 2.0.x to 2.1 encounters the following error:

      {"level":"error","ts":1610489073.7747614,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"couchbase-
      controller","request":"default/cb-test-couchbase-cluster","error":"secrets \"cb-test-couchbase-cluster\" already exists","stacktrac
      e":"github.com/go-logr/zapr.(*zapLogger)
      

      *Steps to Reproduce*

      1) helm install cb-test couchbase/couchbase-operator --version 2.0.2

      This installs the standard defaults.

      2) Then update the CRD (replace it if it exists, create it otherwise):

      kubectl replace -f crd.yaml
      kubectl create -f crd.yaml

      3) Then upgrade the chart:

      helm upgrade cb-test couchbase/couchbase-operator

      We will hit the error above.
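
      When this happens, the clashing secret can be inspected directly; a quick sketch, with the secret name taken from the error log above:

      kubectl get secret cb-test-couchbase-cluster -o yaml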

      In addition, if we have the following set to true in values.yaml:

        # TLS Certs that will be used to encrypt traffic between operator and couchbase
        tls:
          # enable to auto create certs
          generate: true
          # Expiry time of CA in days for generated certs
          expiration: 365
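
      For reference, the same option could also be set on the command line instead of values.yaml; a sketch assuming the values path shown in the snippet above:

      helm install demo --set tls.generate=true couchbase/couchbase-operator --version 2.0.2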
      

      Then upgrading from a previous version to the 2.1 Operator will encounter the following error:

      {"level":"error","ts":1611102051.5212724,"logger":"cluster","msg":"Reconciliation failed","cluster":"default/demo","error":"certificate cannot be verified for zone: x509: certificate is valid for localhost, *.demo-couchbase-cluster.default.svc, *.demo-couchbase-cluster.default, *.demo-couchbase-cluster, *.demo-couchbase-cluster-srv.default.svc, *.demo-couchbase-cluster-srv.default, *.demo-couchbase-cluster-srv, demo-couchbase-cluster-srv.default.svc, demo-couchbase-cluster-srv.default, demo-couchbase-cluster-srv, *.demo-couchbase-cluster-srv.default.svc.cluster.local, host.demo-couchbase-cluster.default.svc.cluster.local, not host.demo
      

          Activity

            Tommie McAfee added a comment:

            Thanks Tin, this issue occurs because the 2.1 Operator now uses a secret with the same name as the cluster to store state. The new 2.1 secret conflicts with the old auth secret, which also had the same name as the cluster. As of the 2.1 Helm chart this conflict is gone, which means this will not be an issue for future releases.
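
            A quick sketch for spotting the clash on an existing 2.0.x install (the secret name demo-couchbase-cluster is illustrative and matches the examples below):

            kubectl get secret demo-couchbase-cluster -o yaml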

            2.0.2 Upgrade options:

            First, run:

            helm repo update 

            1. If someone is currently installing 2.0.2 and planning a 2.1 upgrade, the issue can be avoided altogether by using custom secrets.

            Here are steps to install 2.0.2 when planning future 2.x upgrades:

            # create a custom secret (a sketch of secret.yaml follows after this block)
            kubectl create -f secret.yaml
             
            # use custom secret on 2.0.2 install
            helm install demo --set cluster.security.adminSecret=cb-example-auth  couchbase/couchbase-operator --version 2.0.2
             
            kubectl replace -f crd.yaml
            kubectl create -f crd.yaml
             
            # upgrade and re-use custom secret path
            helm upgrade demo --set cluster.security.adminSecret=cb-example-auth  couchbase/couchbase-operator
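
            secret.yaml itself is not included in this ticket; a minimal sketch of what it could contain, where the name cb-example-auth matches the commands above and the credentials are placeholders:

            apiVersion: v1
            kind: Secret
            metadata:
              name: cb-example-auth
            type: Opaque
            stringData:
              username: Administrator
              password: password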

             

            2. If a 2.0.2 cluster is already installed, the secret info needs to be extracted and applied to the 2.1 secret. (This assumes the name of the secret is `demo-couchbase-cluster`.)

            # Collect username/password from current secret name
            HELM_CB_USERNAME=`kubectl get secret demo-couchbase-cluster -o jsonpath="{.data.username}"`
            HELM_CB_PASSWORD=`kubectl get secret demo-couchbase-cluster -o jsonpath="{.data.password}"`
             
            kubectl replace -f crd.yaml
            kubectl create -f crd.yaml

            # Upgrade with the secret set to the same name as the old 2.0.2 secret:
            helm upgrade demo --set cluster.security.adminSecret=demo-couchbase-cluster couchbase/couchbase-operator
             
            # Patch secret with username/password
            kubectl patch secret demo-couchbase-cluster -p '{"data":{"username": "'$HELM_CB_USERNAME'", "password": "'$HELM_CB_PASSWORD'"}}'
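
            To confirm the patch took effect, the credentials can be decoded back out of the secret (same secret name as above):

            kubectl get secret demo-couchbase-cluster -o jsonpath='{.data.username}' | base64 -d
            kubectl get secret demo-couchbase-cluster -o jsonpath='{.data.password}' | base64 -d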
            Tin Tran (Inactive) added a comment (edited):

            Hi Tommie,

            It seems that with this set to true:

            # TLS Certs that will be used to encrypt traffic between operator and couchbase
            tls:
              # enable to auto create certs
              generate: true
              # Expiry time of CA in days for generated certs
              expiration: 365

            The upgrade will run into this error:

            {"level":"error","ts":1610589098.1309538,"logger":"cluster","msg":"Cluster status update failed","cluster":"fmm1/couchbase-couchbase-cluster","error":"admission webhook \"couchbase-operator-admission.fmm1.svc\" denied the request: validation failure list:\ncertificate cannot be verified for zone: x509: certificate is valid for localhost
            

            Is there any way to get out of it? I've tried to re-create the operator TLS secret with the ca.crt from the server, but that still didn't work.

            Tin Tran (Inactive) added a comment (edited):

            Hi Tommie McAfee, thank you for the workaround. I will make a note here with more details for record keeping:

            1) Install the 2.0.2 Operator without the cluster:

            helm install cb-op couchbase/couchbase-operator --set install.couchbaseCluster=false --version 2.0.2
            

            2) Deploy the Couchbase cluster with tls.generate set to true:

            helm install --values values.yaml demo couchbase/couchbase-operator --version 2.0.2
            

            3) We now see the operator-tls and server-tls secrets:

             
            kubectl get secrets
            NAME                     TYPE     DATA   AGE
            demo                     Opaque   4      7m13s
            demo-demo                Opaque   2      11m
            demo-demo-operator-tls   Opaque   1      11m
            demo-demo-server-tls     Opaque   2      11m

            4) At this point, before we upgrade, we must re-generate the certs in the correct format. We can do that with:

            helm template demo --values values.yaml couchbase/couchbase-operator > secretsdemo.yaml
            
            

            Please note that we must match the Helm release name of the Couchbase Server cluster (demo in this case) and use the same values.yaml.

            5) From secretsdemo.yaml, remove everything except the operator and server TLS secrets, then replace the current secrets with (see also the filtering alternative below):

            kubectl replace -f secretsdemo.yaml                                                                                                    
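
            As an alternative to hand-editing secretsdemo.yaml, the two TLS Secret documents could be filtered out of the template output directly; a sketch assuming yq v4 is available and using the secret names from the listing in step 3:

            helm template demo --values values.yaml couchbase/couchbase-operator \
              | yq e 'select(.kind == "Secret" and (.metadata.name == "demo-demo-operator-tls" or .metadata.name == "demo-demo-server-tls"))' - \
              > secretsdemo.yaml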
            
            

            We should now see the following lines in the Operator logs:

            {"level":"info","ts":1611184007.8059275,"logger":"cluster","msg":"Reloading certificate chain","cluster":"default/demo","name":"demo-0000"}
            {"level":"info","ts":1611184007.941515,"logger":"cluster","msg":"Reloading certificate chain","cluster":"default/demo","name":"demo-0001"}
            {"level":"info","ts":1611184008.0639791,"logger":"cluster","msg":"Reloading TLS client configuration"}
            


            Patrick Stephens (Inactive) added a comment:

            I think this should be resolved in the linked docs now - do we need to backport it as well? Tommie McAfee or Eric Schneider


            Matt Ingenthron added a comment:

            Need to cherry-pick this back to 2.1.


            Patrick Stephens (Inactive) added a comment:

            Cherry-picked the change over from the master branch.


            Eric Schneider (Inactive) added a comment:

            Patrick Stephens assigned K8S-1955 to handle QE review of the docs. I'll leave it up to you what to do with this ticket. Perhaps you can close this one, since the final review is being handled in the other ticket?


            People

              Assignee: Patrick Stephens (Inactive)
              Reporter: Tin Tran (Inactive)
