  1. Couchbase Kubernetes
  2. K8S-2720

[7.1] Auto-Failover cannot be disabled with Multiple Root CAs

Details

    • Bug
    • Resolution: Not a Bug
    • Major
    • 2.3.1
    • Couchbase Server Version 7.1.0-2549
    • 17: Automation, future fixes, 21: Auto., crackn' on Krakken, 23: ARM, Trackin' Kraken, 25: Maintain
    • 3

    Description

      • Create a 3-node cluster with client and server CAs:

        serverTLS := e2eutil.MustInitClusterTLS(t, kubernetes, &e2eutil.TLSOpts{Source: e2eutil.TLSSourceKubernetesSecret})

        clientTLS := e2eutil.MustInitClusterTLS(t, kubernetes, &e2eutil.TLSOpts{Source: e2eutil.TLSSourceKubernetesSecret})

        cluster.Spec.Networking.TLS = &couchbasev2.TLSPolicy{RootCAs: []string{serverTLS.CASecretName, clientTLS.CASecretName}}

      • Enable node-to-node (N2N) encryption by setting the encryption type to control-plane through a JSON patch.

        patchset := jsonpatch.NewPatchSet().Add("/spec/networking/tls/nodeToNodeEncryption", encryptionType) 
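      For reference, the patch above corresponds to a single RFC 6902 "add" operation. The following is only a sketch built with the Go standard library, not the e2e framework's jsonpatch helper; the patchOp type, the n2nPatch function, and the value "ControlPlaneOnly" used for encryptionType are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// patchOp is a single RFC 6902 JSON Patch operation.
// This type is hypothetical and only mirrors the wire format.
type patchOp struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value,omitempty"`
}

// n2nPatch builds a patch document that adds the node-to-node
// encryption setting at the path used in the test.
func n2nPatch(encryptionType string) ([]byte, error) {
	ops := []patchOp{{
		Op:    "add",
		Path:  "/spec/networking/tls/nodeToNodeEncryption",
		Value: encryptionType,
	}}
	return json.Marshal(ops)
}

func main() {
	// "ControlPlaneOnly" is an assumed value for encryptionType.
	body, err := n2nPatch("ControlPlaneOnly")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

      Printing the document makes it easy to confirm that the path and value sent to the API server are the ones the test intends.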

      Observation:

      Node encryption is not set:

      • Before n2n can be enabled, auto-failover must first be disabled. This is handled as part of the node-controller API endpoint flow: enableExternalListener.
      • This step does not complete when multiple Root CAs are present on the cluster.
      • Hence, n2n is not set.
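      The ordering dependency described above can be sketched as follows. This is a minimal illustration, not the Operator's implementation: the clusterAPI interface, its method names, and the fakeAPI type are all hypothetical and exist only to show that encryption is never attempted when the auto-failover step fails.

```go
package main

import (
	"errors"
	"fmt"
)

// clusterAPI is a hypothetical abstraction over the two steps
// described in the report; it is not the Operator's real client.
type clusterAPI interface {
	DisableAutoFailover() error
	EnableNodeEncryption() error
}

// enableN2N disables auto-failover first and only then enables
// node-to-node encryption. It stops at the first failing step.
func enableN2N(api clusterAPI) error {
	if err := api.DisableAutoFailover(); err != nil {
		return fmt.Errorf("disable auto-failover: %w", err)
	}
	if err := api.EnableNodeEncryption(); err != nil {
		return fmt.Errorf("enable node encryption: %w", err)
	}
	return nil
}

// fakeAPI records the calls it receives so the order can be inspected.
type fakeAPI struct {
	failDisable bool
	calls       []string
}

func (f *fakeAPI) DisableAutoFailover() error {
	f.calls = append(f.calls, "disableAutoFailover")
	if f.failDisable {
		return errors.New("auto-failover still enabled")
	}
	return nil
}

func (f *fakeAPI) EnableNodeEncryption() error {
	f.calls = append(f.calls, "enableNodeEncryption")
	return nil
}

func main() {
	// Simulate the reported behaviour: the disable step does not complete.
	api := &fakeAPI{failDisable: true}
	err := enableN2N(api)
	fmt.Println("error:", err)
	fmt.Println("calls:", api.calls)
}
```

      With failDisable set, only the first call is recorded, which matches the observation that n2n is never set.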

      ns_server.error.log 

      [ns_server:error,2022-04-19T11:58:49.140Z,ns_1@test-couchbase-rbgkr-0002.test-couchbase-rbgkr.test-mss2t.svc:ns_config<0.259.0>:ns_config:handle_info:874]Saving ns_config failed. Trying to ignore: {distribution_not_started,
                                                  [{auth,set_cookie,2,
                                                    [{file,"auth.erl"},{line,124}]},
                                                   {ns_server,get_babysitter_node,
                                                    0,
                                                    [{file,"src/ns_server.erl"},
                                                     {line,257}]},
                                                   {encryption_service,
                                                    maybe_clear_backup_key,1,
                                                    [{file,
                                                      "src/encryption_service.erl"},
                                                     {line,72}]},
                                                   {proc_lib,init_p_do_apply,3,
                                                    [{file,"proc_lib.erl"},
                                                     {line,226}]}]} 

      At around the same timestamp, the .info.log shows that a request was made to disable auto-failover:

       [ns_server:info,2022-04-19T11:58:49.741Z,ns_1@test-couchbase-rbgkr-0002.test-couchbase-rbgkr.test-mss2t.svc:<0.913.0>:ns_cluster:apply_net_config:235]Applying net config. AFamily: inet, AFamilyOnly: false, NEncryption: false, DistProtos: [{inet,
                                                                                                false}]

      In addition, the connection to memcached fails:

      [ns_server:warn,2022-04-19T11:58:51.011Z,ns_1@test-couchbase-rbgkr-0002.test-couchbase-rbgkr.test-mss2t.svc:<0.594.0>:ns_memcached:connect:1240]Unable to connect: {error,{badmatch,[{inet,{error,econnrefused}}]}}, retrying.
      [ns_server:warn,2022-04-19T11:58:51.011Z,ns_1@test-couchbase-rbgkr-0002.test-couchbase-rbgkr.test-mss2t.svc:memcached_refresh<0.281.0>:ns_memcached:connect:1237]Unable to connect: {error,{badmatch,[{inet,{error,econnrefused}}]}}.
      [ns_server:warn,2022-04-19T11:58:51.011Z,ns_1@test-couchbase-rbgkr-0002.test-couchbase-rbgkr.test-mss2t.svc:<0.1005.0>:ns_memcached:connect:1240]Unable to connect: {error,{badmatch,[{inet,{error,econnrefused}}]}}, retrying.

      This prompts ns_config to reload the saved config, which still has auto-failover enabled.

      [ns_server:info,2022-04-19T11:58:51.235Z,ns_1@test-couchbase-rbgkr-0002.test-couchbase-rbgkr.test-mss2t.svc:ns_config<0.259.0>:ns_config:load_config:1108]Loading static config from "/opt/couchbase/etc/couchbase/config"
      [ns_server:info,2022-04-19T11:58:51.236Z,ns_1@test-couchbase-rbgkr-0002.test-couchbase-rbgkr.test-mss2t.svc:ns_config<0.259.0>:ns_config:load_config:1122]Loading dynamic config from "/opt/couchbase/var/lib/couchbase/config/config.dat"
      [ns_server:info,2022-04-19T11:58:51.246Z,ns_1@test-couchbase-rbgkr-0002.test-couchbase-rbgkr.test-mss2t.svc:ns_config<0.259.0>:ns_config:load_config:1152]Here's full dynamic config we loaded + static & default config:
      [{auto_failover_cfg,
        [{enabled,true},
         {timeout,120},
         {count,0}, 

      Hence, I think the request to disable auto-failover was made but did not take effect.
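      One way to confirm this independently of the logs is to read the auto-failover settings back after the disable request and check the enabled flag. The sketch below only decodes a settings document; it assumes the JSON shape returned by the Couchbase Server auto-failover settings endpoint (GET /settings/autoFailover), and the sample payload is illustrative, using the values seen in the loaded config above. The type and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// autoFailoverSettings holds the fields of interest from the
// auto-failover settings document. Other fields are ignored.
type autoFailoverSettings struct {
	Enabled bool `json:"enabled"`
	Timeout int  `json:"timeout"`
	Count   int  `json:"count"`
}

// parseAutoFailover decodes a settings response body.
func parseAutoFailover(body []byte) (autoFailoverSettings, error) {
	var s autoFailoverSettings
	err := json.Unmarshal(body, &s)
	return s, err
}

func main() {
	// Illustrative payload mirroring the config dump in this report.
	sample := []byte(`{"enabled":true,"timeout":120,"count":0}`)

	s, err := parseAutoFailover(sample)
	if err != nil {
		panic(err)
	}
	if s.Enabled {
		fmt.Println("auto-failover is still enabled; the disable request did not take effect")
	}
}
```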

      Operator logs and CB cluster logs are attached, along with a screenshot of the certs in the UI.

          People

            prateek.kumar Prateek Kumar (Inactive)