  1. Couchbase Kubernetes
  2. K8S-1584

Using server groups and node selector causes Operator to incorrectly detect a diff

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 2.0.2
    • None
    • operator
    • 1

    Description

      Summary
      When using both server groups and manually specified node selectors for your pods (as recommended in our docs), the Operator incorrectly detects a diff on every reconcile loop.
      Functionally the Operator still works correctly, but the result is heavy log spam: the logs wrap very quickly, which makes debugging and supportability almost impossible.

      Steps to Reproduce

      1. Create the following cluster (adjusting serverGroups to whatever is appropriate for your cluster):

        apiVersion: couchbase.com/v2
        kind: CouchbaseCluster
        metadata:
          name: cb-example
        spec:
          serverGroups:
            - eu-west-2b
          image: couchbase/server:6.5.0
          security:
            adminSecret: cb-example-auth
          buckets:
            managed: true
          servers:
          - size: 1
            name: data
            pod:
              spec:
                nodeSelector:
                  kubernetes.io/os: linux
            services:
            - data
        

      2. Wait for the cluster to be set up
      3. Review the Operator logs

      Expected behavior

      Once the cluster has been set up, there shouldn't be any further logging from the Operator, as it can bail out of the reconcile loop without taking action.

      Actual behavior
      Logs are spammed once a second with messages like:

      {"level":"info","ts":1595325880.433218,"logger":"cluster","msg":"Resource updated","cluster":"default/cb-example","diff":"  strings.Join({\n  \t... // 38 identical lines\n  \t\"      containers: null\",\n  \t\"      nodeSelector:\",\n- \t\"        failure-domain.beta.kubernetes.io/zone: eu-west-2b\",\n  \t\"        kubernetes.io/os: linux\",\n  \t\"  resources: {}\",\n  \t... // 6 identical lines\n  }, \"\\n\")\n"}
      {"level":"info","ts":1595325881.8336375,"logger":"cluster","msg":"Resource updated","cluster":"default/cb-example","diff":"  strings.Join({\n  \t... // 38 identical lines\n  \t\"      containers: null\",\n  \t\"      nodeSelector:\",\n- \t\"        failure-domain.beta.kubernetes.io/zone: eu-west-2b\",\n  \t\"        kubernetes.io/os: linux\",\n  \t\"  resources: {}\",\n  \t... // 6 identical lines\n  }, \"\\n\")\n"}
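
      The diff in these entries is hard to read because it is a JSON-escaped string. As a rough aid for triage, the sketch below decodes one log line and prints the diff with its newlines restored. The logEntry type and parseEntry helper are hypothetical names introduced here, the field names are taken from the log lines above, and the sample line is shortened for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// logEntry holds only the fields of interest; the JSON keys match the
// Operator log lines shown above. The type itself is hypothetical.
type logEntry struct {
	Msg     string `json:"msg"`
	Cluster string `json:"cluster"`
	Diff    string `json:"diff"`
}

// parseEntry decodes a single JSON log line. Hypothetical helper.
func parseEntry(line string) (logEntry, error) {
	var e logEntry
	err := json.Unmarshal([]byte(line), &e)
	return e, err
}

func main() {
	// Shortened sample, not a verbatim Operator log line.
	line := `{"level":"info","msg":"Resource updated","cluster":"default/cb-example","diff":"  \"nodeSelector:\",\n- \"  failure-domain.beta.kubernetes.io/zone: eu-west-2b\",\n"}`

	e, err := parseEntry(line)
	if err != nil {
		log.Fatal(err)
	}
	// Printing the decoded string turns the escaped \n sequences into real line breaks.
	fmt.Printf("%s (%s)\n%s", e.Msg, e.Cluster, e.Diff)
}
```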
      

      Analysis

      It seems that, when generating the diff between the contents of the CouchbaseCluster and the previous one, the Operator does not take into account the merging of the node selector that it has to do to accommodate server groups.
      This leads it to believe that the resource is changing on every single reconcile loop.
      The actual pod has the correct node selectors applied, so it's just an issue with the diffing:

        nodeSelector:
          failure-domain.beta.kubernetes.io/zone: eu-west-2b
          kubernetes.io/os: linux
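
      To illustrate the suspected cause, here is a minimal sketch of comparing node selectors with and without the server group merged in. It is not the Operator's actual code: mergeNodeSelector and needsUpdate are hypothetical helpers, and the assumption is simply that the zone label shown in the log is added on top of the user-specified selector before the comparison is made.

```go
package main

import (
	"fmt"
	"reflect"
)

// zoneLabel is the label key that appears in the log output above.
const zoneLabel = "failure-domain.beta.kubernetes.io/zone"

// mergeNodeSelector returns a copy of the user-specified selector with the
// server group's zone label added. Hypothetical helper for illustration.
func mergeNodeSelector(user map[string]string, serverGroup string) map[string]string {
	merged := make(map[string]string, len(user)+1)
	for k, v := range user {
		merged[k] = v
	}
	if serverGroup != "" {
		merged[zoneLabel] = serverGroup
	}
	return merged
}

// needsUpdate reports whether the selector actually applied differs from the
// desired selector once the server group has been merged in.
func needsUpdate(actual, user map[string]string, serverGroup string) bool {
	return !reflect.DeepEqual(actual, mergeNodeSelector(user, serverGroup))
}

func main() {
	user := map[string]string{"kubernetes.io/os": "linux"}
	actual := map[string]string{
		zoneLabel:          "eu-west-2b",
		"kubernetes.io/os": "linux",
	}

	// Comparing against the unmerged selector reports a change on every loop.
	fmt.Println("naive diff:", !reflect.DeepEqual(actual, user)) // prints "naive diff: true"

	// Comparing against the merged selector reports no change.
	fmt.Println("merged diff:", needsUpdate(actual, user, "eu-west-2b")) // prints "merged diff: false"
}
```

      The helper copies the map rather than mutating it, so the user-specified selector stays unchanged and can be compared again on the next loop.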
      

            People

              simon.murray Simon Murray
              matt.carabine Matt Carabine (Inactive)