Details
Bug
Resolution: Fixed
Critical
None
1
Description
Summary
When using both server groups and manually specified node selectors for your pods (as recommended in our docs), the Operator incorrectly detects a diff on every reconcile loop.
The main effect is heavy log spam. Functionally the Operator still works correctly, but the spam makes debugging and support almost impossible because the logs wrap very quickly.
Steps to Reproduce
- Create the following cluster (adjusting serverGroups to whatever is appropriate for your cluster):
apiVersion: couchbase.com/v2
kind: CouchbaseCluster
metadata:
  name: cb-example
spec:
  serverGroups:
  - eu-west-2b
  image: couchbase/server:6.5.0
  security:
    adminSecret: cb-example-auth
  buckets:
    managed: true
  servers:
  - size: 1
    name: data
    pod:
      spec:
        nodeSelector:
          kubernetes.io/os: linux
    services:
    - data
- Wait for the cluster to be set up
- Review the Operator logs
Expected behavior
Once the cluster has been set up, there shouldn't be any further logging from the Operator, as it can bail from the reconcile loop without taking action.
Actual behavior
Logs are spammed once a second with messages like:
{"level":"info","ts":1595325880.433218,"logger":"cluster","msg":"Resource updated","cluster":"default/cb-example","diff":" strings.Join({\n \t... // 38 identical lines\n \t\" containers: null\",\n \t\" nodeSelector:\",\n- \t\" failure-domain.beta.kubernetes.io/zone: eu-west-2b\",\n \t\" kubernetes.io/os: linux\",\n \t\" resources: {}\",\n \t... // 6 identical lines\n }, \"\\n\")\n"}
{"level":"info","ts":1595325881.8336375,"logger":"cluster","msg":"Resource updated","cluster":"default/cb-example","diff":" strings.Join({\n \t... // 38 identical lines\n \t\" containers: null\",\n \t\" nodeSelector:\",\n- \t\" failure-domain.beta.kubernetes.io/zone: eu-west-2b\",\n \t\" kubernetes.io/os: linux\",\n \t\" resources: {}\",\n \t... // 6 identical lines\n }, \"\\n\")\n"}
Analysis
It seems that, when generating the diff between the current contents of the CouchbaseCluster and the previous contents, the Operator does not take into account the Node Selector merge it has to perform to accommodate server groups.
This leads it to believe that the resource is changing on every reconcile loop.
The actual pod has the correct node selectors applied, so it's just an issue with the diffing:
nodeSelector:
  failure-domain.beta.kubernetes.io/zone: eu-west-2b
  kubernetes.io/os: linux
Issue Links
- blocks K8S-1506 Autonomous Operator (Kubernetes) 2.0.2 GA Release - target on web week of July 27 (Resolved)