Couchbase Kubernetes / K8S-2052

couchbase-operator panics when CouchbaseCluster exists

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.2.0
    • Component/s: operator
    • Labels:
    • Environment:
      Kubernetes 1.19.7
    • Sprint:
      10: Autoscaling, completion
    • CVSS/Severity:
      Critical
    • Story Points:
      1

      Description

      When starting up and assuming oversight of an existing CouchbaseCluster, the couchbase-operator panics and enters a CrashLoopBackOff.

      Here is the couchbase-operator's output:

      {"level":"info","ts":1614874825.852879,"logger":"main","msg":"couchbase-operator","version":"2.1.0 (build 250)","revision":"b561a46fd687d668631596b3b2588ddca6457409"}
      {"level":"info","ts":1614874827.3082435,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":"0.0.0.0:8383"}
      I0304 16:20:27.308880       1 leaderelection.go:242] attempting to acquire leader lease  shared/couchbase-operator...
      {"level":"info","ts":1614874827.3089359,"logger":"controller-runtime.manager","msg":"starting metrics server","path":"/metrics"}
      I0304 16:20:44.715535       1 leaderelection.go:252] successfully acquired lease shared/couchbase-operator
      {"level":"info","ts":1614874844.715822,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"couchbase-controller","source":"kind source: /, Kind="}
      {"level":"info","ts":1614874844.816188,"logger":"controller-runtime.controller","msg":"Starting Controller","controller":"couchbase-controller"}
      {"level":"info","ts":1614874844.8162923,"logger":"controller-runtime.controller","msg":"Starting workers","controller":"couchbase-controller","worker count":4}
      {"level":"info","ts":1614874861.8298838,"logger":"cluster","msg":"Watching new cluster","cluster":"shared/couchbase"}
      {"level":"info","ts":1614874861.8299353,"logger":"cluster","msg":"Couchbase client starting","cluster":"shared/couchbase"}
      {"level":"info","ts":1614874861.8300164,"logger":"cluster","msg":"Janitor starting","cluster":"shared/couchbase"}
      {"level":"info","ts":1614874861.8423789,"logger":"cluster","msg":"Upgrading resource","cluster":"shared/couchbase","kind":"pod","name":"couchbase-backup-incremental-1614495600-prl7t","version":"1.2.0"}
      E0304 16:21:01.842463       1 runtime.go:78] Observed a panic: "assignment to entry in nil map" (assignment to entry in nil map)
      goroutine 283 [running]:
      k8s.io/apimachinery/pkg/util/runtime.logPanic(0x14ca760, 0x1867110)
          /home/couchbase/go/pkg/mod/k8s.io/apimachinery@v0.17.5-beta.0/pkg/util/runtime/runtime.go:74 +0xa3
      k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
          /home/couchbase/go/pkg/mod/k8s.io/apimachinery@v0.17.5-beta.0/pkg/util/runtime/runtime.go:48 +0x82
      panic(0x14ca760, 0x1867110)
          /usr/local/go/1.13.3/go/src/runtime/panic.go:679 +0x1b2
      github.com/couchbase/couchbase-operator/pkg/cluster.upgradePodFrom000000To010200(0xc000686540, 0xc0005d8000, 0x8, 0x8)
          /home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/cluster/operator_upgrade_pod.go:103 +0x5a
      github.com/couchbase/couchbase-operator/pkg/cluster.(*podUpgradableResource).perform(0xc00077fec0, 0x0, 0x0, 0xc0008f3400, 0x8)
          /home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/cluster/operator_upgrade_pod.go:83 +0x68
      github.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).operatorUpgrade(0xc000686540, 0x0, 0x0)
          /home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/cluster/operator_upgrade.go:155 +0x83f
      github.com/couchbase/couchbase-operator/pkg/cluster.New(0x7ffe89e4708c, 0x5, 0xc000245400, 0xc00068bec0, 0x2, 0x2)
          /home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/cluster/cluster.go:208 +0x55b
      github.com/couchbase/couchbase-operator/pkg/controller.(*CouchbaseClusterReconciler).Reconcile(0xc0004db530, 0xc00001ca0a, 0x6, 0xc00001c9f4, 0x9, 0xc000379c00, 0x1, 0xc000379cc8, 0x46db68)
          /home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/controller/controller.go:74 +0xbc8
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc00035e000, 0x151b340, 0xc000583280, 0x0)
          /home/couchbase/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.0/pkg/internal/controller/controller.go:256 +0x162
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00035e000, 0x0)
          /home/couchbase/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.0/pkg/internal/controller/controller.go:232 +0xcb
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker(0xc00035e000)
          /home/couchbase/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.0/pkg/internal/controller/controller.go:211 +0x2b
      k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc000683f60)
          /home/couchbase/go/pkg/mod/k8s.io/apimachinery@v0.17.5-beta.0/pkg/util/wait/wait.go:152 +0x5e
      k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000683f60, 0x3b9aca00, 0x0, 0x1, 0xc00009a180)
          /home/couchbase/go/pkg/mod/k8s.io/apimachinery@v0.17.5-beta.0/pkg/util/wait/wait.go:153 +0xf8
      k8s.io/apimachinery/pkg/util/wait.Until(0xc000683f60, 0x3b9aca00, 0xc00009a180)
          /home/couchbase/go/pkg/mod/k8s.io/apimachinery@v0.17.5-beta.0/pkg/util/wait/wait.go:88 +0x4d
      created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
          /home/couchbase/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.0/pkg/internal/controller/controller.go:193 +0x328
      panic: assignment to entry in nil map [recovered]
          panic: assignment to entry in nil map

      goroutine 283 [running]:
      k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
          /home/couchbase/go/pkg/mod/k8s.io/apimachinery@v0.17.5-beta.0/pkg/util/runtime/runtime.go:55 +0x105
      panic(0x14ca760, 0x1867110)
          /usr/local/go/1.13.3/go/src/runtime/panic.go:679 +0x1b2
      github.com/couchbase/couchbase-operator/pkg/cluster.upgradePodFrom000000To010200(0xc000686540, 0xc0005d8000, 0x8, 0x8)
          /home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/cluster/operator_upgrade_pod.go:103 +0x5a
      github.com/couchbase/couchbase-operator/pkg/cluster.(*podUpgradableResource).perform(0xc00077fec0, 0x0, 0x0, 0xc0008f3400, 0x8)
          /home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/cluster/operator_upgrade_pod.go:83 +0x68
      github.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).operatorUpgrade(0xc000686540, 0x0, 0x0)
          /home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/cluster/operator_upgrade.go:155 +0x83f
      github.com/couchbase/couchbase-operator/pkg/cluster.New(0x7ffe89e4708c, 0x5, 0xc000245400, 0xc00068bec0, 0x2, 0x2)
          /home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/cluster/cluster.go:208 +0x55b
      github.com/couchbase/couchbase-operator/pkg/controller.(*CouchbaseClusterReconciler).Reconcile(0xc0004db530, 0xc00001ca0a, 0x6, 0xc00001c9f4, 0x9, 0xc000379c00, 0x1, 0xc000379cc8, 0x46db68)
          /home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/controller/controller.go:74 +0xbc8
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc00035e000, 0x151b340, 0xc000583280, 0x0)
          /home/couchbase/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.0/pkg/internal/controller/controller.go:256 +0x162
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00035e000, 0x0)
          /home/couchbase/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.0/pkg/internal/controller/controller.go:232 +0xcb
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker(0xc00035e000)
          /home/couchbase/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.0/pkg/internal/controller/controller.go:211 +0x2b
      k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc000683f60)
          /home/couchbase/go/pkg/mod/k8s.io/apimachinery@v0.17.5-beta.0/pkg/util/wait/wait.go:152 +0x5e
      k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000683f60, 0x3b9aca00, 0x0, 0x1, 0xc00009a180)
          /home/couchbase/go/pkg/mod/k8s.io/apimachinery@v0.17.5-beta.0/pkg/util/wait/wait.go:153 +0xf8
      k8s.io/apimachinery/pkg/util/wait.Until(0xc000683f60, 0x3b9aca00, 0xc00009a180)
          /home/couchbase/go/pkg/mod/k8s.io/apimachinery@v0.17.5-beta.0/pkg/util/wait/wait.go:88 +0x4d
      created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
          /home/couchbase/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.0/pkg/internal/controller/controller.go:193 +0x328
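
      For readers unfamiliar with the panic message in the trace: in Go, "assignment to entry in nil map" is raised when code writes a key into a map that was never allocated. The sketch below reproduces that failure mode in isolation and shows the usual guard. It is not the operator's code; the helper names and the "example.com/version" key are hypothetical, and it is only an assumption that the map involved was a metadata map (such as labels or annotations) on the backup pod.

```go
package main

import "fmt"

// unsafeSet writes into the map without checking for nil and reports
// whether the write panicked. Hypothetical helper for illustration only.
func unsafeSet(m map[string]string, key, value string) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	m[key] = value // panics with "assignment to entry in nil map" when m is nil
	return false
}

// setAnnotation allocates the map when it is nil before writing, then
// returns it so the caller can store the (possibly new) map.
func setAnnotation(m map[string]string, key, value string) map[string]string {
	if m == nil {
		m = map[string]string{}
	}
	m[key] = value
	return m
}

func main() {
	// A nil map, as found on an object created without any metadata entries.
	var annotations map[string]string

	fmt.Println("unguarded write panicked:", unsafeSet(annotations, "example.com/version", "1.2.0"))

	annotations = setAnnotation(annotations, "example.com/version", "1.2.0")
	fmt.Println("guarded write stored:", annotations["example.com/version"])
}
```

      Reading from a nil map is safe in Go; only writes panic, which is why a routine can work on objects that already carry metadata and fail on ones that do not.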
      


          Activity

          simon.murray Simon Murray added a comment -

          We do extensive testing of operator restarts, as they are expected during Kubernetes and operator upgrades, so this is quite surprising. Can you please collect logs? See https://docs.couchbase.com/operator/current/reference-cbopinfo.html

          simon.murray Simon Murray added a comment -

          Actually, ignore that request. I can see it's trying to modify "couchbase-backup-incremental-1614495600-prl7t", i.e. a backup process, which should have been filtered out.

          simon.murray Simon Murray added a comment -

          Note: once that backup job terminates, the operator should come back to life.

          jonasrmichel Jonas Michel added a comment - - edited

          Great info Simon Murray!

          It seems that after removing all Completed backup Jobs (and their Pods) the operator runs without error.

          However, it seems this is the real bug: the operator panics if there are any backup Pods.

          For now, I've configured my backups' successfulJobsHistoryLimit to 0.

          simon.murray Simon Murray added a comment -

          releasenote:

          A race condition existed where the Operator was restarted and a backup `Pod` was still executing. This `Pod` was erroneously considered as part of a metadata update routine and caused a crash loop until the backup terminated. This has now been fixed by correctly filtering the pods considered for metadata updates to only include Couchbase Server instances.
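
          The filtering described in the release note can be pictured roughly as follows. This is a minimal sketch, not the actual fix: the `pod` type, the `serverPods` function, and the `example.com/role` label are hypothetical stand-ins, and the real operator may identify Couchbase Server pods in a different way.

```go
package main

import "fmt"

// pod is a hypothetical, stripped-down stand-in for a Kubernetes Pod.
type pod struct {
	Name   string
	Labels map[string]string
}

// roleLabel is an assumed label key, used here only for illustration.
const roleLabel = "example.com/role"

// serverPods returns only the pods marked as Couchbase Server instances,
// so that backup pods never reach the metadata update routine.
func serverPods(pods []pod) []pod {
	var out []pod
	for _, p := range pods {
		// Reading from a nil Labels map is safe and yields "".
		if p.Labels[roleLabel] == "server" {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	pods := []pod{
		{Name: "couchbase-0000", Labels: map[string]string{roleLabel: "server"}},
		{Name: "couchbase-backup-incremental-1614495600-prl7t"}, // no labels at all
	}
	for _, p := range serverPods(pods) {
		fmt.Println("would update metadata for:", p.Name)
	}
}
```

          Selecting by an explicit marker, rather than treating every pod in the namespace as a server pod, keeps completed or running backup pods out of the upgrade path altogether.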


            People

            Assignee:
            simon.murray Simon Murray
            Reporter:
            jonasrmichel Jonas Michel
