  1. Couchbase Kubernetes
  2. K8S-2052

couchbase-operator panics when CouchbaseCluster exists

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version: 2.1.0
    • Fix Version: 2.2.0
    • Component: operator
    • Environment: Kubernetes 1.19.7
    • 10: Autoscaling, completion
    • Critical
    • 1

    Description

      When starting up and assuming oversight of an existing CouchbaseCluster, the couchbase-operator panics and enters a CrashLoopBackOff.

      Here is the couchbase-operator's output:

      {"level":"info","ts":1614874825.852879,"logger":"main","msg":"couchbase-operator","version":"2.1.0 (build 250)","revision":"b561a46fd687d668631596b3b2588ddca6457409"}
      {"level":"info","ts":1614874827.3082435,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":"0.0.0.0:8383"}
      I0304 16:20:27.308880       1 leaderelection.go:242] attempting to acquire leader lease  shared/couchbase-operator...
      {"level":"info","ts":1614874827.3089359,"logger":"controller-runtime.manager","msg":"starting metrics server","path":"/metrics"}
      I0304 16:20:44.715535       1 leaderelection.go:252] successfully acquired lease shared/couchbase-operator
      {"level":"info","ts":1614874844.715822,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"couchbase-controller","source":"kind source: /, Kind="}
      {"level":"info","ts":1614874844.816188,"logger":"controller-runtime.controller","msg":"Starting Controller","controller":"couchbase-controller"}
      {"level":"info","ts":1614874844.8162923,"logger":"controller-runtime.controller","msg":"Starting workers","controller":"couchbase-controller","worker count":4}
      {"level":"info","ts":1614874861.8298838,"logger":"cluster","msg":"Watching new cluster","cluster":"shared/couchbase"}
      {"level":"info","ts":1614874861.8299353,"logger":"cluster","msg":"Couchbase client starting","cluster":"shared/couchbase"}
      {"level":"info","ts":1614874861.8300164,"logger":"cluster","msg":"Janitor starting","cluster":"shared/couchbase"}
      {"level":"info","ts":1614874861.8423789,"logger":"cluster","msg":"Upgrading resource","cluster":"shared/couchbase","kind":"pod","name":"couchbase-backup-incremental-1614495600-prl7t","version":"1.2.0"}
      E0304 16:21:01.842463       1 runtime.go:78] Observed a panic: "assignment to entry in nil map" (assignment to entry in nil map)
      goroutine 283 [running]:
      k8s.io/apimachinery/pkg/util/runtime.logPanic(0x14ca760, 0x1867110)
          /home/couchbase/go/pkg/mod/k8s.io/apimachinery@v0.17.5-beta.0/pkg/util/runtime/runtime.go:74 +0xa3
      k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
          /home/couchbase/go/pkg/mod/k8s.io/apimachinery@v0.17.5-beta.0/pkg/util/runtime/runtime.go:48 +0x82
      panic(0x14ca760, 0x1867110)
          /usr/local/go/1.13.3/go/src/runtime/panic.go:679 +0x1b2
      github.com/couchbase/couchbase-operator/pkg/cluster.upgradePodFrom000000To010200(0xc000686540, 0xc0005d8000, 0x8, 0x8)
          /home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/cluster/operator_upgrade_pod.go:103 +0x5a
      github.com/couchbase/couchbase-operator/pkg/cluster.(*podUpgradableResource).perform(0xc00077fec0, 0x0, 0x0, 0xc0008f3400, 0x8)
          /home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/cluster/operator_upgrade_pod.go:83 +0x68
      github.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).operatorUpgrade(0xc000686540, 0x0, 0x0)
          /home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/cluster/operator_upgrade.go:155 +0x83f
      github.com/couchbase/couchbase-operator/pkg/cluster.New(0x7ffe89e4708c, 0x5, 0xc000245400, 0xc00068bec0, 0x2, 0x2)
          /home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/cluster/cluster.go:208 +0x55b
      github.com/couchbase/couchbase-operator/pkg/controller.(*CouchbaseClusterReconciler).Reconcile(0xc0004db530, 0xc00001ca0a, 0x6, 0xc00001c9f4, 0x9, 0xc000379c00, 0x1, 0xc000379cc8, 0x46db68)
          /home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/controller/controller.go:74 +0xbc8
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc00035e000, 0x151b340, 0xc000583280, 0x0)
          /home/couchbase/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.0/pkg/internal/controller/controller.go:256 +0x162
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00035e000, 0x0)
          /home/couchbase/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.0/pkg/internal/controller/controller.go:232 +0xcb
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker(0xc00035e000)
          /home/couchbase/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.0/pkg/internal/controller/controller.go:211 +0x2b
      k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc000683f60)
          /home/couchbase/go/pkg/mod/k8s.io/apimachinery@v0.17.5-beta.0/pkg/util/wait/wait.go:152 +0x5e
      k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000683f60, 0x3b9aca00, 0x0, 0x1, 0xc00009a180)
          /home/couchbase/go/pkg/mod/k8s.io/apimachinery@v0.17.5-beta.0/pkg/util/wait/wait.go:153 +0xf8
      k8s.io/apimachinery/pkg/util/wait.Until(0xc000683f60, 0x3b9aca00, 0xc00009a180)
          /home/couchbase/go/pkg/mod/k8s.io/apimachinery@v0.17.5-beta.0/pkg/util/wait/wait.go:88 +0x4d
      created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
          /home/couchbase/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.0/pkg/internal/controller/controller.go:193 +0x328
      panic: assignment to entry in nil map [recovered]
          panic: assignment to entry in nil map

      goroutine 283 [running]:
      k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
          /home/couchbase/go/pkg/mod/k8s.io/apimachinery@v0.17.5-beta.0/pkg/util/runtime/runtime.go:55 +0x105
      panic(0x14ca760, 0x1867110)
          /usr/local/go/1.13.3/go/src/runtime/panic.go:679 +0x1b2
      github.com/couchbase/couchbase-operator/pkg/cluster.upgradePodFrom000000To010200(0xc000686540, 0xc0005d8000, 0x8, 0x8)
          /home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/cluster/operator_upgrade_pod.go:103 +0x5a
      github.com/couchbase/couchbase-operator/pkg/cluster.(*podUpgradableResource).perform(0xc00077fec0, 0x0, 0x0, 0xc0008f3400, 0x8)
          /home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/cluster/operator_upgrade_pod.go:83 +0x68
      github.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).operatorUpgrade(0xc000686540, 0x0, 0x0)
          /home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/cluster/operator_upgrade.go:155 +0x83f
      github.com/couchbase/couchbase-operator/pkg/cluster.New(0x7ffe89e4708c, 0x5, 0xc000245400, 0xc00068bec0, 0x2, 0x2)
          /home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/cluster/cluster.go:208 +0x55b
      github.com/couchbase/couchbase-operator/pkg/controller.(*CouchbaseClusterReconciler).Reconcile(0xc0004db530, 0xc00001ca0a, 0x6, 0xc00001c9f4, 0x9, 0xc000379c00, 0x1, 0xc000379cc8, 0x46db68)
          /home/couchbase/jenkins/workspace/couchbase-k8s-microservice-build/couchbase-operator/pkg/controller/controller.go:74 +0xbc8
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc00035e000, 0x151b340, 0xc000583280, 0x0)
          /home/couchbase/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.0/pkg/internal/controller/controller.go:256 +0x162
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00035e000, 0x0)
          /home/couchbase/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.0/pkg/internal/controller/controller.go:232 +0xcb
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker(0xc00035e000)
          /home/couchbase/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.0/pkg/internal/controller/controller.go:211 +0x2b
      k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc000683f60)
          /home/couchbase/go/pkg/mod/k8s.io/apimachinery@v0.17.5-beta.0/pkg/util/wait/wait.go:152 +0x5e
      k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000683f60, 0x3b9aca00, 0x0, 0x1, 0xc00009a180)
          /home/couchbase/go/pkg/mod/k8s.io/apimachinery@v0.17.5-beta.0/pkg/util/wait/wait.go:153 +0xf8
      k8s.io/apimachinery/pkg/util/wait.Until(0xc000683f60, 0x3b9aca00, 0xc00009a180)
          /home/couchbase/go/pkg/mod/k8s.io/apimachinery@v0.17.5-beta.0/pkg/util/wait/wait.go:88 +0x4d
      created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1
          /home/couchbase/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.5.0/pkg/internal/controller/controller.go:193 +0x328
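
      For readers unfamiliar with the panic message in the trace: in Go, reading from a nil map is safe, but assigning to one panics with exactly this text. The sketch below reproduces the message and shows the usual guard. It is not the operator's code; `setAnnotation` and `writeToNilMap` are hypothetical names, and treating the affected map as pod metadata is an assumption, since the issue does not show line 103 of operator_upgrade_pod.go.

```go
package main

import "fmt"

// setAnnotation is a hypothetical helper, not taken from the operator.
// It allocates the map when the caller passes nil, then stores the entry.
func setAnnotation(annotations map[string]string, key, value string) map[string]string {
	if annotations == nil {
		annotations = map[string]string{}
	}
	annotations[key] = value
	return annotations
}

// writeToNilMap triggers the panic on purpose and returns its message.
func writeToNilMap() (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	var annotations map[string]string // declared but never allocated, so nil
	annotations["version"] = "1.2.0" // assignment to a nil map panics
	return "no panic"
}

func main() {
	fmt.Println(writeToNilMap())
	fmt.Println(setAnnotation(nil, "version", "1.2.0"))
}
```

      A guard like this only removes the crash; it would not stop an unrelated pod from being modified.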
      

        Activity

          simon.murray Simon Murray added a comment -

          We do extensive testing of operator restarts, as they are expected during Kubernetes and operator upgrades, so this is quite surprising. Can you collect logs, please? https://docs.couchbase.com/operator/current/reference-cbopinfo.html

          simon.murray Simon Murray added a comment -

          Actually, ignore that request. I can see it's trying to modify "couchbase-backup-incremental-1614495600-prl7t", i.e. a backup process, and it should have been filtered out.

          simon.murray Simon Murray added a comment -

          Note, once that backup job terminates the operator should come back to life.

          jonasrmichel Jonas Michel added a comment - - edited

          Great info Simon Murray!

          It seems that after removing all Completed backup Jobs (and their Pods) the operator runs without error.

          However, it seems this is the real bug: the operator panics if there are any backup Pods.

          For now, I've configured my backups' successfulJobsHistoryLimit to 0.

          simon.murray Simon Murray added a comment -

          releasenote:

          A race condition existed where the Operator was restarted while a backup `Pod` was still executing. This `Pod` was erroneously considered as part of a metadata update routine, which caused a crash loop until the backup terminated. This has now been fixed by correctly filtering the pods considered for metadata updates to only include Couchbase Server instances.
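
          The filtering described in the release note can be sketched as follows. This is an illustration under stated assumptions, not the actual fix: the `pod` type is a stand-in for a Kubernetes Pod, the `serverLabelKey` label is hypothetical (the issue does not say how the Operator identifies Couchbase Server pods), and the pod name `couchbase-0000` is illustrative.

```go
package main

import "fmt"

// pod is a stand-in for the few Kubernetes Pod fields used in this sketch.
type pod struct {
	Name   string
	Labels map[string]string
}

// serverLabelKey is a hypothetical label key marking Couchbase Server pods.
const serverLabelKey = "example.com/couchbase-server"

// isServerPod reports whether the pod carries the server label.
// Looking up a key in a nil map is safe and simply reports "not found".
func isServerPod(p pod) bool {
	_, ok := p.Labels[serverLabelKey]
	return ok
}

// serverPods returns only the pods that should receive metadata updates.
func serverPods(pods []pod) []pod {
	var out []pod
	for _, p := range pods {
		if isServerPod(p) {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	pods := []pod{
		{Name: "couchbase-0000", Labels: map[string]string{serverLabelKey: "true"}},
		{Name: "couchbase-backup-incremental-1614495600-prl7t"}, // backup pod, no server label
	}
	for _, p := range serverPods(pods) {
		fmt.Println(p.Name)
	}
}
```

          Selecting pods by what they are, rather than excluding known non-server kinds, means any future helper pod is skipped by default.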


          People

            simon.murray Simon Murray
            jonasrmichel Jonas Michel