Details
Type: Bug
Resolution: Fixed
Priority: Critical
Version: 3.2.5
Environment:
Couchbase Server 6.5.1 Enterprise
Couchbase Autonomous Operator 2.0.2 using PersistentVolumeClaims
Couchbase .NET SDK 3.2.5
.NET 6.0 on Linux (also within Kubernetes)
Description
When performing a Kubernetes rolling upgrade, the Kubernetes nodes are drained using `kubectl drain`. This deletes the Pods on each drained node, and the corresponding Couchbase Server nodes are then automatically failed over.
After failover completes, the Autonomous Operator creates a new Kubernetes Pod with the same name, mounts the previous volumes, and adds the node back to the cluster. Couchbase Server identifies the node by a DNS name derived from the Pod name. For example, a Pod named "couchbase-primary-0062" uses the DNS name "couchbase-primary-0062.couchbase-primary.default.svc".
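The naming scheme in the example above can be written as a small helper. This is a hedged Python sketch: the function name `pod_dns_name` is hypothetical, and the pattern `<pod-name>.<cluster-name>.<namespace>.svc` is inferred from the single example in this report together with the `default/couchbase-primary` cluster reference that appears in the operator logs. Clusters with a different service layout may differ.

```python
def pod_dns_name(pod_name: str, cluster_ref: str) -> str:
    """Build a node's DNS name from its Pod name and the operator's
    "<namespace>/<cluster-name>" reference (e.g. "default/couchbase-primary").

    Assumed pattern, inferred from the example in this report:
    <pod-name>.<cluster-name>.<namespace>.svc
    """
    namespace, cluster_name = cluster_ref.split("/", 1)
    return f"{pod_name}.{cluster_name}.{namespace}.svc"


# The recreated Pod keeps its name, so this DNS name never changes;
# only the IP address it resolves to does.
print(pod_dns_name("couchbase-primary-0062", "default/couchbase-primary"))
# couchbase-primary-0062.couchbase-primary.default.svc
```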
However, because a new Pod is created, it receives a different IP address than the node had originally. Within the SDK, the node's DNS name is resolved to an IP address only once, when the node is first added to the cluster; after that, the IP address is cached without regard to the DNS TTL. As a result, the SDK repeatedly fails to reconnect to the node, and the application must be restarted.
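To illustrate the difference, here is a minimal Python sketch (not the SDK's actual code, which is C#) contrasting an endpoint that resolves its hostname once and caches the IP forever with one that re-resolves after a TTL and whenever a connection attempt fails. The class names, the injectable `resolver` and `clock` parameters, the 30-second TTL, and the example IP addresses are all hypothetical. Python's `socket.getaddrinfo` does not expose the record's real TTL, so a fixed TTL stands in for it.

```python
import socket
import time
from typing import Callable, Optional

# A resolver maps a hostname to one IP address string.
Resolver = Callable[[str], str]


def system_resolver(hostname: str) -> str:
    """Resolve via the OS resolver and return the first address found."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return infos[0][4][0]


class CachedForeverEndpoint:
    """Models the reported behaviour: resolve once, reuse the IP forever."""

    def __init__(self, hostname: str, resolver: Resolver = system_resolver):
        self.hostname = hostname
        # Resolved only once, when the node is first added to the cluster.
        self._ip = resolver(hostname)

    def target_ip(self) -> str:
        return self._ip

    def on_connect_failure(self) -> None:
        # Nothing invalidates the cached IP, so every reconnect targets the old Pod IP.
        pass


class ReResolvingEndpoint:
    """One possible approach: expire the cached IP after a TTL and on connect failure."""

    def __init__(self, hostname: str, resolver: Resolver = system_resolver,
                 ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.hostname = hostname
        self._resolver = resolver
        self._ttl = ttl_seconds  # hypothetical fixed TTL; getaddrinfo hides the real one
        self._clock = clock
        self._ip: Optional[str] = None
        self._expires_at = 0.0

    def target_ip(self) -> str:
        # Re-resolve if nothing is cached or the cached entry has expired.
        if self._ip is None or self._clock() >= self._expires_at:
            self._ip = self._resolver(self.hostname)
            self._expires_at = self._clock() + self._ttl
        return self._ip

    def on_connect_failure(self) -> None:
        # Force a fresh DNS lookup before the next reconnect attempt.
        self._ip = None


# Simulate a Pod being recreated under the same DNS name with a new IP.
host = "couchbase-primary-0062.couchbase-primary.default.svc"
dns = {host: "10.0.0.10"}  # hypothetical IP before the drain
stale = CachedForeverEndpoint(host, resolver=dns.__getitem__)
fresh = ReResolvingEndpoint(host, resolver=dns.__getitem__)
stale.target_ip()
fresh.target_ip()

dns[host] = "10.0.0.42"  # hypothetical IP of the recreated Pod
stale.on_connect_failure()
fresh.on_connect_failure()
print(stale.target_ip())  # 10.0.0.10 (stale address, reconnects keep failing)
print(fresh.target_ip())  # 10.0.0.42
```

Invalidating on connection failure matters most in this scenario, because the Pod's DNS name never changes; only the address behind it does. The TTL path alone would also recover, but only after the cached entry expires.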
Logs from the Couchbase Autonomous Operator during a Kubernetes upgrade:
{"level":"info","ts":1642506529.9468436,"logger":"cluster","msg":"Pod down, waiting for auto-failover","cluster":"default/couchbase-primary","name":"couchbase-primary-0062","recovery_in":29.835662744}
{"level":"error","ts":1642506529.9468813,"logger":"cluster","msg":"Reconciliation failed","cluster":"default/couchbase-primary","error":"waiting for pod failover","stacktrace":"github.com/couchbase/couchbase-operator/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).runReconcile\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/pkg/cluster/cluster.go:370\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).Update\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/pkg/cluster/cluster.go:387\ngithub.com/couchbase/couchbase-operator/pkg/controller.(*CouchbaseClusterReconciler).Reconcile\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/pkg/controller/controller.go:86\ngithub.com/couchbase/couchbase-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215\ngithub.com/couchbase/couchbase-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\ngithub.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\ngithub.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\ngithub.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
{"level":"info","ts":1642506530.5100014,"logger":"cluster","msg":"External address collection failed","cluster":"default/couchbase-primary","name":"couchbase-primary-0062"}
{"level":"info","ts":1642506530.932535,"logger":"couchbaseutil","msg":"Cluster status","cluster":"default/couchbase-primary","balance":"unbalanced","rebalancing":false}
{"level":"info","ts":1642506530.932574,"logger":"couchbaseutil","msg":"Node status","cluster":"default/couchbase-primary","name":"couchbase-primary-0059","version":"enterprise-6.5.1","class":"query-1a-issolated","managed":true,"status":"active"}
{"level":"info","ts":1642506530.9325805,"logger":"couchbaseutil","msg":"Node status","cluster":"default/couchbase-primary","name":"couchbase-primary-0060","version":"enterprise-6.5.1","class":"query-1b-issolated","managed":true,"status":"active"}
{"level":"info","ts":1642506530.932585,"logger":"couchbaseutil","msg":"Node status","cluster":"default/couchbase-primary","name":"couchbase-primary-0061","version":"enterprise-6.5.1","class":"index-1a-issolated","managed":true,"status":"active"}
{"level":"info","ts":1642506530.932589,"logger":"couchbaseutil","msg":"Node status","cluster":"default/couchbase-primary","name":"couchbase-primary-0062","version":"enterprise-6.5.1","class":"data-1c-isolated","managed":true,"status":"failed"}
{"level":"info","ts":1642506530.932593,"logger":"couchbaseutil","msg":"Node status","cluster":"default/couchbase-primary","name":"couchbase-primary-0063","version":"enterprise-6.5.1","class":"data-1b-isolated","managed":true,"status":"active"}
{"level":"info","ts":1642506530.9325972,"logger":"couchbaseutil","msg":"Node status","cluster":"default/couchbase-primary","name":"couchbase-primary-0064","version":"enterprise-6.5.1","class":"data-1a-isolated","managed":true,"status":"active"}
{"level":"info","ts":1642506530.932601,"logger":"couchbaseutil","msg":"Node status","cluster":"default/couchbase-primary","name":"couchbase-primary-0065","version":"enterprise-6.5.1","class":"index-1b-isolated","managed":true,"status":"active"}
{"level":"info","ts":1642506530.932605,"logger":"couchbaseutil","msg":"Node status","cluster":"default/couchbase-primary","name":"couchbase-primary-0066","version":"enterprise-6.5.1","class":"index-1c-isolated","managed":true,"status":"active"}
{"level":"info","ts":1642506530.9326172,"logger":"scheduler","msg":"Scheduler status","cluster":"default/couchbase-primary","name":"couchbase-primary-0064","class":"data-1a-isolated","group":"us-east-1a"}
{"level":"info","ts":1642506530.932626,"logger":"scheduler","msg":"Scheduler status","cluster":"default/couchbase-primary","name":"couchbase-primary-0063","class":"data-1b-isolated","group":"us-east-1b"}
{"level":"info","ts":1642506530.9326315,"logger":"scheduler","msg":"Scheduler status","cluster":"default/couchbase-primary","name":"couchbase-primary-0061","class":"index-1a-issolated","group":"us-east-1a"}
{"level":"info","ts":1642506530.9326372,"logger":"scheduler","msg":"Scheduler status","cluster":"default/couchbase-primary","name":"couchbase-primary-0065","class":"index-1b-isolated","group":"us-east-1b"}
{"level":"info","ts":1642506530.932642,"logger":"scheduler","msg":"Scheduler status","cluster":"default/couchbase-primary","name":"couchbase-primary-0066","class":"index-1c-isolated","group":"us-east-1c"}
{"level":"info","ts":1642506530.9326475,"logger":"scheduler","msg":"Scheduler status","cluster":"default/couchbase-primary","name":"couchbase-primary-0059","class":"query-1a-issolated","group":"us-east-1a"}
{"level":"info","ts":1642506530.9326563,"logger":"scheduler","msg":"Scheduler status","cluster":"default/couchbase-primary","name":"couchbase-primary-0060","class":"query-1b-issolated","group":"us-east-1b"}
{"level":"info","ts":1642506532.6439211,"logger":"cluster","msg":"Pods failed over","cluster":"default/couchbase-primary"}
{"level":"info","ts":1642506532.6530786,"logger":"cluster","msg":"Creating pod","cluster":"default/couchbase-primary","name":"couchbase-primary-0062","image":"couchbase/server:enterprise-6.5.1"}
{"level":"error","ts":1642506569.7394078,"logger":"cluster","msg":"Reconciliation failed","cluster":"default/couchbase-primary","error":"recovering node http://couchbase-primary-0062.couchbase-primary.default.svc:8091","stacktrace":"github.com/couchbase/couchbase-operator/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).runReconcile\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/pkg/cluster/cluster.go:370\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).Update\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/pkg/cluster/cluster.go:387\ngithub.com/couchbase/couchbase-operator/pkg/controller.(*CouchbaseClusterReconciler).Reconcile\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/pkg/controller/controller.go:86\ngithub.com/couchbase/couchbase-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215\ngithub.com/couchbase/couchbase-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\ngithub.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\ngithub.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\ngithub.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
{"level":"info","ts":1642506570.0897486,"logger":"couchbaseutil","msg":"Cluster status","cluster":"default/couchbase-primary","balance":"unbalanced","rebalancing":false}
{"level":"info","ts":1642506570.0897832,"logger":"couchbaseutil","msg":"Node status","cluster":"default/couchbase-primary","name":"couchbase-primary-0059","version":"enterprise-6.5.1","class":"query-1a-issolated","managed":true,"status":"active"}
{"level":"info","ts":1642506570.08979,"logger":"couchbaseutil","msg":"Node status","cluster":"default/couchbase-primary","name":"couchbase-primary-0060","version":"enterprise-6.5.1","class":"query-1b-issolated","managed":true,"status":"active"}
{"level":"info","ts":1642506570.0897958,"logger":"couchbaseutil","msg":"Node status","cluster":"default/couchbase-primary","name":"couchbase-primary-0061","version":"enterprise-6.5.1","class":"index-1a-issolated","managed":true,"status":"active"}
{"level":"info","ts":1642506570.089801,"logger":"couchbaseutil","msg":"Node status","cluster":"default/couchbase-primary","name":"couchbase-primary-0062","version":"enterprise-6.5.1","class":"data-1c-isolated","managed":true,"status":"add_back"}
{"level":"info","ts":1642506570.0898066,"logger":"couchbaseutil","msg":"Node status","cluster":"default/couchbase-primary","name":"couchbase-primary-0063","version":"enterprise-6.5.1","class":"data-1b-isolated","managed":true,"status":"active"}
{"level":"info","ts":1642506570.089811,"logger":"couchbaseutil","msg":"Node status","cluster":"default/couchbase-primary","name":"couchbase-primary-0064","version":"enterprise-6.5.1","class":"data-1a-isolated","managed":true,"status":"active"}
{"level":"info","ts":1642506570.0898151,"logger":"couchbaseutil","msg":"Node status","cluster":"default/couchbase-primary","name":"couchbase-primary-0065","version":"enterprise-6.5.1","class":"index-1b-isolated","managed":true,"status":"active"}
{"level":"info","ts":1642506570.0898192,"logger":"couchbaseutil","msg":"Node status","cluster":"default/couchbase-primary","name":"couchbase-primary-0066","version":"enterprise-6.5.1","class":"index-1c-isolated","managed":true,"status":"active"}
{"level":"info","ts":1642506570.0898309,"logger":"scheduler","msg":"Scheduler status","cluster":"default/couchbase-primary","name":"couchbase-primary-0064","class":"data-1a-isolated","group":"us-east-1a"}
{"level":"info","ts":1642506570.0898392,"logger":"scheduler","msg":"Scheduler status","cluster":"default/couchbase-primary","name":"couchbase-primary-0063","class":"data-1b-isolated","group":"us-east-1b"}
{"level":"info","ts":1642506570.0898435,"logger":"scheduler","msg":"Scheduler status","cluster":"default/couchbase-primary","name":"couchbase-primary-0062","class":"data-1c-isolated","group":"us-east-1c"}
{"level":"info","ts":1642506570.0898473,"logger":"scheduler","msg":"Scheduler status","cluster":"default/couchbase-primary","name":"couchbase-primary-0061","class":"index-1a-issolated","group":"us-east-1a"}
{"level":"info","ts":1642506570.0898511,"logger":"scheduler","msg":"Scheduler status","cluster":"default/couchbase-primary","name":"couchbase-primary-0065","class":"index-1b-isolated","group":"us-east-1b"}
{"level":"info","ts":1642506570.0898552,"logger":"scheduler","msg":"Scheduler status","cluster":"default/couchbase-primary","name":"couchbase-primary-0066","class":"index-1c-isolated","group":"us-east-1c"}
{"level":"info","ts":1642506570.089859,"logger":"scheduler","msg":"Scheduler status","cluster":"default/couchbase-primary","name":"couchbase-primary-0059","class":"query-1a-issolated","group":"us-east-1a"}
{"level":"info","ts":1642506570.089863,"logger":"scheduler","msg":"Scheduler status","cluster":"default/couchbase-primary","name":"couchbase-primary-0060","class":"query-1b-issolated","group":"us-east-1b"}
{"level":"info","ts":1642506570.3508325,"logger":"cluster","msg":"Marking pod for delta recovery","cluster":"default/couchbase-primary","name":"couchbase-primary-0062"}
{"level":"info","ts":1642506573.9273438,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":0}
{"level":"info","ts":1642506577.9309807,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":0}
{"level":"info","ts":1642506581.934769,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":0}
{"level":"info","ts":1642506585.9384718,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":0}
{"level":"info","ts":1642506589.9420898,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":0}
{"level":"info","ts":1642506593.9458678,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":0}
{"level":"info","ts":1642506597.9495485,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":0}
{"level":"info","ts":1642506601.9534369,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":0}
{"level":"info","ts":1642506605.9576228,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":0}
{"level":"info","ts":1642506609.9611921,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":0}
{"level":"info","ts":1642506613.9646652,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":0}
{"level":"info","ts":1642506617.9748719,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":0}
{"level":"info","ts":1642506621.978477,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":0}
{"level":"info","ts":1642506625.9887943,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":0.3045808966861605}
{"level":"info","ts":1642506630.0065265,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":2.710769980506822}
{"level":"info","ts":1642506634.0232415,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":5.025584795321638}
{"level":"info","ts":1642506638.02851,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":7.27948343079922}
{"level":"info","ts":1642506642.0334635,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":9.746588693957115}
{"level":"info","ts":1642506646.0494561,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":11.66544834307992}
{"level":"info","ts":1642506650.071835,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":14.31530214424951}
{"level":"info","ts":1642506654.1107063,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":16.32553606237817}
{"level":"info","ts":1642506658.1285944,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":18.82309941520468}
{"level":"info","ts":1642506662.136291,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":20.83333333333333}
{"level":"info","ts":1642506666.1712596,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":21.2602552169363}
{"level":"info","ts":1642506670.1908245,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":22.23682144224081}
{"level":"info","ts":1642506674.2057269,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":23.0440263838185}
{"level":"info","ts":1642506678.21471,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":23.67010006624883}
{"level":"info","ts":1642506682.2339375,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":24.86409294418187}
{"level":"info","ts":1642506686.2450745,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":25.90760338822703}
{"level":"info","ts":1642506690.2578008,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":26.77845441228642}
{"level":"info","ts":1642506694.2666256,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":28.47133212279812}
{"level":"info","ts":1642506698.28025,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":30.43778427550358}
{"level":"info","ts":1642506702.284427,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":31.25}
{"level":"info","ts":1642506706.3642836,"logger":"cluster","msg":"Rebalance completed successfully","cluster":"default/couchbase-primary"}
{"level":"info","ts":1642506706.5230074,"logger":"cluster","msg":"Reconcile completed","cluster":"default/couchbase-primary"}
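The sequence of events in the operator logs can be condensed into a timeline of seconds elapsed since the first entry. This is a sketch using only the Python standard library; it assumes one JSON object per line with the `ts` (Unix seconds) and `msg` fields seen in the operator logs, and the names `timeline` and `MILESTONES` are hypothetical. The sample input is a subset of the quoted operator log lines, trimmed to a few fields.

```python
import json
from typing import Iterable, List, Tuple

# Milestone messages copied from the operator log lines.
MILESTONES = {
    "Pod down, waiting for auto-failover",
    "Pods failed over",
    "Creating pod",
    "Marking pod for delta recovery",
    "Rebalance completed successfully",
    "Reconcile completed",
}


def timeline(lines: Iterable[str]) -> List[Tuple[float, str]]:
    """Return (seconds since the first JSON entry, msg) for milestone entries."""
    events: List[Tuple[float, str]] = []
    start = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("{"):
            continue  # skip "|" separators and other non-JSON lines
        entry = json.loads(line)
        if start is None:
            start = entry["ts"]
        if entry["msg"] in MILESTONES:
            events.append((round(entry["ts"] - start, 1), entry["msg"]))
    return events


# A subset of the operator log lines, trimmed to a few fields.
sample = [
    '{"level":"info","ts":1642506529.9468436,"logger":"cluster","msg":"Pod down, waiting for auto-failover"}',
    "|",
    '{"level":"info","ts":1642506532.6439211,"logger":"cluster","msg":"Pods failed over"}',
    '{"level":"info","ts":1642506532.6530786,"logger":"cluster","msg":"Creating pod"}',
    '{"level":"info","ts":1642506570.3508325,"logger":"cluster","msg":"Marking pod for delta recovery"}',
    '{"level":"info","ts":1642506573.9273438,"logger":"couchbaseutil","msg":"Rebalancing","progress":0}',
    '{"level":"info","ts":1642506706.3642836,"logger":"cluster","msg":"Rebalance completed successfully"}',
    '{"level":"info","ts":1642506706.5230074,"logger":"cluster","msg":"Reconcile completed"}',
]

for elapsed, msg in timeline(sample):
    print(f"{elapsed:6.1f}s  {msg}")
```

On these lines, failover completes about 2.7 s after the Pod goes down, the recreated Pod is marked for delta recovery after about 40.4 s, and reconciliation completes after about 176.6 s. The cluster itself recovers within minutes, which is why the SDK's inability to reconnect stands out.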
Example application logs are also attached, filtered to the Couchbase SourceContext. Individual operations are failing with AmbiguousTimeoutException and UnambiguousTimeoutException.