Details
-
Bug
-
Resolution: Fixed
-
Critical
-
3.2.5
-
Couchbase Server 6.5.1 Enterprise
Couchbase Autonomous Operator 2.0.2 using PersistentVolumeClaims
Couchbase .NET SDK 3.2.5
.NET 6.0 on Linux (also within Kubernetes)
-
1
Description
When performing a Kubernetes rolling upgrade, Kubernetes nodes are drained using `kubectl drain`. This deletes the pods on the drained node, and the Couchbase Server nodes that ran in those pods are then automatically failed over.
After the failover is complete, the Autonomous Operator creates a new Kubernetes Pod with the same name, mounts the previous volumes, and adds it back to the cluster. The node is identified within Couchbase Server using a DNS name derived from the Pod name. For example, a pod named "couchbase-primary-0062" uses the DNS name "couchbase-primary-0062.couchbase-primary.default.svc".
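The naming scheme can be sketched as follows. This is only an illustration, assuming the pattern `<pod>.<service>.<namespace>.svc` inferred from the single example above; the helper name `pod_dns_name` is hypothetical and is not part of the Operator or the SDK.

```python
def pod_dns_name(pod: str, service: str, namespace: str) -> str:
    """Build the per-pod DNS name, assuming the <pod>.<service>.<namespace>.svc pattern."""
    return f"{pod}.{service}.{namespace}.svc"


# Values taken from the example in this report.
print(pod_dns_name("couchbase-primary-0062", "couchbase-primary", "default"))
```

Because the name depends only on the Pod name, service and namespace, a recreated Pod with the same name keeps the same DNS name even though its IP address changes.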
However, because a new Pod was created, it receives a different IP address than the node had originally. Within the SDK, the DNS name is resolved to an IP address only once, the first time the node is added to the cluster. After that, the IP address is cached without regard to the DNS TTL. As a result, the SDK consistently fails to reconnect to the node, and the application must be recycled.
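The difference between resolving once at bootstrap and resolving for every connection can be illustrated with a small simulation. This is a language-neutral sketch in Python, not the SDK's code: the class names, the `fake_resolve` helper and both IP addresses are made up for the example, and a dictionary stands in for cluster DNS.

```python
import socket
from typing import Callable

Resolver = Callable[[str], str]


def system_resolve(hostname: str) -> str:
    """Resolve a hostname to one IP address using the OS resolver (unused in the simulation)."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return infos[0][4][0]


class BootstrapCachedNode:
    """Resolves once when the node is added and reuses that IP (the behaviour reported here)."""

    def __init__(self, hostname: str, resolve: Resolver = system_resolve):
        self.hostname = hostname
        self._ip = resolve(hostname)

    def connect_target(self) -> str:
        return self._ip


class PerConnectionNode:
    """Resolves the hostname again for every new connection attempt."""

    def __init__(self, hostname: str, resolve: Resolver = system_resolve):
        self.hostname = hostname
        self._resolve = resolve

    def connect_target(self) -> str:
        return self._resolve(self.hostname)


# Simulated DNS table; the addresses are hypothetical.
HOST = "couchbase-primary-0062.couchbase-primary.default.svc"
dns = {HOST: "10.0.1.17"}


def fake_resolve(hostname: str) -> str:
    return dns[hostname]


stale = BootstrapCachedNode(HOST, fake_resolve)
fresh = PerConnectionNode(HOST, fake_resolve)

# The Pod is recreated with the same name but a new IP address.
dns[HOST] = "10.0.2.44"

print("cached at bootstrap:", stale.connect_target())
print("resolved per connection:", fresh.connect_target())
```

In the simulation the bootstrap-cached node keeps targeting the old address after the Pod is recreated, while the per-connection node follows the DNS change. A real client would also need to respect resolver caching and connection pooling, which this sketch ignores.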
Logs from Couchbase Autonomous Operator during a Kubernetes upgrade:
{"level":"info","ts":1642506529.9468436,"logger":"cluster","msg":"Pod down, waiting for auto-failover","cluster":"default/couchbase-primary","name":"couchbase-primary-0062","recovery_in":29.835662744}
|
{"level":"error","ts":1642506529.9468813,"logger":"cluster","msg":"Reconciliation failed","cluster":"default/couchbase-primary","error":"waiting for pod failover","stacktrace":"github.com/couchbase/couchbase-operator/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).runReconcile\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/pkg/cluster/cluster.go:370\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).Update\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/pkg/cluster/cluster.go:387\ngithub.com/couchbase/couchbase-operator/pkg/controller.(*CouchbaseClusterReconciler).Reconcile\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/pkg/controller/controller.go:86\ngithub.com/couchbase/couchbase-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215\ngithub.com/couchbase/couchbase-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\ngithub.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\ngithub.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\ngithub.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
|
{"level":"info","ts":1642506530.5100014,"logger":"cluster","msg":"External address collection failed","cluster":"default/couchbase-primary","name":"couchbase-primary-0062"}
|
{"level":"info","ts":1642506530.932535,"logger":"couchbaseutil","msg":"Cluster status","cluster":"default/couchbase-primary","balance":"unbalanced","rebalancing":false}
|
{"level":"info","ts":1642506530.932574,"logger":"couchbaseutil","msg":"Node status","cluster":"default/couchbase-primary","name":"couchbase-primary-0059","version":"enterprise-6.5.1","class":"query-1a-issolated","managed":true,"status":"active"}
|
{"level":"info","ts":1642506530.9325805,"logger":"couchbaseutil","msg":"Node status","cluster":"default/couchbase-primary","name":"couchbase-primary-0060","version":"enterprise-6.5.1","class":"query-1b-issolated","managed":true,"status":"active"}
|
{"level":"info","ts":1642506530.932585,"logger":"couchbaseutil","msg":"Node status","cluster":"default/couchbase-primary","name":"couchbase-primary-0061","version":"enterprise-6.5.1","class":"index-1a-issolated","managed":true,"status":"active"}
|
{"level":"info","ts":1642506530.932589,"logger":"couchbaseutil","msg":"Node status","cluster":"default/couchbase-primary","name":"couchbase-primary-0062","version":"enterprise-6.5.1","class":"data-1c-isolated","managed":true,"status":"failed"}
|
{"level":"info","ts":1642506530.932593,"logger":"couchbaseutil","msg":"Node status","cluster":"default/couchbase-primary","name":"couchbase-primary-0063","version":"enterprise-6.5.1","class":"data-1b-isolated","managed":true,"status":"active"}
|
{"level":"info","ts":1642506530.9325972,"logger":"couchbaseutil","msg":"Node status","cluster":"default/couchbase-primary","name":"couchbase-primary-0064","version":"enterprise-6.5.1","class":"data-1a-isolated","managed":true,"status":"active"}
|
{"level":"info","ts":1642506530.932601,"logger":"couchbaseutil","msg":"Node status","cluster":"default/couchbase-primary","name":"couchbase-primary-0065","version":"enterprise-6.5.1","class":"index-1b-isolated","managed":true,"status":"active"}
|
{"level":"info","ts":1642506530.932605,"logger":"couchbaseutil","msg":"Node status","cluster":"default/couchbase-primary","name":"couchbase-primary-0066","version":"enterprise-6.5.1","class":"index-1c-isolated","managed":true,"status":"active"}
|
{"level":"info","ts":1642506530.9326172,"logger":"scheduler","msg":"Scheduler status","cluster":"default/couchbase-primary","name":"couchbase-primary-0064","class":"data-1a-isolated","group":"us-east-1a"}
|
{"level":"info","ts":1642506530.932626,"logger":"scheduler","msg":"Scheduler status","cluster":"default/couchbase-primary","name":"couchbase-primary-0063","class":"data-1b-isolated","group":"us-east-1b"}
|
{"level":"info","ts":1642506530.9326315,"logger":"scheduler","msg":"Scheduler status","cluster":"default/couchbase-primary","name":"couchbase-primary-0061","class":"index-1a-issolated","group":"us-east-1a"}
|
{"level":"info","ts":1642506530.9326372,"logger":"scheduler","msg":"Scheduler status","cluster":"default/couchbase-primary","name":"couchbase-primary-0065","class":"index-1b-isolated","group":"us-east-1b"}
|
{"level":"info","ts":1642506530.932642,"logger":"scheduler","msg":"Scheduler status","cluster":"default/couchbase-primary","name":"couchbase-primary-0066","class":"index-1c-isolated","group":"us-east-1c"}
|
{"level":"info","ts":1642506530.9326475,"logger":"scheduler","msg":"Scheduler status","cluster":"default/couchbase-primary","name":"couchbase-primary-0059","class":"query-1a-issolated","group":"us-east-1a"}
|
{"level":"info","ts":1642506530.9326563,"logger":"scheduler","msg":"Scheduler status","cluster":"default/couchbase-primary","name":"couchbase-primary-0060","class":"query-1b-issolated","group":"us-east-1b"}
|
{"level":"info","ts":1642506532.6439211,"logger":"cluster","msg":"Pods failed over","cluster":"default/couchbase-primary"}
|
{"level":"info","ts":1642506532.6530786,"logger":"cluster","msg":"Creating pod","cluster":"default/couchbase-primary","name":"couchbase-primary-0062","image":"couchbase/server:enterprise-6.5.1"}
|
{"level":"error","ts":1642506569.7394078,"logger":"cluster","msg":"Reconciliation failed","cluster":"default/couchbase-primary","error":"recovering node http://couchbase-primary-0062.couchbase-primary.default.svc:8091","stacktrace":"github.com/couchbase/couchbase-operator/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).runReconcile\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/pkg/cluster/cluster.go:370\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).Update\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/pkg/cluster/cluster.go:387\ngithub.com/couchbase/couchbase-operator/pkg/controller.(*CouchbaseClusterReconciler).Reconcile\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/pkg/controller/controller.go:86\ngithub.com/couchbase/couchbase-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215\ngithub.com/couchbase/couchbase-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\ngithub.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\ngithub.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\ngithub.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\t/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
|
{"level":"info","ts":1642506570.0897486,"logger":"couchbaseutil","msg":"Cluster status","cluster":"default/couchbase-primary","balance":"unbalanced","rebalancing":false}
|
{"level":"info","ts":1642506570.0897832,"logger":"couchbaseutil","msg":"Node status","cluster":"default/couchbase-primary","name":"couchbase-primary-0059","version":"enterprise-6.5.1","class":"query-1a-issolated","managed":true,"status":"active"}
|
{"level":"info","ts":1642506570.08979,"logger":"couchbaseutil","msg":"Node status","cluster":"default/couchbase-primary","name":"couchbase-primary-0060","version":"enterprise-6.5.1","class":"query-1b-issolated","managed":true,"status":"active"}
|
{"level":"info","ts":1642506570.0897958,"logger":"couchbaseutil","msg":"Node status","cluster":"default/couchbase-primary","name":"couchbase-primary-0061","version":"enterprise-6.5.1","class":"index-1a-issolated","managed":true,"status":"active"}
|
{"level":"info","ts":1642506570.089801,"logger":"couchbaseutil","msg":"Node status","cluster":"default/couchbase-primary","name":"couchbase-primary-0062","version":"enterprise-6.5.1","class":"data-1c-isolated","managed":true,"status":"add_back"}
|
{"level":"info","ts":1642506570.0898066,"logger":"couchbaseutil","msg":"Node status","cluster":"default/couchbase-primary","name":"couchbase-primary-0063","version":"enterprise-6.5.1","class":"data-1b-isolated","managed":true,"status":"active"}
|
{"level":"info","ts":1642506570.089811,"logger":"couchbaseutil","msg":"Node status","cluster":"default/couchbase-primary","name":"couchbase-primary-0064","version":"enterprise-6.5.1","class":"data-1a-isolated","managed":true,"status":"active"}
|
{"level":"info","ts":1642506570.0898151,"logger":"couchbaseutil","msg":"Node status","cluster":"default/couchbase-primary","name":"couchbase-primary-0065","version":"enterprise-6.5.1","class":"index-1b-isolated","managed":true,"status":"active"}
|
{"level":"info","ts":1642506570.0898192,"logger":"couchbaseutil","msg":"Node status","cluster":"default/couchbase-primary","name":"couchbase-primary-0066","version":"enterprise-6.5.1","class":"index-1c-isolated","managed":true,"status":"active"}
|
{"level":"info","ts":1642506570.0898309,"logger":"scheduler","msg":"Scheduler status","cluster":"default/couchbase-primary","name":"couchbase-primary-0064","class":"data-1a-isolated","group":"us-east-1a"}
|
{"level":"info","ts":1642506570.0898392,"logger":"scheduler","msg":"Scheduler status","cluster":"default/couchbase-primary","name":"couchbase-primary-0063","class":"data-1b-isolated","group":"us-east-1b"}
|
{"level":"info","ts":1642506570.0898435,"logger":"scheduler","msg":"Scheduler status","cluster":"default/couchbase-primary","name":"couchbase-primary-0062","class":"data-1c-isolated","group":"us-east-1c"}
|
{"level":"info","ts":1642506570.0898473,"logger":"scheduler","msg":"Scheduler status","cluster":"default/couchbase-primary","name":"couchbase-primary-0061","class":"index-1a-issolated","group":"us-east-1a"}
|
{"level":"info","ts":1642506570.0898511,"logger":"scheduler","msg":"Scheduler status","cluster":"default/couchbase-primary","name":"couchbase-primary-0065","class":"index-1b-isolated","group":"us-east-1b"}
|
{"level":"info","ts":1642506570.0898552,"logger":"scheduler","msg":"Scheduler status","cluster":"default/couchbase-primary","name":"couchbase-primary-0066","class":"index-1c-isolated","group":"us-east-1c"}
|
{"level":"info","ts":1642506570.089859,"logger":"scheduler","msg":"Scheduler status","cluster":"default/couchbase-primary","name":"couchbase-primary-0059","class":"query-1a-issolated","group":"us-east-1a"}
|
{"level":"info","ts":1642506570.089863,"logger":"scheduler","msg":"Scheduler status","cluster":"default/couchbase-primary","name":"couchbase-primary-0060","class":"query-1b-issolated","group":"us-east-1b"}
|
{"level":"info","ts":1642506570.3508325,"logger":"cluster","msg":"Marking pod for delta recovery","cluster":"default/couchbase-primary","name":"couchbase-primary-0062"}
|
{"level":"info","ts":1642506573.9273438,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":0}
|
{"level":"info","ts":1642506577.9309807,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":0}
|
{"level":"info","ts":1642506581.934769,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":0}
|
{"level":"info","ts":1642506585.9384718,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":0}
|
{"level":"info","ts":1642506589.9420898,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":0}
|
{"level":"info","ts":1642506593.9458678,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":0}
|
{"level":"info","ts":1642506597.9495485,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":0}
|
{"level":"info","ts":1642506601.9534369,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":0}
|
{"level":"info","ts":1642506605.9576228,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":0}
|
{"level":"info","ts":1642506609.9611921,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":0}
|
{"level":"info","ts":1642506613.9646652,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":0}
|
{"level":"info","ts":1642506617.9748719,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":0}
|
{"level":"info","ts":1642506621.978477,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":0}
|
{"level":"info","ts":1642506625.9887943,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":0.3045808966861605}
|
{"level":"info","ts":1642506630.0065265,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":2.710769980506822}
|
{"level":"info","ts":1642506634.0232415,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":5.025584795321638}
|
{"level":"info","ts":1642506638.02851,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":7.27948343079922}
|
{"level":"info","ts":1642506642.0334635,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":9.746588693957115}
|
{"level":"info","ts":1642506646.0494561,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":11.66544834307992}
|
{"level":"info","ts":1642506650.071835,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":14.31530214424951}
|
{"level":"info","ts":1642506654.1107063,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":16.32553606237817}
|
{"level":"info","ts":1642506658.1285944,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":18.82309941520468}
|
{"level":"info","ts":1642506662.136291,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":20.83333333333333}
|
{"level":"info","ts":1642506666.1712596,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":21.2602552169363}
|
{"level":"info","ts":1642506670.1908245,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":22.23682144224081}
|
{"level":"info","ts":1642506674.2057269,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":23.0440263838185}
|
{"level":"info","ts":1642506678.21471,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":23.67010006624883}
|
{"level":"info","ts":1642506682.2339375,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":24.86409294418187}
|
{"level":"info","ts":1642506686.2450745,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":25.90760338822703}
|
{"level":"info","ts":1642506690.2578008,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":26.77845441228642}
|
{"level":"info","ts":1642506694.2666256,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":28.47133212279812}
|
{"level":"info","ts":1642506698.28025,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":30.43778427550358}
|
{"level":"info","ts":1642506702.284427,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/couchbase-primary","progress":31.25}
|
{"level":"info","ts":1642506706.3642836,"logger":"cluster","msg":"Rebalance completed successfully","cluster":"default/couchbase-primary"}
|
{"level":"info","ts":1642506706.5230074,"logger":"cluster","msg":"Reconcile completed","cluster":"default/couchbase-primary"}
|
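To read the timeline out of the operator log, the key events can be converted to offsets from the first "Pod down" message. The sketch below is a reading aid only: it uses six lines from the excerpt above, trimmed to their `ts` and `msg` fields, and the `timeline` helper is a hypothetical name.

```python
import json

# Operator log lines from the excerpt above, trimmed to the "ts" and "msg" fields.
LOG_LINES = [
    '{"ts":1642506529.9468436,"msg":"Pod down, waiting for auto-failover"}',
    '{"ts":1642506532.6439211,"msg":"Pods failed over"}',
    '{"ts":1642506532.6530786,"msg":"Creating pod"}',
    '{"ts":1642506570.3508325,"msg":"Marking pod for delta recovery"}',
    '{"ts":1642506706.3642836,"msg":"Rebalance completed successfully"}',
    '{"ts":1642506706.5230074,"msg":"Reconcile completed"}',
]


def timeline(lines):
    """Return (seconds since the first event, message) for each JSON log line."""
    events = [json.loads(line) for line in lines]
    start = events[0]["ts"]
    return [(event["ts"] - start, event["msg"]) for event in events]


for offset, msg in timeline(LOG_LINES):
    print(f"+{offset:7.1f}s  {msg}")
```

By these timestamps the replacement Pod is created under three seconds after the Pod goes down, and the reconcile completes roughly 177 seconds after it, so the node's IP address changes early in a recovery that lasts about three minutes.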
Example logs from an application are also attached, filtered to the Couchbase SourceContext. Individual operations are failing with AmbiguousTimeoutException and UnambiguousTimeoutException.
Attachments
For Gerrit Dashboard: NCBC-3092

| # | Subject | Branch | Project | Status | CR | V |
|---|---|---|---|---|---|---|
| 169149,2 | [WIP] NCBC-3092: Resolve DNS for each connection rather than node bootstrap | master | couchbase-net-client | MERGED | +2 | +1 |