Details
-
Bug
-
Resolution: Unresolved
-
Critical
-
2.7.0
-
Initial Couchbase Version : 7.2.5-7596
Upgraded Couchbase Version : 7.6.1-3200
Kubernetes Version : v1.30.0
CAO and operator : 2.7.0 built locally
Environment : Kind cluster
-
3
Description
Cluster Setup
- Kind cluster running locally on a Mac
- 3 nodes with all services
- 1 bucket
- Initial cluster version : 7.2.5
- Upgrade cluster version : 7.6.1
Steps taken in the scenario
- Created a cluster.
- Issued an upgrade from 7.2.5-7596 to 7.6.1-3200.
- Swap rebalance upgrade takes place.
- cb-example-0000 and cb-example-0001 are replaced by cb-example-0003 and cb-example-0004.
- While the swap rebalance upgrade of cb-example-0002 (being replaced by cb-example-0005) was in progress, issued a downgrade back to 7.2.5.
- The upgrade goes through fine.
- After the upgrade, the operator tries to add a pod running 7.2.5 to the cluster. The addition is not allowed and fails.
- The operator keeps retrying the same procedure and failing, in an infinite loop:
{"level":"info","ts":"2024-07-16T09:46:07Z","logger":"cluster","msg":"cb-example-0004"}
{"level":"info","ts":"2024-07-16T09:46:07Z","logger":"cluster","msg":"No persistent volumes in cluster. Reverting to SwapRebalance.","cluster":"default/cb-example"}
{"level":"info","ts":"2024-07-16T09:46:07Z","logger":"cluster","msg":"Upgrading pods with SwapRebalance","cluster":"default/cb-example","names":["cb-example-0004"],"target-version":"7.2.5"}
{"level":"info","ts":"2024-07-16T09:46:07Z","logger":"cluster","msg":"Swap-Rebalancing pod ","cluster":"default/cb-example","name":"cb-example-0004","source-version":"7.6.1"}
{"level":"info","ts":"2024-07-16T09:46:07Z","logger":"kubernetes","msg":"Creating pod","cluster":"default/cb-example","name":"cb-example-0015","image":"couchbase/server:7.2.5"}
{"level":"info","ts":"2024-07-16T09:46:19Z","logger":"cluster","msg":"Resource updated","cluster":"default/cb-example","diff":"{v2.ClusterStatus}.Size:3->4;+{v2.ClusterStatus}.Members.Unready:[cb-example-0015]"}
{"level":"info","ts":"2024-07-16T09:49:20Z","logger":"cluster","msg":"Pod added to cluster","cluster":"default/cb-example","name":"cb-example-0015"}
{"level":"error","ts":"2024-07-16T09:49:20Z","logger":"cluster","msg":"Pod addition to cluster failed","cluster":"default/cb-example","pod":"cb-example-0015","error":"timeout: request failed: unexpected status code POST http://cb-example-0004.cb-example.default.svc:8091/controller/addNode 400 Bad Request: [\"This node cannot add another node ('ns_1@cb-example-0015.cb-example.default.svc') because of cluster version compatibility mismatch. Cluster works in [7,6] mode and node only supports [7,2]\"]","stacktrace":"github.com/couchbase/couchbase-operator/pkg/cluster.(*ReconcileMachine).swapRebalanceMembers\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/nodereconcile.go:1855\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*ReconcileMachine).handleUpgradeNode\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/nodereconcile.go:1587\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*ReconcileMachine).exec\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/nodereconcile.go:323\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).reconcileMembers\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/reconcile.go:266\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).reconcile\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/reconcile.go:173\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).runReconcile\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/cluster.go:544\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).Update\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/cluster.go:591\ngithub.com/couchbase/couchbase-operator/pkg/controller.(*CouchbaseClusterReconciler).Reconcile\n\tgithub.com/couchbase/couchbase-operator/pkg/controller/controller.go:90\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\tsigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:227"}
{"level":"info","ts":"2024-07-16T09:49:20Z","logger":"cluster","msg":"Reconciliation failed","cluster":"default/cb-example","error":"swap rebalance failed to add new node to cluster: timeout: request failed: unexpected status code POST http://cb-example-0004.cb-example.default.svc:8091/controller/addNode 400 Bad Request: [\"This node cannot add another node ('ns_1@cb-example-0015.cb-example.default.svc') because of cluster version compatibility mismatch. Cluster works in [7,6] mode and node only supports [7,2]\"]","stack":"github.com/couchbase/couchbase-operator/pkg/util/couchbaseutil.Client.doRequest\n\tgithub.com/couchbase/couchbase-operator/pkg/util/couchbaseutil/core.go:240\ngithub.com/couchbase/couchbase-operator/pkg/util/couchbaseutil.(*Client).Post\n\tgithub.com/couchbase/couchbase-operator/pkg/util/couchbaseutil/core.go:302\ngithub.com/couchbase/couchbase-operator/pkg/util/couchbaseutil.(*Request).On.func1\n\tgithub.com/couchbase/couchbase-operator/pkg/util/couchbaseutil/api.go:222\ngithub.com/couchbase/couchbase-operator/pkg/util/couchbaseutil.(*Request).On.func2.1\n\tgithub.com/couchbase/couchbase-operator/pkg/util/couchbaseutil/api.go:240\ngithub.com/couchbase/couchbase-operator/pkg/util/retryutil.Retry\n\tgithub.com/couchbase/couchbase-operator/pkg/util/retryutil/retryutil.go:14\ngithub.com/couchbase/couchbase-operator/pkg/util/retryutil.RetryFor\n\tgithub.com/couchbase/couchbase-operator/pkg/util/retryutil/retryutil.go:30\ngithub.com/couchbase/couchbase-operator/pkg/util/couchbaseutil.(*Request).On.func2\n\tgithub.com/couchbase/couchbase-operator/pkg/util/couchbaseutil/api.go:243\ngithub.com/couchbase/couchbase-operator/pkg/util/couchbaseutil.(*Request).On\n\tgithub.com/couchbase/couchbase-operator/pkg/util/couchbaseutil/api.go:249\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).addMembers\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/member.go:328\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*ReconcileMachine).swapRebalanceMembers\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/nodereconcile.go:1834\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*ReconcileMachine).handleUpgradeNode\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/nodereconcile.go:1587\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*ReconcileMachine).exec\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/nodereconcile.go:323\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).reconcileMembers\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/reconcile.go:266\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).reconcile\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/reconcile.go:173\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).runReconcile\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/cluster.go:544\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).Update\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/cluster.go:591\ngithub.com/couchbase/couchbase-operator/pkg/controller.(*CouchbaseClusterReconciler).Reconcile\n\tgithub.com/couchbase/couchbase-operator/pkg/controller/controller.go:90\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\tsigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:227"}
{"level":"info","ts":"2024-07-16T09:49:20Z","logger":"cluster","msg":"Resource updated","cluster":"default/cb-example","diff":"{v2.ClusterStatus}.Conditions[3].LastUpdateTime:2024-07-16T09:46:05Z->2024-07-16T09:49:20Z;{v2.ClusterStatus}.Conditions[3].Message:swap rebalance failed to add new node to cluster: timeout: request failed: unexpected status code POST http://cb-example-0003.cb-example.default.svc:8091/controller/addNode 400 Bad Request: [\"This node cannot add another node ('ns_1@cb-example-0014.cb-example.default.svc') because of cluster version compatibility mismatch. Cluster works in [7,6] mode and node only supports [7,2]\"]->swap rebalance failed to add new node to cluster: timeout: request failed: unexpected status code POST http://cb-example-0004.cb-example.default.svc:8091/controller/addNode 400 Bad Request: [\"This node cannot add another node ('ns_1@cb-example-0015.cb-example.default.svc') because of cluster version compatibility mismatch. Cluster works in [7,6] mode and node only supports [7,2]\"]"}
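For triage, the compatibility mismatch can be pulled out of the operator log mechanically. The following is a minimal sketch, not part of the operator: the helper name `find_compat_mismatch` is hypothetical, and it assumes each log entry is a single JSON object on one line with the server message in an `error` field, as in the excerpts above.

```python
import json
import re

# Matches the server-side message quoted in the operator's "error" field.
COMPAT_RE = re.compile(
    r"Cluster works in \[(\d+),(\d+)\] mode and node only supports \[(\d+),(\d+)\]"
)


def find_compat_mismatch(log_line):
    """Return ((cluster_major, cluster_minor), (node_major, node_minor)),
    or None if the line is not JSON or carries no compatibility error."""
    try:
        entry = json.loads(log_line)
    except json.JSONDecodeError:
        return None
    if not isinstance(entry, dict):
        return None
    match = COMPAT_RE.search(str(entry.get("error", "")))
    if match is None:
        return None
    c_major, c_minor, n_major, n_minor = (int(g) for g in match.groups())
    return (c_major, c_minor), (n_major, n_minor)


# Shortened stand-in for the "Reconciliation failed" entry above.
sample = json.dumps({
    "level": "info",
    "msg": "Reconciliation failed",
    "error": "swap rebalance failed to add new node to cluster: "
             "Cluster works in [7,6] mode and node only supports [7,2]",
})
print(find_compat_mismatch(sample))  # ((7, 6), (7, 2))
```

Running this over every line of the operator log would show whether each retry fails with the same [7,6] versus [7,2] mismatch.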
Issue
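- The operator should not attempt the downgrade once the upgrade has completed successfully: the cluster then runs in [7,6] compatibility mode and rejects 7.2.5 nodes, so every retry fails.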
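One possible shape for the expected behaviour is a pre-check that compares the requested version against the cluster's current compatibility mode before a new pod is created. This is only an illustrative sketch in Python, not the operator's actual (Go) code: `parse_major_minor` and `downgrade_allowed` are hypothetical names, and how the operator would obtain the compatibility mode is left as an assumption.

```python
def parse_major_minor(version):
    """Extract (major, minor) from a version or image reference,
    e.g. '7.6.1-3200' or 'couchbase/server:7.2.5'.
    Assumes the reference ends in a dotted version tag."""
    tag = version.rsplit(":", 1)[-1]
    major, minor = tag.split(".")[:2]
    return int(major), int(minor)


def downgrade_allowed(cluster_compat, target_version):
    """Return True if a node at target_version could still join a cluster
    whose compatibility mode is cluster_compat, given as (major, minor)."""
    return parse_major_minor(target_version) >= cluster_compat


# After the upgrade completes the cluster reports [7,6], so 7.2.5 is rejected.
print(downgrade_allowed((7, 6), "couchbase/server:7.2.5"))  # False
# A cluster still in [7,2] mode would accept a 7.2.5 node.
print(downgrade_allowed((7, 2), "couchbase/server:7.2.5"))  # True
```

With a check like this, the operator could report the downgrade request as invalid instead of repeatedly creating pods that the cluster will reject.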
Operator logs :
https://cb-engineering.s3.amazonaws.com/K8S-3576/cbopinfo-20240716T151927+0530.tar.gz
Cluster logs :
https://cb-engineering.s3.amazonaws.com/K8S-3576/collectinfo-2024-07-16T095145-ns_1%40cb-example-0003.cb-example.default.svc.zip
https://cb-engineering.s3.amazonaws.com/K8S-3576/collectinfo-2024-07-16T095145-ns_1%40cb-example-0004.cb-example.default.svc.zip
https://cb-engineering.s3.amazonaws.com/K8S-3576/collectinfo-2024-07-16T095145-ns_1%40cb-example-0005.cb-example.default.svc.zip
The cao tool and operator images were built locally on this commit
commit e00cf70597dbc0a7422c82f0efd0a1a28f75bfcd (HEAD -> master, origin/master, origin/HEAD)
Author: usamah jassat <usamah.jassat@couchbase.com>
Date: Thu Jul 11 15:55:19 2024 +0100

    K8S-3564: fix TestServerGroupRescheduling when more SGs

    Change-Id: I13dabc775ad8f47e6f9f89b3445a19a4dd28112e
    Reviewed-on: https://review.couchbase.org/c/couchbase-operator/+/212585
    Reviewed-by: Justin Ashworth <justin.ashworth@couchbase.com>
    Tested-by: Build Bot <build@couchbase.com>