Details
-
Bug
-
Resolution: Unresolved
-
Minor
-
2.7.0
-
Initial Couchbase Version : 7.6.0-2176
Upgrade Couchbase Version : 7.6.1-3200
Kubernetes Version : v1.30.0
CAO and operator : 2.7.0 built locally
Environment : Kind cluster
-
15 - First Frontier, 16 - Killing Time, 17 - Timetrap
-
2
Description
Cluster Setup
- Kind cluster running locally on a Mac
- 5 nodes with all services
- 2 buckets
- Cluster version : 7.6.0-2176
- Upgrade version : 7.6.1-3200
Steps taken in the scenario
- Created a cluster
- Created 2 buckets
- Changed the storage backend of one of the buckets from couchstore to magma
- After the swap rebalance of the first pod for the migration, issued an upgrade
- The upgrade was started before the migration had fully completed
- From then on, migration and upgrade were completed in a single swap rebalance operation for each pod (tracked in K8S-3583)
- After this, the operator repeatedly attempted to add a pod that was already in the cluster, and the operation failed each time:
{"level":"error","ts":"2024-07-17T14:48:03Z","logger":"cluster","msg":"Pod addition to cluster failed","cluster":"default/cb-example","pod":"cb-example-0008","error":"timeout: request failed: unexpected status code POST http://cb-example-0007.cb-example.default.svc:8091/controller/addNode 400 Bad Request: [\"Prepare join failed. Node is already part of cluster.\"]","stacktrace":"github.com/couchbase/couchbase-operator/pkg/cluster.(*ReconcileMachine).swapRebalanceMembers\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/nodereconcile.go:1855\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*ReconcileMachine).handleUpgradeNode\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/nodereconcile.go:1587\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*ReconcileMachine).exec\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/nodereconcile.go:323\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).reconcileMembers\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/reconcile.go:266\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).reconcile\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/reconcile.go:173\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).runReconcile\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/cluster.go:544\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).Update\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/cluster.go:591\ngithub.com/couchbase/couchbase-operator/pkg/controller.(*CouchbaseClusterReconciler).Reconcile\n\tgithub.com/couchbase/couchbase-operator/pkg/controller/controller.go:90\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\tsigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:227"}
{"level":"info","ts":"2024-07-17T14:48:03Z","logger":"cluster","msg":"Reconciliation failed","cluster":"default/cb-example","error":"swap rebalance failed to add new node to cluster: timeout: request failed: unexpected status code POST http://cb-example-0007.cb-example.default.svc:8091/controller/addNode 400 Bad Request: [\"Prepare join failed. Node is already part of cluster.\"]","stack":"github.com/couchbase/couchbase-operator/pkg/util/couchbaseutil.Client.doRequest\n\tgithub.com/couchbase/couchbase-operator/pkg/util/couchbaseutil/core.go:240\ngithub.com/couchbase/couchbase-operator/pkg/util/couchbaseutil.(*Client).Post\n\tgithub.com/couchbase/couchbase-operator/pkg/util/couchbaseutil/core.go:302\ngithub.com/couchbase/couchbase-operator/pkg/util/couchbaseutil.(*Request).On.func1\n\tgithub.com/couchbase/couchbase-operator/pkg/util/couchbaseutil/api.go:222\ngithub.com/couchbase/couchbase-operator/pkg/util/couchbaseutil.(*Request).On.func2.1\n\tgithub.com/couchbase/couchbase-operator/pkg/util/couchbaseutil/api.go:240\ngithub.com/couchbase/couchbase-operator/pkg/util/retryutil.Retry\n\tgithub.com/couchbase/couchbase-operator/pkg/util/retryutil/retryutil.go:14\ngithub.com/couchbase/couchbase-operator/pkg/util/retryutil.RetryFor\n\tgithub.com/couchbase/couchbase-operator/pkg/util/retryutil/retryutil.go:30\ngithub.com/couchbase/couchbase-operator/pkg/util/couchbaseutil.(*Request).On.func2\n\tgithub.com/couchbase/couchbase-operator/pkg/util/couchbaseutil/api.go:243\ngithub.com/couchbase/couchbase-operator/pkg/util/couchbaseutil.(*Request).On\n\tgithub.com/couchbase/couchbase-operator/pkg/util/couchbaseutil/api.go:249\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).addMembers\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/member.go:328\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*ReconcileMachine).swapRebalanceMembers\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/nodereconcile.go:1834\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*ReconcileMachine).handleUpgradeNode\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/nodereconcile.go:1587\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*ReconcileMachine).exec\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/nodereconcile.go:323\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).reconcileMembers\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/reconcile.go:266\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).reconcile\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/reconcile.go:173\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).runReconcile\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/cluster.go:544\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).Update\n\tgithub.com/couchbase/couchbase-operator/pkg/cluster/cluster.go:591\ngithub.com/couchbase/couchbase-operator/pkg/controller.(*CouchbaseClusterReconciler).Reconcile\n\tgithub.com/couchbase/couchbase-operator/pkg/controller/controller.go:90\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\tsigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\tsigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\tsigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\tsigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:227"}
{"level":"info","ts":"2024-07-17T14:48:03Z","logger":"cluster","msg":"Resource updated","cluster":"default/cb-example","diff":"{v2.ClusterStatus}.Conditions[3].LastUpdateTime:2024-07-17T14:42:58Z->2024-07-17T14:48:03Z;{v2.ClusterStatus}.Conditions[3].Message:failed to rebalance: timeout: unexpected rebalance error->swap rebalance failed to add new node to cluster: timeout: request failed: unexpected status code POST http://cb-example-0007.cb-example.default.svc:8091/controller/addNode 400 Bad Request: [\"Prepare join failed. Node is already part of cluster.\"]"}
The operator also reports the rebalance as failed because the join failed:
{"level":"info","ts":"2024-07-17T15:06:22Z","logger":"cluster","msg":"Resource updated","cluster":"default/cb-example","diff":"{v2.ClusterStatus}.Conditions[3].LastUpdateTime:2024-07-17T14:48:03Z->2024-07-17T15:06:22Z;{v2.ClusterStatus}.Conditions[3].Message:swap rebalance failed to add new node to cluster: timeout: request failed: unexpected status code POST http://cb-example-0007.cb-example.default.svc:8091/controller/addNode 400 Bad Request: [\"Prepare join failed. Node is already part of cluster.\"]->failed to rebalance: timeout: unexpected rebalance error"}
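For discussion, here is a minimal Go sketch of how a caller could tell the "already part of cluster" response apart from other addNode failures, so that it stops retrying the same join. This is not the operator's code: `classifyAddNode`, `addNodeResult`, and the constants are hypothetical names, and the only inputs taken from this report are the 400 status and the message text in the response body. It also assumes a successful call returns 200.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// alreadyMemberMsg is the message text seen in the addNode response quoted in this report.
const alreadyMemberMsg = "Node is already part of cluster"

// addNodeResult is a hypothetical classification of an addNode response.
type addNodeResult int

const (
	addNodeOK            addNodeResult = iota // node was added
	addNodeAlreadyMember                      // node reports it already belongs to a cluster
	addNodeFailed                             // any other failure
)

// classifyAddNode maps an HTTP status code and response body to a result.
// Assumption: success is reported as 200 OK.
func classifyAddNode(status int, body string) addNodeResult {
	switch {
	case status == http.StatusOK:
		return addNodeOK
	case status == http.StatusBadRequest && strings.Contains(body, alreadyMemberMsg):
		return addNodeAlreadyMember
	default:
		return addNodeFailed
	}
}

func main() {
	body := `["Prepare join failed. Node is already part of cluster."]`
	if classifyAddNode(http.StatusBadRequest, body) == addNodeAlreadyMember {
		fmt.Println("node already joined; verify membership instead of retrying addNode")
	}
}
```

The message alone does not say which cluster the node belongs to, so a caller would still need to confirm membership against the cluster's node list before treating this case as a successful join.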
Operator logs:
https://cb-engineering.s3.amazonaws.com/K8S-3583/cbopinfo-20240717T213656+0530.tar.gz
Cluster logs:
https://cb-engineering.s3.amazonaws.com/K8S-3583/collectinfo-2024-07-17T160636-ns_1%40cb-example-0006.cb-example.default.svc.zip
https://cb-engineering.s3.amazonaws.com/K8S-3583/collectinfo-2024-07-17T160636-ns_1%40cb-example-0007.cb-example.default.svc.zip
https://cb-engineering.s3.amazonaws.com/K8S-3583/collectinfo-2024-07-17T160636-ns_1%40cb-example-0008.cb-example.default.svc.zip
https://cb-engineering.s3.amazonaws.com/K8S-3583/collectinfo-2024-07-17T160636-ns_1%40cb-example-0009.cb-example.default.svc.zip
https://cb-engineering.s3.amazonaws.com/K8S-3583/collectinfo-2024-07-17T160636-ns_1%40cb-example-0010.cb-example.default.svc.zip
The cao tool and operator images were built locally from this commit:
commit e00cf70597dbc0a7422c82f0efd0a1a28f75bfcd (HEAD -> master, origin/master, origin/HEAD)
Author: usamah jassat <usamah.jassat@couchbase.com>
Date:   Thu Jul 11 15:55:19 2024 +0100

    K8S-3564: fix TestServerGroupRescheduling when more SGs

    Change-Id: I13dabc775ad8f47e6f9f89b3445a19a4dd28112e
    Reviewed-on: https://review.couchbase.org/c/couchbase-operator/+/212585
    Reviewed-by: Justin Ashworth <justin.ashworth@couchbase.com>
    Tested-by: Build Bot <build@couchbase.com>