Description
Couchbase Cluster Description
- Cluster set up to the following specification:
- Each node is an m5.4xlarge instance (16 vCPUs, 64 GB RAM).
- 6 Data Service nodes and 4 Index and Query Service nodes.
- 10 buckets, each with 1 replica and full eviction; auto-failover timeout set to 5 s.
- ~3 TB of data loaded into the cluster.
- 50 primary indexes, each with 1 replica (100 indexes in total).
- Continuous data and query workload on all buckets throughout the upgrade.
Task: Upgrade EKS 1.25 -> 1.26
Observation:
- Control plane upgraded successfully.
- Worker node upgrade failed.
Follow-ups:
- Why is the Operator still looking for cb-example-0005 when a new pod, cb-example-0010, has been added back to the cluster?
- How does this affect the overall EKS worker node upgrade? (Possibly related to the PodDisruptionBudget, PDB.)
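On the PDB question: a node drain evicts pods through the Eviction API, and Kubernetes refuses an eviction while the PDB's allowed disruptions is zero. For an integer `minAvailable`, allowed disruptions is the number of healthy pods above `minAvailable`, clamped at zero. If the worker nodes are in an EKS managed node group, an update that cannot evict pods because of a PDB eventually fails with a `PodEvictionFailure` error. The minimal Go sketch below models only that arithmetic. The `minAvailable` of 5 for the 6-pod data-only class is an assumed value; the PDB the Operator actually created is not shown in this issue.

```go
package main

import "fmt"

// disruptionsAllowed models the PodDisruptionBudget rule for an integer
// minAvailable: pods may be evicted only while the number of healthy pods
// exceeds minAvailable, and the result is never negative.
func disruptionsAllowed(currentHealthy, minAvailable int) int {
	allowed := currentHealthy - minAvailable
	if allowed < 0 {
		return 0
	}
	return allowed
}

func main() {
	const minAvailable = 5 // assumed value for the 6-pod data-only class

	// All 6 data pods healthy: a drain may evict one pod.
	fmt.Println(disruptionsAllowed(6, minAvailable)) // 1

	// Only 5 healthy (one pod still being replaced): evictions are refused,
	// so the node drain, and with it the worker node upgrade, stalls.
	fmt.Println(disruptionsAllowed(5, minAvailable)) // 0
}
```

Under this assumption, the Operator being stuck on the 5 -> 6 scale-up in the log below would hold the data class one pod short. Every further eviction would then be blocked.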
Analysis:
- cb-example-0005 was failed over, and a new pod, cb-example-0010, was added back in its place.
- The node running the Operator and the admission controller is still on 1.25; it was not upgraded.
- Except for cb-example-0010 (the replacement for 0005), every Couchbase pod is still on a 1.25 node.
The Operator keeps trying to look up cb-example-0005, which no longer exists, and is stuck at this point:
{"level":"info","ts":"2024-05-30T09:53:06Z","logger":"cluster","msg":"Resource updated","cluster":"default/cb-example","diff":"-{v2.ClusterStatus}.Conditions[2->?]:{Type:Error Status:True LastUpdateTime:2024-05-30T09:27:32Z LastTransitionTime:2024-05-30T09:27:32Z Reason:ErrorEncountered Message:requested resource not found: unable to lookup node cb-example-0005.cb-example.default.svc:8091};-{v2.ClusterStatus}.Conditions[3->?]:{Type:Scaling Status:True LastUpdateTime:2024-05-30T09:27:34Z LastTransitionTime:2024-05-30T09:27:34Z Reason:ClusterScaling Message:The operator is attempting to scale the cluster};-{v2.ClusterStatus}.Conditions[4->?]:{Type:ScalingUp Status:True LastUpdateTime:2024-05-30T09:27:34Z LastTransitionTime:2024-05-30T09:27:34Z Reason:ScalingUp Message:Scaling Server Class data-only from 5 to 6};+{v2.ClusterStatus}.Autoscalers:[]"}
CB logs - http://supportal.couchbase.com/snapshot/563d7ce9bd15b54278019e5f584f95fb::0
https://cb-engineering.s3.amazonaws.com/EKS_UPGRADE_1.26_CB_7_2_5_failed/collectinfo-2024-05-30T121628-ns_1%40cb-example-0000.cb-example.default.svc.zip
https://cb-engineering.s3.amazonaws.com/EKS_UPGRADE_1.26_CB_7_2_5_failed/collectinfo-2024-05-30T121628-ns_1%40cb-example-0001.cb-example.default.svc.zip
https://cb-engineering.s3.amazonaws.com/EKS_UPGRADE_1.26_CB_7_2_5_failed/collectinfo-2024-05-30T121628-ns_1%40cb-example-0002.cb-example.default.svc.zip
https://cb-engineering.s3.amazonaws.com/EKS_UPGRADE_1.26_CB_7_2_5_failed/collectinfo-2024-05-30T121628-ns_1%40cb-example-0003.cb-example.default.svc.zip
https://cb-engineering.s3.amazonaws.com/EKS_UPGRADE_1.26_CB_7_2_5_failed/collectinfo-2024-05-30T121628-ns_1%40cb-example-0004.cb-example.default.svc.zip
https://cb-engineering.s3.amazonaws.com/EKS_UPGRADE_1.26_CB_7_2_5_failed/collectinfo-2024-05-30T121628-ns_1%40cb-example-0006.cb-example.default.svc.zip
https://cb-engineering.s3.amazonaws.com/EKS_UPGRADE_1.26_CB_7_2_5_failed/collectinfo-2024-05-30T121628-ns_1%40cb-example-0007.cb-example.default.svc.zip
https://cb-engineering.s3.amazonaws.com/EKS_UPGRADE_1.26_CB_7_2_5_failed/collectinfo-2024-05-30T121628-ns_1%40cb-example-0008.cb-example.default.svc.zip
https://cb-engineering.s3.amazonaws.com/EKS_UPGRADE_1.26_CB_7_2_5_failed/collectinfo-2024-05-30T121628-ns_1%40cb-example-0009.cb-example.default.svc.zip
https://cb-engineering.s3.amazonaws.com/EKS_UPGRADE_1.26_CB_7_2_5_failed/collectinfo-2024-05-30T121628-ns_1%40cb-example-0010.cb-example.default.svc.zip
Operator logs - cbopinfo-20240530T174642+0530.tar.gz
Gerrit Dashboard: K8S-3516
| # | Subject | Branch | Project | Status | CR | V |
|---|---|---|---|---|---|---|
| 210742,2 | K8S-3516: wait for terminating addback node | 2.6.x | couchbase-operator | MERGED | +2 | +1 |