Details
- Type: Bug
- Resolution: Fixed
- Priority: Major
- Affects Version: 2.3.0-beta
- Fix Version: None
- Component: None
- Sprint: 1: Recovery to productivity, 3: SBEE, Multi-Cert
- Story Points: 1
Description
When attempting to run something simple:
{preformat}bin/cao certify --clean -- -test TestCreateCluster{preformat}
it exits after failing to create the pod:
{preformat}Initializing ...
Deleting pull secrets ...
Deleting certification pod ...
Creating service account ...
Creating cluster role ...
Creating cluster role binding ...
Creating artifacts volume ...
Creating pull secrets ...
Creating certification pod ...
Waiting for certification pod to become ready ...
Deleting certification pod ...
Deleting pull secrets ...
Deleting artifacts volume ...
Deleting cluster role binding ...
Deleting cluster role ...
Deleting service account ...
Certification error: failed to wait for condition: pod condition missing{preformat}
Investigation shows the pod could not be scheduled because no node had enough free CPU, while a large number of old test namespaces were still holding cluster resources:
{preformat}% kubectl describe pod certification
Name:         certification
Namespace:    default
Priority:     0
Node:         <none>
Labels:       <none>
Annotations:  <none>
Status:       Pending
IP:
IPs:          <none>
Containers:
  certification:
    Image:      couchbase/operator-certification:2.3.0-beta1
    Port:       <none>
    Host Port:  <none>
    Args:
      -test.v
      -test.run
      TestOperator
      -test.timeout
      12h
      -test.parallel
      8
      -color
      -collect-logs
      -test
      TestCreateCluster
    Requests:
      cpu:     2
      memory:  3Gi
    Environment:  <none>
    Mounts:
      /artifacts from artifacts (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from couchbase-operator-certification-token-5bhck (ro)
Conditions:
  Type          Status
  PodScheduled  False
Volumes:
  artifacts:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  artifacts
    ReadOnly:   false
  couchbase-operator-certification-token-5bhck:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  couchbase-operator-certification-token-5bhck
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  2m42s  default-scheduler  0/3 nodes are available: 3 Insufficient cpu.
  Warning  FailedScheduling  2m42s  default-scheduler  0/3 nodes are available: 3 Insufficient cpu.
% kubectl get namespace
NAME              STATUS   AGE
default           Active   11d
kube-node-lease   Active   11d
kube-public       Active   11d
kube-system       Active   11d
test-586zq        Active   3h2m
test-bxxbh        Active   177m
test-cdwq4        Active   179m
test-dwwtq        Active   3h2m
test-kcl2r        Active   3h2m
test-kxssz        Active   3h3m
test-skfq6        Active   3h5m
test-skkls        Active   3h6m
test-tgqkr        Active   3h1m
test-w67bd        Active   3h2m
% kubectl get po -n test-w67bd
NAME                                  READY   STATUS    RESTARTS   AGE
couchbase-operator-6dfdb7c9cc-bkbmn   1/1     Running   0          3h2m
test-couchbase-ssl9j-0000             1/1     Running   0          3h1m
% kubectl get po -n test-tgqkr
NAME                                  READY   STATUS    RESTARTS   AGE
couchbase-operator-6dfdb7c9cc-776bw   1/1     Running   0          3h2m
test-couchbase-h72cn-0000             1/1     Running   0          179m
test-couchbase-h72cn-0001             1/1     Running   0          172m
test-couchbase-h72cn-0002             0/1     Pending   0          25s{preformat}
{preformat}
% kubectl get namespace
NAME              STATUS   AGE
default           Active   11d
kube-node-lease   Active   11d
kube-public       Active   11d
kube-system       Active   11d
test-586zq        Active   3h4m
test-bxxbh        Active   178m
test-cdwq4        Active   3h1m
test-dwwtq        Active   3h3m
test-kcl2r        Active   3h3m
test-kxssz        Active   3h5m
test-skfq6        Active   3h7m
test-skkls        Active   3h8m
test-tgqkr        Active   3h3m
test-w67bd        Active   3h3m
% kubectl get namespace --show-labels
NAME              STATUS   AGE    LABELS
default           Active   11d    <none>
kube-node-lease   Active   11d    <none>
kube-public       Active   11d    <none>
kube-system       Active   11d    addonmanager.kubernetes.io/mode=Reconcile,control-plane=true,kubernetes.io/cluster-service=true
test-586zq        Active   3h5m   istio-injection=enabled
test-bxxbh        Active   179m   istio-injection=enabled
test-cdwq4        Active   3h2m   istio-injection=enabled
test-dwwtq        Active   3h4m   istio-injection=enabled
test-kcl2r        Active   3h4m   istio-injection=enabled
test-kxssz        Active   3h6m   istio-injection=enabled
test-skfq6        Active   3h8m   istio-injection=enabled
test-skkls        Active   3h9m   istio-injection=enabled
test-tgqkr        Active   3h4m   istio-injection=enabled
test-w67bd        Active   3h4m   istio-injection=enabled{preformat}
Suggestion: apply a label to the namespaces so it is clear they were created by certification; `cao` with `--clean` can then delete them before trying to deploy a new pod.
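A rough sketch of that cleanup logic, runnable offline against sample output mirroring the listings above. The label key `app.kubernetes.io/managed-by=cao-certification` is purely hypothetical (no such label exists today); until a label is applied at creation time, the generated `test-` prefix is the only marker to go on:

```shell
# Sample `kubectl get namespace` output, mirroring the report above.
namespaces='NAME              STATUS   AGE
default           Active   11d
kube-node-lease   Active   11d
kube-public       Active   11d
kube-system       Active   11d
test-586zq        Active   3h2m
test-w67bd        Active   3h2m'

# Without a dedicated label, the generated "test-" prefix is the only way
# to spot leaked certification namespaces; skip the header row.
leaked=$(printf '%s\n' "$namespaces" | awk 'NR > 1 && $1 ~ /^test-/ { print $1 }')
printf '%s\n' "$leaked"

# With a label applied when the namespaces are created, cleanup collapses
# to a single selector-based delete (hypothetical label key):
#   kubectl delete namespace -l app.kubernetes.io/managed-by=cao-certification
```

Matching on the name prefix is fragile (an unrelated `test-` namespace would be deleted too), which is why the label-based selector is the safer long-term fix.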