Details
- Bug
- Resolution: Fixed
- Critical
- 2.0.0, 2.0.1
- couchbase/couchbase-operator-internal:2.0.1-129, GKE, 9 nodes across three zones
- 1
Description
Scenario:
After completing validation where nothing particularly interesting happened, I left a cluster running idle until I could come back to it to do some additional testing.
When I returned, it was in a bad state: the couchbase-operator pod had been evicted, seemingly because of memory usage, and another pod had been started but could not take over.
Some snippets that show the scenario:
$ kubectl get pods
NAME                                            READY   STATUS    RESTARTS   AGE
cb-example-0000                                 1/1     Running   0          19h
cb-example-0001                                 1/1     Running   0          19h
cb-example-0002                                 1/1     Running   0          19h
couchbase-operator-7849fcbdf8-7vx6f             0/1     Evicted   0          19h
couchbase-operator-7849fcbdf8-fhxfd             1/1     Running   3          9h
couchbase-operator-admission-659db8f47c-lnq8w   1/1     Running   0          19h
$ kubectl get pod couchbase-operator-7849fcbdf8-7vx6f -o=yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/limit-ranger: 'LimitRanger plugin set: cpu request for container
      couchbase-operator'
  creationTimestamp: "2020-05-21T05:00:30Z"
…
status:
  message: 'The node was low on resource: memory. Container couchbase-operator was
    using 1322432Ki, which exceeds its request of 0. '
  phase: Failed
  reason: Evicted
  startTime: "2020-05-21T05:00:30Z"
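To put the figure in the eviction message into more familiar units, here is a minimal sketch that converts the reported usage. The helper name `ki_to_gib` is hypothetical, and it handles only the `Ki` suffix seen in this message, not the full Kubernetes quantity grammar.

```python
def ki_to_gib(quantity: str) -> float:
    """Convert a quantity such as '1322432Ki' to GiB.

    Hypothetical helper: handles only the binary 'Ki' suffix.
    """
    if not quantity.endswith("Ki"):
        raise ValueError(f"unsupported quantity: {quantity!r}")
    kibibytes = int(quantity[:-2])
    return kibibytes / (1024 * 1024)  # Ki -> Mi -> Gi


usage_gib = ki_to_gib("1322432Ki")
print(f"{usage_gib:.2f} GiB")  # prints "1.26 GiB"
```

In other words, the operator container was holding roughly 1.26 GiB against a memory request of 0 when the node ran low on memory.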
$ kubectl logs couchbase-operator-7849fcbdf8-7vx6f
Error from server (BadRequest): container "couchbase-operator" in pod "couchbase-operator-7849fcbdf8-7vx6f" is not available
$ kubectl logs couchbase-operator-7849fcbdf8-fhxfd
{"level":"info","ts":1590106368.5813437,"logger":"main","msg":"couchbase-operator","version":"2.0.1 (build 129)","revision":"release"}
{"level":"info","ts":1590106368.5815203,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1590106368.7579677,"logger":"leader","msg":"Found existing lock","LockOwner":"couchbase-operator-7849fcbdf8-7vx6f"}
{"level":"info","ts":1590106368.7690403,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106369.899112,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106372.302849,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106376.8437178,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106385.5639052,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106402.9332597,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106421.1433141,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106437.365369,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106453.879851,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106470.1998987,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106487.1745272,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106504.8339086,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106523.4471161,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106540.1497526,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106557.3762176,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106574.4044852,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106591.9278038,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106608.8434014,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106625.7913597,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106643.9734716,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106660.6835804,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106677.3440514,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106694.5121224,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106712.3477786,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106731.1186256,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106748.066332,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106765.0274565,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106783.4470472,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106800.1178305,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106818.8980896,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106837.1374507,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106854.8240201,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106870.9242716,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106887.439901,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106905.3935628,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106924.52387,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106940.7912645,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106958.7039573,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106974.9069276,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106993.132693,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107010.1124046,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107026.7133389,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107044.4548016,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107062.2060027,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107079.106864,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107096.471069,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107114.180709,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107131.0022535,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107147.938793,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107166.4724083,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107183.6404681,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107202.4740648,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107219.4347818,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107238.3095837,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107254.6342692,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107273.7704782,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107290.022639,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107306.7432795,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107324.9325333,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107341.7185297,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107358.7246912,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107377.7280793,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107396.1131058,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107414.6899235,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107433.0410101,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107449.6363153,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107467.018406,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107485.8985083,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107504.0940337,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107523.2438898,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107542.2043552,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107558.504476,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107576.092078,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107595.068512,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107614.135988,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107631.2620823,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107649.484183,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107667.7745917,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107685.5878913,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107703.6759133,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107721.4528496,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107739.8923955,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107757.1949127,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107773.6254363,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107792.7910354,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107811.6710405,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107828.7146208,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107847.0338326,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107865.1061509,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107881.3897557,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107899.5453777,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107917.5590568,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107934.7560241,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107951.526589,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107969.250973,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590107985.860755,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108002.6389596,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108020.6585798,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108037.0748544,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108053.984442,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108071.307759,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108088.7086234,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108106.73171,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108124.5032713,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108142.521829,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108160.8654838,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108179.5358922,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108195.5475652,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108213.9125023,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108231.202479,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108248.8104842,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108266.7536564,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108284.0736544,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108300.1789293,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108316.194203,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108332.2138014,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108351.1553264,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108369.052364,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108386.852348,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108405.478092,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108424.2974164,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108441.7797256,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108459.7251244,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108475.819677,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108494.5376554,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108511.3468125,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108529.4100797,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108546.2124922,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108562.7781634,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108580.683732,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108599.2996216,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108617.5378613,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108633.64546,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108651.3835194,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108670.5178683,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108688.9310646,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108705.882542,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108724.3042827,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108740.801609,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108757.9493136,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108776.6199305,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108793.371261,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108811.3905873,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108828.99589,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108845.2952683,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108861.388419,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108878.6556964,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108896.5546746,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108915.5394893,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108933.379808,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108951.2755175,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108968.6027873,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590108986.3805819,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590109003.9647026,"logger":"leader","msg":"Not the leader. Waiting."}
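The spacing of the `Not the leader. Waiting.` entries can be read directly from the `ts` fields. The sketch below uses the first seven of those entries, copied from the log above; the variable names are illustrative only.

```python
import json

# First seven "Not the leader. Waiting." entries, copied from the log above.
log_lines = [
    '{"level":"info","ts":1590106368.7690403,"logger":"leader","msg":"Not the leader. Waiting."}',
    '{"level":"info","ts":1590106369.899112,"logger":"leader","msg":"Not the leader. Waiting."}',
    '{"level":"info","ts":1590106372.302849,"logger":"leader","msg":"Not the leader. Waiting."}',
    '{"level":"info","ts":1590106376.8437178,"logger":"leader","msg":"Not the leader. Waiting."}',
    '{"level":"info","ts":1590106385.5639052,"logger":"leader","msg":"Not the leader. Waiting."}',
    '{"level":"info","ts":1590106402.9332597,"logger":"leader","msg":"Not the leader. Waiting."}',
    '{"level":"info","ts":1590106421.1433141,"logger":"leader","msg":"Not the leader. Waiting."}',
]

# Seconds between consecutive retries, rounded for readability.
timestamps = [json.loads(line)["ts"] for line in log_lines]
intervals = [round(b - a, 2) for a, b in zip(timestamps, timestamps[1:])]
print(intervals)  # [1.13, 2.4, 4.54, 8.72, 17.37, 18.21]
```

The wait roughly doubles from about 1 s and then levels off at around 16 to 19 s for the rest of the capture, which is consistent with a capped backoff. The log itself does not state the cap, and the lock owner never changes over the roughly 44 minutes shown.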
Of course, I could delete the pod to recover, but I'm leaving the environment as is to see if any further info can be gathered.
It is also interesting that the new pod has restarted twice. Is this because it couldn't acquire the lock and timed out trying to do so?
I grabbed Tommie McAfee and we looked through a number of other things, such as the nodes in the cluster, resources, and a possible noisy-memory neighbor, but none of that panned out.
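The stuck hand-off can be stated as a simple check: the lock names an owner pod, and that pod is in phase `Failed` with reason `Evicted`, so it is not going to release the lock on its own. The sketch below is an illustrative model of that check using the values shown in the snippets above. `lock_is_stale` and the dictionary layout are hypothetical, the `Running` phase for the second pod is assumed from its STATUS column, and none of this is the operator's actual leader-election code.

```python
def lock_is_stale(lock_owner: str, pods: dict[str, dict[str, str]]) -> bool:
    """Return True if the lock owner can no longer act as leader.

    Hypothetical model: an owner that is missing, or whose pod status is
    phase=Failed with reason=Evicted, is treated as stale.
    """
    status = pods.get(lock_owner)
    if status is None:
        return True  # owner pod no longer exists
    return status.get("phase") == "Failed" and status.get("reason") == "Evicted"


# Values taken from the kubectl output in this report.
pods = {
    "couchbase-operator-7849fcbdf8-7vx6f": {"phase": "Failed", "reason": "Evicted"},
    "couchbase-operator-7849fcbdf8-fhxfd": {"phase": "Running"},  # assumed phase
}
owner = "couchbase-operator-7849fcbdf8-7vx6f"  # "LockOwner" from the log

if lock_is_stale(owner, pods):
    # Manual recovery mentioned in the description: delete the evicted pod.
    print(f"kubectl delete pod {owner}")
```

A check along these lines would let a waiting pod, or an operator script, recognize that the recorded lock owner is an evicted pod rather than a live leader.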
Issue Links
- blocks: K8S-1506 Autonomous Operator (Kubernetes) 2.0.2 GA Release - target on web week of July 27 (Resolved)
Gerrit Reviews
For Gerrit Dashboard: K8S-1492

| # | Subject | Branch | Project | Status | CR | V |
|---|---|---|---|---|---|---|
| 129735,4 | K8S-1492: Skip readiness flagging if already ready | 2.0.x | couchbase-operator | MERGED | +2 | +1 |