Couchbase Kubernetes / K8S-1492

couchbase-operator pod evicted, deployment did not recover

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 2.0.2
    • 2.0.0, 2.0.1
    • operator
    • couchbase/couchbase-operator-internal:2.0.1-129
      GKE, 9 nodes across three zones
    • 1

    Description

      Scenario:
      After completing a validation run in which nothing notable happened, I left a cluster running idle until I could come back to it for additional testing.

      When I returned, it was in a bad state: the couchbase-operator pod had been evicted, seemingly because of memory usage, and a replacement pod had been started but could not take over as leader.

      Some snippets that show the scenario:

      $ kubectl get pods
      NAME                                            READY   STATUS    RESTARTS   AGE
      cb-example-0000                                 1/1     Running   0          19h
      cb-example-0001                                 1/1     Running   0          19h
      cb-example-0002                                 1/1     Running   0          19h
      couchbase-operator-7849fcbdf8-7vx6f             0/1     Evicted   0          19h
      couchbase-operator-7849fcbdf8-fhxfd             1/1     Running   3          9h
      couchbase-operator-admission-659db8f47c-lnq8w   1/1     Running   0          19h
      

      $ kubectl get pod couchbase-operator-7849fcbdf8-7vx6f -o=yaml
      apiVersion: v1
      kind: Pod
      metadata:
        annotations:
          kubernetes.io/limit-ranger: 'LimitRanger plugin set: cpu request for container
            couchbase-operator'
        creationTimestamp: "2020-05-21T05:00:30Z"
      status:
        message: 'The node was low on resource: memory. Container couchbase-operator was
          using 1322432Ki, which exceeds its request of 0. '
        phase: Failed
        reason: Evicted
        startTime: "2020-05-21T05:00:30Z"
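
For scale, the usage figure in the eviction message can be converted to more familiar units. This is a minimal sketch in Python; the helper name `parse_ki` is hypothetical and not part of any Kubernetes tooling.

```python
def parse_ki(quantity: str) -> int:
    """Convert a Ki-suffixed Kubernetes quantity (e.g. '1322432Ki') to bytes."""
    if not quantity.endswith("Ki"):
        raise ValueError(f"expected a Ki-suffixed quantity, got {quantity!r}")
    return int(quantity[:-2]) * 1024


# Value copied from the status.message field of the evicted pod.
usage_bytes = parse_ki("1322432Ki")
usage_mib = usage_bytes / 1024**2
usage_gib = usage_bytes / 1024**3
print(f"{usage_mib:.1f} MiB ({usage_gib:.2f} GiB)")  # prints "1291.4 MiB (1.26 GiB)"
```

The operator was therefore holding roughly 1.26 GiB while declaring a memory request of 0, so any usage at all counts as exceeding its request when the node comes under memory pressure.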
      

      $ kubectl logs couchbase-operator-7849fcbdf8-7vx6f
      Error from server (BadRequest): container "couchbase-operator" in pod "couchbase-operator-7849fcbdf8-7vx6f" is not available
      $ kubectl logs couchbase-operator-7849fcbdf8-fhxfd
      {"level":"info","ts":1590106368.5813437,"logger":"main","msg":"couchbase-operator","version":"2.0.1 (build 129)","revision":"release"}
      {"level":"info","ts":1590106368.5815203,"logger":"leader","msg":"Trying to become the leader."}
      {"level":"info","ts":1590106368.7579677,"logger":"leader","msg":"Found existing lock","LockOwner":"couchbase-operator-7849fcbdf8-7vx6f"}
      {"level":"info","ts":1590106368.7690403,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106369.899112,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106372.302849,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106376.8437178,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106385.5639052,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106402.9332597,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106421.1433141,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106437.365369,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106453.879851,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106470.1998987,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106487.1745272,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106504.8339086,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106523.4471161,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106540.1497526,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106557.3762176,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106574.4044852,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106591.9278038,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106608.8434014,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106625.7913597,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106643.9734716,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106660.6835804,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106677.3440514,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106694.5121224,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106712.3477786,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106731.1186256,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106748.066332,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106765.0274565,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106783.4470472,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106800.1178305,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106818.8980896,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106837.1374507,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106854.8240201,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106870.9242716,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106887.439901,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106905.3935628,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106924.52387,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106940.7912645,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106958.7039573,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106974.9069276,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590106993.132693,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107010.1124046,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107026.7133389,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107044.4548016,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107062.2060027,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107079.106864,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107096.471069,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107114.180709,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107131.0022535,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107147.938793,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107166.4724083,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107183.6404681,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107202.4740648,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107219.4347818,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107238.3095837,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107254.6342692,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107273.7704782,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107290.022639,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107306.7432795,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107324.9325333,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107341.7185297,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107358.7246912,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107377.7280793,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107396.1131058,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107414.6899235,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107433.0410101,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107449.6363153,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107467.018406,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107485.8985083,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107504.0940337,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107523.2438898,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107542.2043552,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107558.504476,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107576.092078,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107595.068512,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107614.135988,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107631.2620823,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107649.484183,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107667.7745917,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107685.5878913,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107703.6759133,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107721.4528496,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107739.8923955,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107757.1949127,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107773.6254363,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107792.7910354,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107811.6710405,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107828.7146208,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107847.0338326,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107865.1061509,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107881.3897557,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107899.5453777,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107917.5590568,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107934.7560241,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107951.526589,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107969.250973,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590107985.860755,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108002.6389596,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108020.6585798,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108037.0748544,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108053.984442,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108071.307759,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108088.7086234,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108106.73171,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108124.5032713,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108142.521829,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108160.8654838,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108179.5358922,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108195.5475652,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108213.9125023,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108231.202479,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108248.8104842,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108266.7536564,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108284.0736544,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108300.1789293,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108316.194203,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108332.2138014,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108351.1553264,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108369.052364,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108386.852348,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108405.478092,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108424.2974164,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108441.7797256,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108459.7251244,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108475.819677,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108494.5376554,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108511.3468125,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108529.4100797,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108546.2124922,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108562.7781634,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108580.683732,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108599.2996216,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108617.5378613,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108633.64546,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108651.3835194,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108670.5178683,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108688.9310646,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108705.882542,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108724.3042827,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108740.801609,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108757.9493136,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108776.6199305,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108793.371261,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108811.3905873,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108828.99589,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108845.2952683,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108861.388419,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108878.6556964,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108896.5546746,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108915.5394893,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108933.379808,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108951.2755175,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108968.6027873,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590108986.3805819,"logger":"leader","msg":"Not the leader. Waiting."}
      {"level":"info","ts":1590109003.9647026,"logger":"leader","msg":"Not the leader. Waiting."}
      

      Of course, I could delete the pod to recover, but I'm leaving the environment as is to see if any further info can be gathered.
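
The manual recovery mentioned above can be scripted. The following is a minimal sketch, assuming the input has the shape of `kubectl get pods -o json`; the function name `evicted_pods` and the trimmed sample data are illustrative, and the script only prints the delete commands rather than running them.

```python
import json
from typing import List


def evicted_pods(pod_list: dict) -> List[str]:
    """Return names of pods with phase Failed and reason Evicted."""
    names = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        if status.get("phase") == "Failed" and status.get("reason") == "Evicted":
            names.append(pod["metadata"]["name"])
    return names


# Trimmed stand-in for `kubectl get pods -o json`, using two pods from this report.
sample = json.loads("""
{
  "items": [
    {"metadata": {"name": "couchbase-operator-7849fcbdf8-7vx6f"},
     "status": {"phase": "Failed", "reason": "Evicted"}},
    {"metadata": {"name": "couchbase-operator-7849fcbdf8-fhxfd"},
     "status": {"phase": "Running"}}
  ]
}
""")

for name in evicted_pods(sample):
    # Printed for review; deleting the evicted pod is the manual recovery step.
    print(f"kubectl delete pod {name}")
```

Keeping the output as printed commands leaves the decision to delete with the person running it, which matters here because the evicted pod is being kept for investigation.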

      It is also interesting that the new pod has restarted several times (the listing above shows 3 restarts). Is this because it couldn't acquire the lock and timed out trying to do so?
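
One way to look at the retry behavior is to measure the gaps between successive "Not the leader. Waiting." entries. The sketch below uses the first eight such lines from the log above; the function name `retry_intervals` is hypothetical.

```python
import json

# First eight "Not the leader. Waiting." lines from the standby pod's log.
LOG = """\
{"level":"info","ts":1590106368.7690403,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106369.899112,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106372.302849,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106376.8437178,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106385.5639052,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106402.9332597,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106421.1433141,"logger":"leader","msg":"Not the leader. Waiting."}
{"level":"info","ts":1590106437.365369,"logger":"leader","msg":"Not the leader. Waiting."}
"""


def retry_intervals(log_text, msg="Not the leader. Waiting."):
    """Return the seconds elapsed between consecutive log entries matching msg."""
    entries = [json.loads(line) for line in log_text.splitlines() if line.strip()]
    stamps = [entry["ts"] for entry in entries if entry.get("msg") == msg]
    return [round(later - earlier, 2) for earlier, later in zip(stamps, stamps[1:])]


intervals = retry_intervals(LOG)
print(intervals)
```

The gaps roughly double (about 1.1 s, 2.4 s, 4.5 s, 8.7 s) and then settle between 16 and 19 seconds for the rest of the log. That pattern is consistent with a capped backoff in which the standby pod keeps polling for the lock, but on its own it does not explain the restarts.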

      I grabbed Tommie McAfee and we looked through a number of other things, such as the nodes in the cluster, resource usage, and a possible noisy neighbor consuming memory, but none of that panned out.

            People

              ingenthr Matt Ingenthron