Couchbase Kubernetes / K8S-3358

CNG pod readiness probe fails when the exporter sidecar is present


Details


    Description

      I observed that whenever the Prometheus exporter sidecar is present, the CNG container never becomes ready. This doesn't block our testing, since we can remove the exporter sidecar for now, but if this is a supported setup, it should probably be looked at.

      Setup (EKS):

      Operator: 2.6.1-115

      Exporter: 1.0.11-100

      CNG: latest (currently 0.2.1-134)

       

        monitoring:
          prometheus:
            enabled: true
            image: ghcr.io/cb-vanilla/exporter:1.0.11-100
            refreshRate: 60
        networking:
          cloudNativeGateway:
            image: ghcr.io/cb-vanilla/cloud-native-gateway:latest
          exposeAdminConsole: true
          exposedFeatures:
          - client 

      If I remove the monitoring sidecar, CNG starts without errors. 

      This appears to be a port clash on 9091 between the metrics container and the CNG readiness probe.
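All containers in a pod share one network namespace, so only one process in the pod can listen on TCP 9091 at a time. If the exporter binds 9091 first, the kubelet's probe of `http://:9091/ready` would be answered by the exporter rather than by CNG, which would fit the symptoms (this is the hypothesis above, not a confirmed root cause). The minimal Python sketch below shows the underlying clash. It uses two sockets in one process as a stand-in for the two containers, and the function name `try_double_bind` is hypothetical:

```python
import errno
import socket


def try_double_bind(port: int = 0) -> int:
    """Bind two TCP sockets to the same 127.0.0.1 port and return the errno of the
    second bind (0 if it unexpectedly succeeds).

    Two sockets in one process stand in for two containers sharing a pod's
    network namespace. port=0 lets the OS pick a free port for the first socket.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind(("127.0.0.1", port))
        first.listen()
        taken = first.getsockname()[1]  # the port the first "container" now owns
        try:
            second.bind(("127.0.0.1", taken))
        except OSError as exc:
            return exc.errno
        return 0
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    err = try_double_bind()
    # On Linux the second bind is expected to fail with EADDRINUSE.
    print("second bind:", errno.errorcode.get(err, "succeeded"))
```

In the real pod, the process that loses the race cannot serve on 9091. Any probe aimed at 9091 then reaches whichever process won.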

       

      Metrics container:

        metrics:
          Container ID:  containerd://efdfb78813416bef366a86896b27ebbb0b22f1237aa8c1da705c42a2340ac5ad
          Image:         ghcr.io/cb-vanilla/exporter:1.0.11-100
          Image ID:      ghcr.io/cb-vanilla/exporter@sha256:87afa36e160a89fcef809555d8a42cceffad476942d6184a0e06fa1879390b43
          Port:          9091/TCP
          Host Port:     0/TCP
          Args:
            --per-node-refresh      60
          State:          Running
            Started:      Thu, 01 Feb 2024 02:54:08 -0800
          Ready:          True
          Restart Count:  0
          Readiness:      http-get http://:9091/readiness-probe delay=10s timeout=5s period=10s #success=1 #failure=3  

       

      And the CNG container, including its readiness probe:

          State:          Running
            Started:      Thu, 01 Feb 2024 02:54:09 -0800
          Ready:          False
          Restart Count:  0
          Readiness:      http-get http://:9091/ready delay=0s timeout=1s period=10s #success=1 #failure=3 
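The two `Readiness:` lines above both target port 9091, but each expects a different process to answer. The Python sketch below parses those summary lines and groups containers by probed port, which makes this kind of overlap easy to spot. The regex assumes only the `kubectl describe pod` format pasted in this report, and the helper names `probe_target` and `shared_ports` are hypothetical:

```python
from __future__ import annotations

import re
from collections import defaultdict

# Matches the probe summary that `kubectl describe pod` prints, e.g.
#   http-get http://:9091/ready delay=0s timeout=1s period=10s #success=1 #failure=3
# The format is assumed from the output pasted in this report.
PROBE_RE = re.compile(r"http-get\s+https?://[^:/\s]*:(\d+)(/\S*)")


def probe_target(readiness: str) -> tuple[int, str]:
    """Return (port, path) from a 'Readiness:' summary string."""
    m = PROBE_RE.search(readiness)
    if m is None:
        raise ValueError(f"not an http-get probe: {readiness!r}")
    return int(m.group(1)), m.group(2)


def shared_ports(probes: dict[str, str]) -> dict[int, list[str]]:
    """Group containers by probed port; keep only ports probed by more than one container."""
    by_port: dict[int, list[str]] = defaultdict(list)
    for container, summary in probes.items():
        port, _path = probe_target(summary)
        by_port[port].append(container)
    return {port: names for port, names in by_port.items() if len(names) > 1}


if __name__ == "__main__":
    probes = {
        "metrics": "http-get http://:9091/readiness-probe delay=10s timeout=5s period=10s #success=1 #failure=3",
        "cng": "http-get http://:9091/ready delay=0s timeout=1s period=10s #success=1 #failure=3",
    }
    print(shared_ports(probes))  # {9091: ['metrics', 'cng']}
```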

       

       

      Interestingly, when we remove the exporter sidecar, we still see "Readiness probe failed: HTTP probe failed with statuscode: 503" in the pod events, but the CNG container shows "Ready: True". The endpoint is also reachable directly:

      curl -X GET localhost:9091
      cloud native gateway webapi
      curl -X GET localhost:9091/ready
      ok 
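For HTTP probes, the kubelet treats any status from 200 up to, but not including, 400 as success. A 503 therefore counts as a failed attempt. Because the probes show `#success=1`, one later success is enough to mark the container Ready, which is one possible reading of why earlier 503 events coexist with "Ready: True". The plain `curl` above prints only the body, not the status code. The Python sketch below roughly approximates a single `http-get` attempt. It is not the kubelet's implementation; the function name `probe_once` is hypothetical, and the 1 s default mirrors the CNG probe's `timeout=1s`:

```python
import urllib.error
import urllib.request

# Ignore any http_proxy/https_proxy settings: a probe talks to the pod directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe_once(url: str, timeout: float = 1.0) -> bool:
    """Roughly approximate one kubelet http-get readiness attempt.

    Success means a response with 200 <= status < 400. urllib follows
    redirects itself, so 3xx handling here is only approximate.
    """
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        status = exc.code  # 4xx/5xx responses, e.g. the 503 in the pod events
    except OSError:
        return False  # connection refused, timeout, DNS failure, ...
    return 200 <= status < 400


if __name__ == "__main__":
    # Same target as the CNG probe; only meaningful where localhost is the pod's network namespace.
    print(probe_once("http://localhost:9091/ready"))
```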

      With three containers (server, metrics, cng):

      kubectl get pods
      NAME                                            READY   STATUS    RESTARTS   AGE
      cb-example-perf-0000                            2/3     Running   0          18m
      cb-example-perf-0001                            2/3     Running   0          17m
      cb-example-perf-0002                            2/3     Running   0          17m 

      With two containers (server, cng):

      kubectl get pods
      NAME                                            READY   STATUS    RESTARTS   AGE
      cb-example-perf-0000                            2/2     Running   0          14m
      cb-example-perf-0001                            2/2     Running   0          13m
      cb-example-perf-0002                            2/2     Running   0          13m 
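The `READY` column reads as ready containers over total containers, so `2/3` means one container in each pod (here, CNG) is not passing its readiness probe. The Python sketch below flags such pods from `kubectl get pods` output like the tables above. Splitting on whitespace is an assumption that holds for this default output format (`kubectl get pods -o json` would be a sturdier input), and the function name `not_fully_ready` is hypothetical:

```python
def not_fully_ready(kubectl_output: str) -> list[str]:
    """Return names of pods whose READY column shows fewer ready than total containers.

    Expects the default `kubectl get pods` table, header line first.
    """
    lines = kubectl_output.strip().splitlines()
    header = lines[0].split()
    name_col, ready_col = header.index("NAME"), header.index("READY")
    flagged = []
    for line in lines[1:]:
        cols = line.split()
        ready, total = (int(n) for n in cols[ready_col].split("/"))
        if ready < total:
            flagged.append(cols[name_col])
    return flagged
```

Fed the three-container table, this returns all three pod names. Fed the two-container table, it returns an empty list.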

      This is a three-pod deployment using Couchbase Server 7.2.4-7070. Logs are attached below.

      Attachments

        1. pod0002.log
          11 kB
        2. couchbase-cluster.yaml
          1 kB
        3. cng0.log
          11 kB
        4. cng1.log
          11 kB
        5. cng2.log
          11 kB
        6. pod0000.log
          11 kB
        7. pod0001.log
          11 kB


            People

              Justin Ashworth (justin.ashworth)
              Salim Salim (salim.salim)
              Votes: 0
              Watchers: 3
