Couchbase Kubernetes / K8S-3358

CNG Pods Readiness probe fails if the exporter sidecar is present

    Description

      I observed that whenever a Prometheus exporter sidecar is present, the CNG container never becomes ready. This doesn't block our testing, since we can remove the exporter sidecar for now, but if this is a supported setup, it should be looked at.

      Setup (EKS):

      Operator: 2.6.1-115

      Exporter: 1.0.11-100

      CNG: latest (currently 0.2.1-134)

       

        monitoring:
          prometheus:
            enabled: true
            image: ghcr.io/cb-vanilla/exporter:1.0.11-100
            refreshRate: 60
        networking:
          cloudNativeGateway:
            image: ghcr.io/cb-vanilla/cloud-native-gateway:latest
          exposeAdminConsole: true
          exposedFeatures:
          - client 

      If I remove the monitoring sidecar, CNG starts without errors. 

      This appears to be a port clash: the metrics container exposes port 9091, and the CNG readiness probe also targets port 9091.

       

      Metrics container:

        metrics:
          Container ID:  containerd://efdfb78813416bef366a86896b27ebbb0b22f1237aa8c1da705c42a2340ac5ad
          Image:         ghcr.io/cb-vanilla/exporter:1.0.11-100
          Image ID:      ghcr.io/cb-vanilla/exporter@sha256:87afa36e160a89fcef809555d8a42cceffad476942d6184a0e06fa1879390b43
          Port:          9091/TCP
          Host Port:     0/TCP
          Args:
            --per-node-refresh      60
          State:          Running
            Started:      Thu, 01 Feb 2024 02:54:08 -0800
          Ready:          True
          Restart Count:  0
          Readiness:      http-get http://:9091/readiness-probe delay=10s timeout=5s period=10s #success=1 #failure=3  

       

      CNG container readiness probe:

          State:          Running
            Started:      Thu, 01 Feb 2024 02:54:09 -0800
          Ready:          False
          Restart Count:  0
          Readiness:      http-get http://:9091/ready delay=0s timeout=1s period=10s #success=1 #failure=3 
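
      Containers in the same pod share one network namespace, so two sidecars cannot both serve the same port, and a probe aimed at a shared port reaches whichever process holds it. The sketch below shows one way to flag that situation before deploying. It is only an illustration: the `CONTAINERS` list and the `find_probe_port_clashes` helper are hypothetical, and only the container names, port numbers, and probe paths are taken from the output above.

```python
from collections import defaultdict

# Hypothetical, simplified description of the two sidecars in this ticket.
# Names, ports, and probe paths mirror the output above; the dictionary
# layout itself is an assumption made for this example.
CONTAINERS = [
    {"name": "metrics", "probe_port": 9091, "probe_path": "/readiness-probe"},
    {"name": "cng", "probe_port": 9091, "probe_path": "/ready"},
]


def find_probe_port_clashes(containers):
    """Return {port: [container names]} for ports probed by more than one container."""
    names_by_port = defaultdict(list)
    for container in containers:
        names_by_port[container["probe_port"]].append(container["name"])
    # Keep only the ports that more than one container's probe targets.
    return {port: names for port, names in names_by_port.items() if len(names) > 1}


if __name__ == "__main__":
    for port, names in find_probe_port_clashes(CONTAINERS).items():
        print(f"port {port} is probed by: {', '.join(names)}")
```

      With the values from this ticket, the helper reports port 9091 as shared by the metrics and cng containers.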

       

       

      Interestingly, when we remove the exporter sidecar, we still see "Readiness probe failed: HTTP probe failed with statuscode: 503" in the pod events, even though the container reports "Ready: True" and the endpoint responds when queried directly:

      curl -X GET localhost:9091
      cloud native gateway webapi
      curl -X GET localhost:9091/ready
      ok 
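
      For repeated checks, the same request can be scripted. The sketch below approximates how a Kubernetes HTTP probe judges a response: any status code from 200 up to, but not including, 400 counts as success, so the 503 seen in the pod events is a failure. The function names `is_probe_success` and `http_probe` are made up for this example, and the sketch does not reproduce every detail of the kubelet's behaviour (redirect handling, for instance, may differ).

```python
import urllib.error
import urllib.request


def is_probe_success(status_code):
    """Kubernetes treats HTTP status codes >= 200 and < 400 as a successful probe."""
    return 200 <= status_code < 400


def http_probe(url, timeout=1.0):
    """Send one GET request and report whether a probe would count it as success."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return is_probe_success(response.status)
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (such as the 503 in the pod events) arrive here.
        return is_probe_success(err.code)
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure: the probe fails.
        return False


if __name__ == "__main__":
    # Assumes this runs somewhere that can reach the pod's port 9091,
    # for example inside the pod itself.
    print(http_probe("http://localhost:9091/ready", timeout=1.0))
```

      The 1 second timeout matches the `timeout=1s` shown for the CNG readiness probe; running the check in a loop during startup may help show whether the 503 is only a transient startup response.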

      With 3 containers (server, metrics, cng):

      kubectl get pods
      NAME                                            READY   STATUS    RESTARTS   AGE
      cb-example-perf-0000                            2/3     Running   0          18m
      cb-example-perf-0001                            2/3     Running   0          17m
      cb-example-perf-0002                            2/3     Running   0          17m 

      With 2 containers (server, cng):

      kubectl get pods
      NAME                                            READY   STATUS    RESTARTS   AGE
      cb-example-perf-0000                            2/2     Running   0          14m
      cb-example-perf-0001                            2/2     Running   0          13m
      cb-example-perf-0002                            2/2     Running   0          13m 

      This is a three-pod deployment using Couchbase Server 7.2.4-7070. The logs are attached below.

      Attachments

        1. cng0.log
          11 kB
        2. cng1.log
          11 kB
        3. cng2.log
          11 kB
        4. couchbase-cluster.yaml
          1 kB
        5. pod0000.log
          11 kB
        6. pod0001.log
          11 kB
        7. pod0002.log
          11 kB

            People

              justin.ashworth Justin Ashworth
              salim.salim Salim Salim