Couchbase Kubernetes / K8S-836

Configurable Pod Readiness Probes

Details

    Description

      It would be nice to be able to configure the readiness probe that the Couchbase operator sets for pods. It currently uses initialDelaySeconds=10 and failureThreshold=1, which is not enough in my case.

      I'm using a slightly customized image because I want to run Couchbase with Local Persistent Volumes and, for security reasons, without the default user:group 1000:1000. The image therefore runs a couple of commands (setting permissions and so on) before the Couchbase server starts. Those commands take a few seconds, so the server isn't up in time and I cannot get a cluster working.

      I verified that it's a timing issue: if I create the cluster and then immediately delete the operator, the first node comes up after 10 to 20 seconds.
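      To make the timing argument concrete, here is a rough sketch of the arithmetic, not the operator's actual logic. It assumes a probe that keeps failing reaches its failure threshold at about initialDelaySeconds + (failureThreshold - 1) * periodSeconds. The period is not stated in this ticket, so the Kubernetes default of 10 seconds is assumed, and both function names are hypothetical.

```python
import math


def seconds_until_failure_threshold(initial_delay_s, failure_threshold, period_s=10):
    """Approximate time at which a continuously failing probe reaches failureThreshold.

    period_s=10 is an assumption (the Kubernetes default); the ticket does not state it.
    """
    return initial_delay_s + (failure_threshold - 1) * period_s


def min_failure_threshold(startup_s, initial_delay_s, period_s=10):
    """Smallest failureThreshold whose time budget is at least startup_s."""
    if startup_s <= initial_delay_s:
        return 1
    return math.ceil((startup_s - initial_delay_s) / period_s) + 1


# Settings quoted in this ticket: initialDelaySeconds=10, failureThreshold=1.
budget = seconds_until_failure_threshold(10, 1)

# Worst case observed above: the first node takes about 20 seconds to come up.
needed = min_failure_threshold(20, 10)

print(budget, needed)
```

      Under these assumptions the current settings leave about 10 seconds before the first failure counts, while a 20-second startup would need a failureThreshold of at least 2 at the same delay and period.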

        Activity

          simon.murray Simon Murray added a comment -

          This isn't a simple problem to solve, especially with the 1.2.0 release, where we start using exec-based readiness probes that only indicate ready once the pod is fully added to the cluster and the data has been rebalanced. This lets us control pod eviction during Kubernetes upgrades with disruption budgets (e.g. so you don't lose all your replicas for a single vbucket, which would be quite catastrophic!). The rebalance duration therefore varies with the amount of data that needs to be shifted around, so your 10-20 seconds becomes minutes, hours, days? Add into the mix that 10-20 seconds only applies when the container image has already been pulled onto the node; downloading it could take a further 5 minutes.

          So while we could allow configuration, working out how to configure it is going to be a time-consuming affair, and afaict all it avoids is an event that can be ignored by monitoring software.

          I'm keen to understand why your cluster is not working. In our testing with the new readiness semantics, having the readiness probe take 5 minutes to indicate ready has no adverse effects on cluster deployment.

          People

            simon.murray Simon Murray
            marcusbooyah Marcus Bowyer