Couchbase Kubernetes / K8S-575

Operator fails to create pod with same name, then gives up and creates a new one


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version: 1.0.0
    • Fix Version: 1.1.0
    • Components: kubernetes, operator
    • Labels: None
    • Environment: K8s running on Azure AKS

    Description

      Before PV (persistent volume) support, the operator's behavior was very consistent: say we have pod0, pod1, and pod2. If we lose pod2, we get a new pod named pod3.

      In 1.0 the behavior appears to have changed. If we lose a pod, say cb-op-aks-demo-0001, the operator first tries to create a pod with the same name, "cb-op-aks-demo-0001", and then gives up:

       

      time="2018-09-10T20:28:35Z" level=info msg="An auto-failover has taken place" cluster-name=cb-op-aks-demo module=cluster
      time="2018-09-10T20:28:36Z" level=info msg="Creating a pod (cb-op-aks-demo-0001) running Couchbase enterprise-5.5.1" cluster-name=cb-op-aks-demo module=cluster
      time="2018-09-10T20:30:36Z" level=error msg="node http://cb-op-aks-demo-0001.cb-op-aks-demo.default.svc:8091 could not be recovered: context deadline exceeded" cluster-name=cb-op-aks-demo module=cluster

      Then it tries to create a new one:

      time="2018-09-10T20:30:36Z" level=info msg="Creating a pod (cb-op-aks-demo-0005) running Couchbase enterprise-5.5.1" cluster-name=cb-op-aks-demo module=cluster
      time="2018-09-10T20:34:00Z" level=info msg="added member (cb-op-aks-demo-0005)" cluster-name=cb-op-aks-demo module=cluster

       

      Is this behavior expected?

      If this is expected, then the rebalance operation takes time and one needs to wait a long time for the replacement pod to be added.
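
      While waiting, rebalance progress can be polled on any surviving node through the standard Couchbase REST endpoint /pools/default/rebalanceProgress, for example from inside a running pod (this assumes curl is available in the container image, and Administrator:password stands in for the cluster's admin credentials):

      $ kubectl exec cb-op-aks-demo-0002 -- curl -s -u Administrator:password \
          http://localhost:8091/pools/default/rebalanceProgress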

      I will attach detailed logs next.

      Attachments

        Issue Links


          Activity

            ram.dhakne Ram Dhakne (Inactive) added a comment:

            Log files for the k8s cluster are attached.

            ram.dhakne Ram Dhakne (Inactive) added a comment:

            Very consistent behavior. Pod cb-op-aks-demo-0000 was deleted:

            $ kubectl get pods --watch

            NAME READY STATUS RESTARTS AGE
            cb-op-aks-demo-0000 0/1 Terminating 0 2d
            cb-op-aks-demo-0002 1/1 Running 0 2d
            cb-op-aks-demo-0003 1/1 Running 0 2d
            cb-op-aks-demo-0004 1/1 Running 0 2d
            cb-op-aks-demo-0005 1/1 Running 0 1d
            couchbase-operator-6cb7687498-zfzq5 1/1 Running 4 11d
            cb-op-aks-demo-0000 0/1 Terminating 0 2d
            cb-op-aks-demo-0000 0/1 Terminating 0 2d
            cb-op-aks-demo-0000 0/1 Pending 0 0s
            cb-op-aks-demo-0000 0/1 Pending 0 0s
            cb-op-aks-demo-0000 0/1 ContainerCreating 0 0s
            cb-op-aks-demo-0006 0/1 Pending 0 0s
            cb-op-aks-demo-0006 0/1 Pending 0 1s
            cb-op-aks-demo-0006 0/1 Init:0/1 0 1s
            cb-op-aks-demo-0000 0/1 Running 0 4m
            cb-op-aks-demo-0000 1/1 Running 0 4m
            cb-op-aks-demo-0006 0/1 PodInitializing 0 2m
            cb-op-aks-demo-0006 0/1 Running 0 2m
            cb-op-aks-demo-0006 1/1 Running 0 3m
            cb-op-aks-demo-0000 1/1 Terminating 0 6m
            cb-op-aks-demo-0000 1/1 Terminating 0 6m
            ^C%
            14:38:24  ✘  ram.dhakne@Rams-MBP  ...Documents/work/k8s  ⬗ 1507.k8s  29m22s 
            $ kubectl get pods --watch
            NAME READY STATUS RESTARTS AGE
            cb-op-aks-demo-0002 1/1 Running 0 2d
            cb-op-aks-demo-0003 1/1 Running 0 2d
            cb-op-aks-demo-0004 1/1 Running 0 2d
            cb-op-aks-demo-0005 1/1 Running 0 1d
            cb-op-aks-demo-0006 1/1 Running 0 26m
            couchbase-operator-6cb7687498-zfzq5 1/1 Running 4 11d

             

            The operator gave up on recreating cb-op-aks-demo-0000; pod cb-op-aks-demo-0006 was created and joined the cluster.
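
            To confirm the replacement node actually joined, the server list can be checked from any running pod with couchbase-cli (Administrator/password are placeholders for the cluster credentials):

            $ kubectl exec cb-op-aks-demo-0002 -- /opt/couchbase/bin/couchbase-cli server-list \
                -c localhost:8091 -u Administrator -p password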


            People

              Assignee: mikew Mike Wiederhold [X] (Inactive)
              Reporter: ram.dhakne Ram Dhakne (Inactive)

