Couchbase Kubernetes / K8S-888

Add PortForward Timeout

Details

    • Task
    • Resolution: Fixed
    • Major
    • 1.2.0
    • None
    • operator, testing
    • None

    Description

      Over the weekend a test run (TestSwapNodesBetweenServices) hit a timeout:

      04:23:08.045         util.go:692: 2019-03-08 02:08:29.872816221 -0800 PST m=+1871.512206167 Cluster healthy
      04:23:08.045         util.go:1136: context deadline exceeded: context deadline exceeded: error upgrading connection: pods "test-couchbase-zjfgw-0006" not found
      04:23:08.045         util.go:1137: goroutine 2659 [running]:
      04:23:08.045             runtime/debug.Stack(0xc000aea000, 0xc0009d77b0, 0x1)
      04:23:08.045             	/jenkins/workspace/operator-gke-p0/go/src/runtime/debug/stack.go:24 +0xb5
      04:23:08.045             github.com/couchbase/couchbase-operator/test/e2e/e2eutil.Die(0xc000aea000, 0x1e0d840, 0xc00035c510)
      04:23:08.045             	/jenkins/workspace/operator-gke-p0/gopath/src/github.com/couchbase/couchbase-operator/test/e2e/e2eutil/util.go:1137 +0x88
      04:23:08.045             github.com/couchbase/couchbase-operator/test/e2e/e2eutil.MustVerifyServices(0xc000aea000, 0xc00019cd20, 0xc0002c3000, 0xdf8475800, 0xc0006dc150, 0xc000676390, 0x1, 0x1)
      04:23:08.045             	/jenkins/workspace/operator-gke-p0/gopath/src/github.com/couchbase/couchbase-operator/test/e2e/e2eutil/couchbase_util.go:665 +0xc5
      04:23:08.045             github.com/couchbase/couchbase-operator/test/e2e.TestSwapNodesBetweenServices(0xc000aea000)
      04:23:08.045             	/jenkins/workspace/operator-gke-p0/gopath/src/github.com/couchbase/couchbase-operator/test/e2e/cluster_test.go:746 +0x1fb5
      04:23:08.045             github.com/couchbase/couchbase-operator/test/e2e/framework.RecoverDecorator.func1(0xc000aea000)
      04:23:08.045             	/jenkins/workspace/operator-gke-p0/gopath/src/github.com/couchbase/couchbase-operator/test/e2e/framework/test_util.go:517 +0x7b
      04:23:08.045             testing.tRunner(0xc000aea000, 0xc000adf970)
      04:23:08.045             	/jenkins/workspace/operator-gke-p0/go/src/testing/testing.go:827 +0x163
      04:23:08.045             created by testing.(*T).Run
      04:23:08.045             	/jenkins/workspace/operator-gke-p0/go/src/testing/testing.go:878 +0x651

      The operator logs show pod 6 (test-couchbase-zjfgw-0006) being rebalanced out and deleted:

      time="2019-03-08T10:08:40Z" level=info msg="Creating a pod (test-couchbase-zjfgw-0007) running Couchbase enterprise-5.5.3" cluster-name=test-couchbase-zjfgw module=cluster
      time="2019-03-08T10:08:58Z" level=info msg="added member (test-couchbase-zjfgw-0007)" cluster-name=test-couchbase-zjfgw module=cluster
      time="2019-03-08T10:08:59Z" level=info msg="Rebalance progress: 0.000000" cluster-name=test-couchbase-zjfgw module=cluster
      time="2019-03-08T10:09:03Z" level=info msg="Rebalance progress: 75.000000" cluster-name=test-couchbase-zjfgw module=cluster
      time="2019-03-08T10:09:07Z" level=info msg="Rebalance progress: 75.000000" cluster-name=test-couchbase-zjfgw module=cluster
      time="2019-03-08T10:09:11Z" level=info msg="Rebalance progress: 75.000000" cluster-name=test-couchbase-zjfgw module=cluster
      time="2019-03-08T10:09:15Z" level=info msg="Rebalance progress: 75.000000" cluster-name=test-couchbase-zjfgw module=cluster
      time="2019-03-08T10:09:19Z" level=info msg="Rebalance progress: 75.000000" cluster-name=test-couchbase-zjfgw module=cluster
      time="2019-03-08T10:09:24Z" level=info msg="deleted pod (test-couchbase-zjfgw-0006)" cluster-name=test-couchbase-zjfgw module=cluster
      time="2019-03-08T10:09:24Z" level=info msg="reconcile finished" cluster-name=test-couchbase-zjfgw module=cluster

      In theory, this race happens because the port-forward connection upgrade takes longer than the 1 minute timeout period. This task proactively adds an aggressive 10s round-trip timeout to the port forwarder, so that we can at least rotate the client a few times during the overall timeout period. By then pod 6 will almost certainly no longer exist, so it cannot be picked as the port-forward target.

       

          People

            simon.murray Simon Murray