Description
Over the weekend we got a timeout:
04:23:08.045 util.go:692: 2019-03-08 02:08:29.872816221 -0800 PST m=+1871.512206167 Cluster healthy
04:23:08.045 util.go:1136: context deadline exceeded: context deadline exceeded: error upgrading connection: pods "test-couchbase-zjfgw-0006" not found
04:23:08.045 util.go:1137: goroutine 2659 [running]:
04:23:08.045 runtime/debug.Stack(0xc000aea000, 0xc0009d77b0, 0x1)
04:23:08.045 /jenkins/workspace/operator-gke-p0/go/src/runtime/debug/stack.go:24 +0xb5
04:23:08.045 github.com/couchbase/couchbase-operator/test/e2e/e2eutil.Die(0xc000aea000, 0x1e0d840, 0xc00035c510)
04:23:08.045 /jenkins/workspace/operator-gke-p0/gopath/src/github.com/couchbase/couchbase-operator/test/e2e/e2eutil/util.go:1137 +0x88
04:23:08.045 github.com/couchbase/couchbase-operator/test/e2e/e2eutil.MustVerifyServices(0xc000aea000, 0xc00019cd20, 0xc0002c3000, 0xdf8475800, 0xc0006dc150, 0xc000676390, 0x1, 0x1)
04:23:08.045 /jenkins/workspace/operator-gke-p0/gopath/src/github.com/couchbase/couchbase-operator/test/e2e/e2eutil/couchbase_util.go:665 +0xc5
04:23:08.045 github.com/couchbase/couchbase-operator/test/e2e.TestSwapNodesBetweenServices(0xc000aea000)
04:23:08.045 /jenkins/workspace/operator-gke-p0/gopath/src/github.com/couchbase/couchbase-operator/test/e2e/cluster_test.go:746 +0x1fb5
04:23:08.045 github.com/couchbase/couchbase-operator/test/e2e/framework.RecoverDecorator.func1(0xc000aea000)
04:23:08.045 /jenkins/workspace/operator-gke-p0/gopath/src/github.com/couchbase/couchbase-operator/test/e2e/framework/test_util.go:517 +0x7b
04:23:08.045 testing.tRunner(0xc000aea000, 0xc000adf970)
04:23:08.045 /jenkins/workspace/operator-gke-p0/go/src/testing/testing.go:827 +0x163
04:23:08.045 created by testing.(*T).Run
04:23:08.045 /jenkins/workspace/operator-gke-p0/go/src/testing/testing.go:878 +0x651
|
The operator log shows pod 6 being rebalanced out:
time="2019-03-08T10:08:40Z" level=info msg="Creating a pod (test-couchbase-zjfgw-0007) running Couchbase enterprise-5.5.3" cluster-name=test-couchbase-zjfgw module=cluster
time="2019-03-08T10:08:58Z" level=info msg="added member (test-couchbase-zjfgw-0007)" cluster-name=test-couchbase-zjfgw module=cluster
time="2019-03-08T10:08:59Z" level=info msg="Rebalance progress: 0.000000" cluster-name=test-couchbase-zjfgw module=cluster
time="2019-03-08T10:09:03Z" level=info msg="Rebalance progress: 75.000000" cluster-name=test-couchbase-zjfgw module=cluster
time="2019-03-08T10:09:07Z" level=info msg="Rebalance progress: 75.000000" cluster-name=test-couchbase-zjfgw module=cluster
time="2019-03-08T10:09:11Z" level=info msg="Rebalance progress: 75.000000" cluster-name=test-couchbase-zjfgw module=cluster
time="2019-03-08T10:09:15Z" level=info msg="Rebalance progress: 75.000000" cluster-name=test-couchbase-zjfgw module=cluster
time="2019-03-08T10:09:19Z" level=info msg="Rebalance progress: 75.000000" cluster-name=test-couchbase-zjfgw module=cluster
time="2019-03-08T10:09:24Z" level=info msg="deleted pod (test-couchbase-zjfgw-0006)" cluster-name=test-couchbase-zjfgw module=cluster
time="2019-03-08T10:09:24Z" level=info msg="reconcile finished" cluster-name=test-couchbase-zjfgw module=cluster
|
In theory this race occurs because the port-forward upgrade can take longer than the 1-minute timeout period. This change proactively adds an aggressive 10s round-trip timeout to the port forwarder, so we can rotate the client a few times within the overall timeout period, by which point pod 6 will almost certainly no longer be there to be used.