Details
- Type: Bug
- Resolution: Fixed
- Priority: Critical
- Version: 1.2.2
Description
Steps to Reproduce
- Create a cluster with multiple server classes and with exposed features served through NodePorts, for example:
  servers:
  - size: 1
    name: data
    services:
    - data
  - size: 1
    name: data2
    services:
    - data
- Wait for the Operator to set up the cluster.
- Remove one of the server classes, e.g.:
  servers:
  - size: 1
    name: data2
    services:
    - data
- Wait for the node to be removed.
Expectation
The pod is removed successfully.
Actual Behavior
The pod is never removed and the operator hangs trying to remove the pod:
time="2020-02-11T15:33:15Z" level=info msg="Member cb-example-0001 is no longer part of any server config, removing" cluster-name=cb-example module=cluster
time="2020-02-11T15:43:18Z" level=error msg="failed to reconcile: context deadline exceeded: Connection error - dial tcp 192.168.43.234:18091: connect: connection refused" cluster-name=cb-example module=cluster
Notes
The hang comes from the node reachability check added in K8S-1084:
goroutine 123 [select]:
github.com/couchbase/couchbase-operator/pkg/util/netutil.WaitForHostPort(0x15d9020, 0xc00010bd40, 0xc0010badd0, 0x10, 0x0, 0x0)
	/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/pkg/util/netutil/netutil.go:31 +0x19c
github.com/couchbase/couchbase-operator/pkg/cluster.waitAlternateAddressReachable(0xc001508e20, 0x0, 0x0)
	/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/pkg/cluster/reconcile.go:860 +0x1a7
github.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).reconcileMemberAlternateAddresses(0xc00049bd40, 0x0, 0x0)
	/home/couchbase/jenkins/workspace/couchbase-operator-build/goproj/src/github.com/couchbase/couchbase-operator/pkg/cluster/reconcile.go:898 +0x182
github.com/couchbase/couchbase-operator/pkg/cluster.handleNodeServices(0xc0007263c0, 0xc00049bd40, 0x10, 0xc00047d820)
The problem appears to be that we delete the reference to the node ports, which the operator later needs in order to find the right node port to check.
As a result, it checks the worker node's IP on port 18091 rather than on the actual node port.
Workaround
Do not use NodePorts for exposedFeatures.
Issue Links
- relates to: K8S-1331 Resources Should Handle Deleted Configuration Gracefully (Open)