Description
Changing the cluster name in the cluster definition leaves the cluster in an inconsistent state.
Steps to reproduce:
- Start from an existing Kubernetes cluster.
- kubectl get po shows three running Couchbase pods and the operator pod:
admins-MBP-109:couchbase-operator sindhura.palakodety$ kubectl get po
NAME                                  READY   STATUS    RESTARTS   AGE
cb-example-0000                       1/1     Running   0          2m
cb-example-0001                       1/1     Running   0          1m
cb-example-0002                       1/1     Running   0          1m
couchbase-operator-789c895556-sgwrl   1/1     Running   0          2m
- Open the cluster definition: vi example/couchbase-cluster.yaml
- Change the name from name: cb-example to name: cb-example-1
- The following errors appear in the operator logs:
time="2018-01-03T00:06:00Z" level=warning msg="node init: failed with error Code: 0, Error: http://cb-example-1-0000.cb-example-1.default.svc:8091/node/controller/rename - Post http://cb-example-1-0000.cb-example-1.default.svc:8091/node/controller/rename: dial tcp 172.17.0.10:8091: getsockopt: connection refused ...retrying" cluster-name=cb-example-1 module=retryutil
time="2018-01-03T00:06:01Z" level=info msg="Start reconciling" cluster-name=cb-example module=cluster
time="2018-01-03T00:06:01Z" level=info msg="Finish reconciling" cluster-name=cb-example module=cluster
time="2018-01-03T00:06:05Z" level=warning msg="node init: failed with error Code: 0, Error: http://cb-example-1-0000.cb-example-1.default.svc:8091/node/controller/rename - Post http://cb-example-1-0000.cb-example-1.default.svc:8091/node/controller/rename: dial tcp 172.17.0.10:8091: getsockopt: connection refused ...retrying" cluster-name=cb-example-1 module=retryutil
time="2018-01-03T00:06:21Z" level=warning msg="add node: failed with error Code: 400, Error: error - Failed to reach erlang port mapper. Failed to resolve address for \"cb-example-1-0001.cb-example-1.default.svc\". The hostname may be incorrect or not resolvable. ...retrying" cluster-name=cb-example-1 module=retryutil
time="2018-01-03T00:06:25Z" level=info msg="Start reconciling" cluster-name=cb-example module=cluster
time="2018-01-03T00:06:25Z" level=info msg="Finish reconciling" cluster-name=cb-example module=cluster
time="2018-01-03T00:06:26Z" level=warning msg="add node: failed with error Code: 400, Error: error - Prepare join failed. Could not connect to \"cb-example-1-0001.cb-example-1.default.svc\" on port 8091. This could be due to an incorrect host/port combination or a firewall in place between the servers. ...retrying" cluster-name=cb-example-1 module=retryutil
time="2018-01-03T00:06:31Z" level=warning msg="add node: failed with error Code: 400, Error: error - Prepare join failed. Could not connect to \"cb-example-1-0001.cb-example-1.default.svc\" on port 8091. This could be due to an incorrect host/port combination or a firewall in place between the servers. ...retrying" cluster-name=cb-example-1 module=retryutil

E0103 00:08:54.168813 5 leaderelection.go:258] Failed to update lock: Put https://10.96.0.1:443/api/v1/namespaces/default/endpoints/couchbase-operator: dial tcp 10.96.0.1:443: getsockopt: connection refused
E0103 00:08:54.171115 5 reflector.go:315] github.com/couchbase/couchbase-operator/pkg/controller/controller.go:100: Failed to watch *v1beta1.CouchbaseCluster: Get https://10.96.0.1:443/apis/couchbase.database.couchbase.com/v1beta1/namespaces/default/couchbaseclusters?resourceVersion=49560&timeoutSeconds=544&watch=true: dial tcp 10.96.0.1:443: getsockopt: connection refused
E0103 00:08:54.183256 5 leaderelection.go:224] error retrieving resource lock default/couchbase-operator: Get https://10.96.0.1:443/api/v1/namespaces/default/endpoints/couchbase-operator: dial tcp 10.96.0.1:443: getsockopt: connection refused
E0103 00:08:54.185375 5 event.go:260] Could not construct reference to: '&v1.Endpoints{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"", GenerateName:"", Namespace:"", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, Subsets:[]v1.EndpointSubset(nil)}' due to: 'selfLink was empty, can't make reference'. Will not report event: 'Normal' 'LeaderElection' 'couchbase-operator-789c895556-sgwrl stopped leading'
- In this state, the Couchbase UI does not allow logging in or performing any other activity.
- kubectl get po now shows six Couchbase pods (three under each cluster name), and the operator pod has restarted once:
{code}
admins-MBP-109:couchbase-operator sindhura.palakodety$ kubectl get po
NAME                                  READY   STATUS    RESTARTS   AGE
cb-example-0000                       1/1     Running   0          12m
cb-example-0001                       1/1     Running   0          12m
cb-example-0002                       1/1     Running   0          12m
cb-example-1-0000                     1/1     Running   0          9m
cb-example-1-0001                     1/1     Running   0          8m
cb-example-1-0002                     1/1     Running   0          8m
couchbase-operator-789c895556-sgwrl   1/1     Running   1          13m
admins-MBP-109:couchbase-operator sindhura.palakodety$
{code}
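
The operator logs carry both cluster-name=cb-example and cluster-name=cb-example-1, and the final pod listing contains pods under both names. This suggests the rename produced a second set of pods and left the original in place. The sketch below is one way to confirm that from a pod listing. It assumes Couchbase pod names follow the <cluster-name>-<4-digit index> pattern seen in the output above. group_pods_by_cluster is a hypothetical helper written for this report, not part of the operator.

```python
import re
from collections import defaultdict

# Assumed naming pattern, inferred only from the listing in this report:
# <cluster-name>-<4-digit index>, e.g. cb-example-1-0002.
POD_NAME = re.compile(r"^(?P<cluster>.+)-(?P<index>\d{4})$")


def group_pods_by_cluster(pod_names):
    """Group Couchbase pod names by the cluster-name prefix they carry."""
    clusters = defaultdict(list)
    for name in pod_names:
        match = POD_NAME.match(name)
        if match:  # pods that do not fit the pattern (the operator) are skipped
            clusters[match.group("cluster")].append(name)
    return dict(clusters)


# Pod names copied from the second kubectl listing in this report.
pods = [
    "cb-example-0000", "cb-example-0001", "cb-example-0002",
    "cb-example-1-0000", "cb-example-1-0001", "cb-example-1-0002",
    "couchbase-operator-789c895556-sgwrl",
]

groups = group_pods_by_cluster(pods)
for cluster, members in sorted(groups.items()):
    print(cluster, len(members))
```

Two distinct prefixes with three pods each would match the symptom described here: the old pods are still running while a second set exists under the new name.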