Details
-
Bug
-
Resolution: Fixed
-
Major
-
6.0.1
-
GKE, Operator 2.0.0
-
CX Sprint 159, CX Sprint 160, CX Sprint 161, CX Sprint 162, CX Sprint 163, CX Sprint 164, CX Sprint 165, CX Sprint 166
Description
Test scenario:
- Three node cluster created.
- Three data sets created: one encompassing the whole bucket, one for document IDs containing a 1, and one for the inverse.
- Load generated with cbc_pillowfight.
- Pods killed one after the other, allowing the operator to repair the cluster.
- Rebalance appears to fail on the first attempt (internally we wait for the task to complete, then poll the cluster status a few times to see if NS server requires a rebalance), then succeeds on a retry (possibly related to MB-34928, but we still expect the rebalance to succeed).
- On the final kill, the rebalance appears to stall.
Cluster name: test-couchbase-q7pr9
In the operator logs (cbopinfo-20190715T161808+0100/default/deployment/couchbase-operator/couchbase-operator.log) we see (again possibly related to MB-34928):
{"level":"info","ts":1563203076.304325,"logger":"cluster","msg":"Pods failed over","cluster":"test-couchbase-q7pr9"}
{"level":"info","ts":1563203076.3044674,"logger":"cluster","msg":"Pod unrecoverable","cluster":"test-couchbase-q7pr9","name":"test-couchbase-q7pr9-0000","reason":"No volume mounts defined"}
{"level":"info","ts":1563203076.3044827,"logger":"cluster","msg":"Pod failed, deleting","cluster":"test-couchbase-q7pr9","name":"test-couchbase-q7pr9-0000"}
{"level":"info","ts":1563203078.3110914,"logger":"cluster","msg":"Creating pod","cluster":"test-couchbase-q7pr9","name":"test-couchbase-q7pr9-0003","image":"couchbase/server:enterprise-6.0.1"}
{"level":"info","ts":1563203100.3373904,"logger":"cluster","msg":"Pod added to cluster","cluster":"test-couchbase-q7pr9","name":"test-couchbase-q7pr9-0003"}
{"level":"info","ts":1563203100.4955597,"logger":"cluster","msg":"External address collection failed","cluster":"test-couchbase-q7pr9","name":"test-couchbase-q7pr9-0000"}
{"level":"info","ts":1563203101.137671,"logger":"couchbaseutil","msg":"Rebalancing","progress":0}
{"level":"info","ts":1563203105.1573079,"logger":"couchbaseutil","msg":"Rebalancing","progress":2.978124323153564}
{"level":"info","ts":1563203109.1806362,"logger":"couchbaseutil","msg":"Rebalancing","progress":10.5154862464804}
{"level":"info","ts":1563203113.1996868,"logger":"couchbaseutil","msg":"Rebalancing","progress":17.98787091184752}
{"level":"info","ts":1563203117.224926,"logger":"couchbaseutil","msg":"Rebalancing","progress":25.43859649122807}
{"level":"info","ts":1563203121.2435634,"logger":"couchbaseutil","msg":"Rebalancing","progress":32.68356075373619}
{"level":"debug","ts":1563203135.308571,"logger":"cluster","msg":"Reconciliation completed","cluster":"test-couchbase-q7pr9"}
{"level":"error","ts":1563203135.308714,"logger":"cluster","msg":"Reconciliation failed","cluster":"test-couchbase-q7pr9","error":"failed to rebalance: cluster reports rebalance incomplete","stacktrace":"github.com/couchbase/couchbase-operator/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/home/simon/go/src/github.com/couchbase/couchbase-operator/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).runReconcile\n\t/home/simon/go/src/github.com/couchbase/couchbase-operator/pkg/cluster/cluster.go:382\ngithub.com/couchbase/couchbase-operator/pkg/cluster.(*Cluster).Update\n\t/home/simon/go/src/github.com/couchbase/couchbase-operator/pkg/cluster/cluster.go:399\ngithub.com/couchbase/couchbase-operator/pkg/controller.(*CouchbaseClusterReconciler).Reconcile\n\t/home/simon/go/src/github.com/couchbase/couchbase-operator/pkg/controller/controller.go:86\ngithub.com/couchbase/couchbase-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/simon/go/src/github.com/couchbase/couchbase-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215\ngithub.com/couchbase/couchbase-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/home/simon/go/src/github.com/couchbase/couchbase-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\ngithub.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/home/simon/go/src/github.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\ngithub.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/home/simon/go/src/github.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\ngithub.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\t/home/simon/go/src/github.com/couchbase/couchbase-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
{"level":"debug","ts":1563203135.3773098,"logger":"cluster","msg":"Reconciliation starting","cluster":"test-couchbase-q7pr9"}
But it does finally fix itself. Note how long this rebalance takes.
However, on the final pod kill we get this:
{"level":"info","ts":1563203336.4409792,"logger":"cluster","msg":"Pods failed over","cluster":"test-couchbase-q7pr9"}
{"level":"info","ts":1563203336.4410956,"logger":"cluster","msg":"Pod unrecoverable","cluster":"test-couchbase-q7pr9","name":"test-couchbase-q7pr9-0002","reason":"No volume mounts defined"}
{"level":"info","ts":1563203336.44113,"logger":"cluster","msg":"Pod failed, deleting","cluster":"test-couchbase-q7pr9","name":"test-couchbase-q7pr9-0002"}
{"level":"info","ts":1563203338.4788873,"logger":"cluster","msg":"Creating pod","cluster":"test-couchbase-q7pr9","name":"test-couchbase-q7pr9-0005","image":"couchbase/server:enterprise-6.0.1"}
{"level":"info","ts":1563203360.2708702,"logger":"cluster","msg":"Pod added to cluster","cluster":"test-couchbase-q7pr9","name":"test-couchbase-q7pr9-0005"}
{"level":"info","ts":1563203360.4640975,"logger":"cluster","msg":"External address collection failed","cluster":"test-couchbase-q7pr9","name":"test-couchbase-q7pr9-0002"}
{"level":"info","ts":1563203361.0830722,"logger":"couchbaseutil","msg":"Rebalancing","progress":0}
{"level":"info","ts":1563203365.1335223,"logger":"couchbaseutil","msg":"Rebalancing","progress":3.869958952351842}
{"level":"info","ts":1563203369.1519842,"logger":"couchbaseutil","msg":"Rebalancing","progress":11.06540621770822}
{"level":"info","ts":1563203373.163263,"logger":"couchbaseutil","msg":"Rebalancing","progress":18.12210904723971}
{"level":"info","ts":1563203377.1741796,"logger":"couchbaseutil","msg":"Rebalancing","progress":25.01671994009732}
{"level":"info","ts":1563203381.186079,"logger":"couchbaseutil","msg":"Rebalancing","progress":31.75671925785666}
{"level":"info","ts":1563203385.1933627,"logger":"couchbaseutil","msg":"Rebalancing","progress":66.66666766666667}
{"level":"info","ts":1563203389.2043526,"logger":"couchbaseutil","msg":"Rebalancing","progress":66.666669}
{"level":"info","ts":1563203393.2122645,"logger":"couchbaseutil","msg":"Rebalancing","progress":66.66667033333334}
{"level":"info","ts":1563203397.2163284,"logger":"couchbaseutil","msg":"Rebalancing","progress":66.66667166666666}
{"level":"info","ts":1563203401.2228284,"logger":"couchbaseutil","msg":"Rebalancing","progress":66.666673}
{"level":"info","ts":1563203405.233028,"logger":"couchbaseutil","msg":"Rebalancing","progress":66.66667433333333}
{"level":"info","ts":1563203409.242073,"logger":"couchbaseutil","msg":"Rebalancing","progress":66.66667566666668}
{"level":"info","ts":1563203413.2541904,"logger":"couchbaseutil","msg":"Rebalancing","progress":66.66667699999999}
Then we get a timeout and fail the test.