Description
Test:
A 3-node cluster is created with analytics enabled. Pods are killed one at a time. After each kill the test waits for the node to be reported as down and failed over, and for the operator to balance in a replacement.
Observation:
The first rebalance after a failover frequently ends as incomplete (2 of the 3 replacements in the run below), but a second rebalance succeeds.
| NewMemberAdded | New member test-couchbase-rv58w-0000 added to cluster |
| NewMemberAdded | New member test-couchbase-rv58w-0001 added to cluster |
| NewMemberAdded | New member test-couchbase-rv58w-0002 added to cluster |
| RebalanceStarted | A rebalance has been started to balance data across the cluster |
| RebalanceCompleted | A rebalance has completed |
| BucketCreated | A new bucket `default` was created |
| MemberDown | Existing member test-couchbase-rv58w-0000 down |
| MemberFailedOver | Existing member test-couchbase-rv58w-0000 failed over |
| NewMemberAdded | New member test-couchbase-rv58w-0003 added to cluster |
| RebalanceStarted | A rebalance has been started to balance data across the cluster |
| RebalanceIncomplete | A rebalance is incomplete | <== event reason mismatch, expected 'MemberRemoved', actual 'RebalanceIncomplete'
| MemberRemoved | Existing member test-couchbase-rv58w-0000 removed from the cluster |
| RebalanceStarted | A rebalance has been started to balance data across the cluster |
| RebalanceCompleted | A rebalance has completed |
| MemberDown | Existing member test-couchbase-rv58w-0001 down |
| MemberFailedOver | Existing member test-couchbase-rv58w-0001 failed over |
| NewMemberAdded | New member test-couchbase-rv58w-0004 added to cluster |
| RebalanceStarted | A rebalance has been started to balance data across the cluster |
| MemberRemoved | Existing member test-couchbase-rv58w-0001 removed from the cluster |
| RebalanceCompleted | A rebalance has completed |
| MemberDown | Existing member test-couchbase-rv58w-0002 down |
| MemberFailedOver | Existing member test-couchbase-rv58w-0002 failed over |
| NewMemberAdded | New member test-couchbase-rv58w-0005 added to cluster |
| RebalanceStarted | A rebalance has been started to balance data across the cluster |
| RebalanceIncomplete | A rebalance is incomplete |
| MemberRemoved | Existing member test-couchbase-rv58w-0002 removed from the cluster |
| RebalanceStarted | A rebalance has been started to balance data across the cluster |
| RebalanceCompleted | A rebalance has completed |
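The mismatch flagged in the event list can be read as a position-by-position comparison of expected and recorded event reasons. The following Go sketch illustrates that idea only. The `Event` type and `firstMismatch` function are hypothetical names and are not taken from the test framework or the operator. The expected sequence is assumed to be the one seen in the node 0001 cycle, which completed on the first rebalance.

```go
package main

import "fmt"

// Event is a hypothetical, simplified cluster event holding only the two
// columns shown in the event list: reason and message.
type Event struct {
	Reason  string
	Message string
}

// firstMismatch returns the index of the first position where the recorded
// reasons diverge from the expected ones, or -1 if the sequences agree.
func firstMismatch(expected []string, actual []Event) int {
	for i, want := range expected {
		if i >= len(actual) || actual[i].Reason != want {
			return i
		}
	}
	if len(actual) > len(expected) {
		return len(expected) // extra, unexpected events at the end
	}
	return -1
}

func main() {
	// Assumed expectation for one kill/replace cycle (node 0001 cycle).
	expected := []string{
		"MemberDown", "MemberFailedOver", "NewMemberAdded",
		"RebalanceStarted", "MemberRemoved", "RebalanceCompleted",
	}
	// Reasons recorded for the node 0000 cycle.
	actual := []Event{
		{Reason: "MemberDown"}, {Reason: "MemberFailedOver"},
		{Reason: "NewMemberAdded"}, {Reason: "RebalanceStarted"},
		{Reason: "RebalanceIncomplete"}, {Reason: "MemberRemoved"},
		{Reason: "RebalanceStarted"}, {Reason: "RebalanceCompleted"},
	}

	i := firstMismatch(expected, actual)
	if i < 0 {
		fmt.Println("event sequence matches")
		return
	}
	want, got := "<none>", "<none>"
	if i < len(expected) {
		want = expected[i]
	}
	if i < len(actual) {
		got = actual[i].Reason
	}
	fmt.Printf("event %d: reason mismatch, expected %q, actual %q\n", i, want, got)
}
```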
Expectation:
Rebalances complete on the first attempt, every time.
Timeline:
Server reports node 0 (test-couchbase-rv58w-0000) as failed over:
{"level":"info","ts":1574417906.722955,"logger":"couchbaseutil","msg":"Node status","cluster":"default/test-couchbase-rv58w","name":"test-couchbase-rv58w-0000","version":"6.0.3","class":"test_config_1","managed":true,"status":"failed"}
{"level":"info","ts":1574417906.722963,"logger":"couchbaseutil","msg":"Node status","cluster":"default/test-couchbase-rv58w","name":"test-couchbase-rv58w-0001","version":"6.0.3","class":"test_config_1","managed":true,"status":"active"}
{"level":"info","ts":1574417906.7229683,"logger":"couchbaseutil","msg":"Node status","cluster":"default/test-couchbase-rv58w","name":"test-couchbase-rv58w-0002","version":"6.0.3","class":"test_config_1","managed":true,"status":"active"}
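When reading a long operator log, the per-node status lines like those above can be pulled out mechanically. This is a minimal Go sketch, assuming one JSON object per line with the `msg`, `name` and `status` fields shown in the excerpt. `logEntry` and `nodeStatuses` are hypothetical names, and the sample input is trimmed to the fields the sketch reads.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// logEntry holds only the fields this sketch needs from a log line.
type logEntry struct {
	Msg    string `json:"msg"`
	Name   string `json:"name"`
	Status string `json:"status"`
}

// nodeStatuses scans newline-delimited JSON logs and returns the most recent
// status seen for each node in "Node status" messages. Lines that are not
// valid JSON are skipped.
func nodeStatuses(logs string) map[string]string {
	statuses := map[string]string{}
	scanner := bufio.NewScanner(strings.NewReader(logs))
	for scanner.Scan() {
		var e logEntry
		if err := json.Unmarshal(scanner.Bytes(), &e); err != nil {
			continue
		}
		if e.Msg == "Node status" {
			statuses[e.Name] = e.Status
		}
	}
	return statuses
}

func main() {
	// Trimmed copies of two of the log lines above.
	logs := `{"level":"info","msg":"Node status","name":"test-couchbase-rv58w-0000","status":"failed"}
{"level":"info","msg":"Node status","name":"test-couchbase-rv58w-0001","status":"active"}`

	statuses := nodeStatuses(logs)
	fmt.Println("0000:", statuses["test-couchbase-rv58w-0000"])
	fmt.Println("0001:", statuses["test-couchbase-rv58w-0001"])
}
```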
The operator deletes the failed pod and schedules a replacement (test-couchbase-rv58w-0003):
{"level":"info","ts":1574417908.4882917,"logger":"cluster","msg":"Pod unrecoverable","cluster":"default/test-couchbase-rv58w","name":"test-couchbase-rv58w-0000","reason":"No volume mounts defined"}
{"level":"info","ts":1574417908.4883049,"logger":"cluster","msg":"Pod failed, deleting","cluster":"default/test-couchbase-rv58w","name":"test-couchbase-rv58w-0000"}
{"level":"info","ts":1574417909.7209096,"logger":"cluster","msg":"Creating pod","cluster":"default/test-couchbase-rv58w","name":"test-couchbase-rv58w-0003","image":"couchbase/server:6.0.3"}
The rebalance completes. The operator then polls Server about once a second for 10s to see whether the unbalanced status has cleared:
{"level":"info","ts":1574417952.9106264,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/test-couchbase-rv58w","progress":66.66666666666667}
{"level":"debug","ts":1574417956.915534,"logger":"client","msg":"http","method":"GET","url":"http://test-couchbase-rv58w-0003.test-couchbase-rv58w.default.svc:8091/pools/default/tasks","status":"200 OK","time_ms":3.624779}
{"level":"debug","ts":1574417956.9389496,"logger":"client","msg":"http","method":"GET","url":"http://test-couchbase-rv58w-0001.test-couchbase-rv58w.default.svc:8091/pools/default","status":"200 OK","time_ms":4.94814}
{"level":"debug","ts":1574417957.9330237,"logger":"client","msg":"http","method":"GET","url":"http://test-couchbase-rv58w-0001.test-couchbase-rv58w.default.svc:8091/pools/default","status":"200 OK","time_ms":3.774437}
{"level":"debug","ts":1574417958.9282775,"logger":"client","msg":"http","method":"GET","url":"http://test-couchbase-rv58w-0001.test-couchbase-rv58w.default.svc:8091/pools/default","status":"200 OK","time_ms":5.232787}
{"level":"debug","ts":1574417959.929354,"logger":"client","msg":"http","method":"GET","url":"http://test-couchbase-rv58w-0001.test-couchbase-rv58w.default.svc:8091/pools/default","status":"200 OK","time_ms":6.612067}
{"level":"debug","ts":1574417960.9301717,"logger":"client","msg":"http","method":"GET","url":"http://test-couchbase-rv58w-0001.test-couchbase-rv58w.default.svc:8091/pools/default","status":"200 OK","time_ms":6.776474}
{"level":"debug","ts":1574417961.9228926,"logger":"client","msg":"http","method":"GET","url":"http://test-couchbase-rv58w-0003.test-couchbase-rv58w.default.svc:8091/pools/default","status":"200 OK","time_ms":6.63882}
{"level":"debug","ts":1574417962.9338944,"logger":"client","msg":"http","method":"GET","url":"http://test-couchbase-rv58w-0001.test-couchbase-rv58w.default.svc:8091/pools/default","status":"200 OK","time_ms":5.510516}
{"level":"debug","ts":1574417963.9275134,"logger":"client","msg":"http","method":"GET","url":"http://test-couchbase-rv58w-0001.test-couchbase-rv58w.default.svc:8091/pools/default","status":"200 OK","time_ms":1.867191}
{"level":"debug","ts":1574417964.9334671,"logger":"client","msg":"http","method":"GET","url":"http://test-couchbase-rv58w-0001.test-couchbase-rv58w.default.svc:8091/pools/default","status":"200 OK","time_ms":7.705285}
{"level":"debug","ts":1574417965.9380221,"logger":"client","msg":"http","method":"GET","url":"http://test-couchbase-rv58w-0001.test-couchbase-rv58w.default.svc:8091/pools/default","status":"200 OK","time_ms":7.344765}
{"level":"debug","ts":1574417966.9293282,"logger":"client","msg":"http","method":"GET","url":"http://test-couchbase-rv58w-0001.test-couchbase-rv58w.default.svc:8091/pools/default","status":"200 OK","time_ms":1.656747}
On the next pass through the reconcile loop the cluster is still flagged as unbalanced, with node 0 now reported as unclustered:
{"level":"info","ts":1574417967.091391,"logger":"couchbaseutil","msg":"Node status","cluster":"default/test-couchbase-rv58w","name":"test-couchbase-rv58w-0000","version":"6.0.3","class":"test_config_1","managed":true,"status":"unclustered"}
Issue Links
- duplicates: MB-36485 n2n - Rebalance out of analytics node is stuck (Closed)