Type: Bug
Resolution: Not a Bug
Priority: Major
Fix Version/s: 7.6.4
Affects Version/s: 7.6.0
Component/s: ns_server
Labels:
- CAO
- Kubernetes
- cao
- kubernetes
- ns_server
- operator
- rebalance
- rebalance-failed
- rebalancefailed
Environment:
CAO Image : couchbase/couchbase-operator:2.7.0-arm64
Couchbase Enterprise Edition 7.6.0-2172
Environment : Kind Kubernetes environment run locally on Mac

Triage:
Untriaged
Operating System:
Linux x86_64
Link to Log File, atop/blg, CBCollectInfo, Core dump:

Cluster logs
https://cb-engineering.s3.amazonaws.com/MB-62724/collectinfo-2024-07-15T093417-ns_1%40cb-example-0000.cb-example.default.svc.zip
https://cb-engineering.s3.amazonaws.com/MB-62724/collectinfo-2024-07-15T093417-ns_1%40cb-example-0001.cb-example.default.svc.zip
https://cb-engineering.s3.amazonaws.com/MB-62724/collectinfo-2024-07-15T093417-ns_1%40cb-example-0002.cb-example.default.svc.zip

Operator logs
https://cb-engineering.s3.amazonaws.com/MB-62724/cbopinfo-20240715T143931+0530.tar.gz

Story Points:
0
Is this a Regression?:
Unknown

Steps to reproduce

Created a 3-node cluster on Kubernetes with the operator, with all services enabled.
On one pod, killed memcached in a loop. Multiple failovers and rebalance failures occurred, as expected.
Stopped the memcached kill loop.
Rebalances after this point fail in a loop (the operator retriggers them repeatedly); tracked in MB-62724.
Each rebalance fails with leader_activities_error.
After this failure, the cluster status suddenly changes from unbalanced to balanced.

Rebalance exited with reason {{badmatch,
    {leader_activities_error,
        {default,rebalance},
        {quorum_lost,
            {lease_lost,
                'ns_1@cb-example-0001.cb-example.default.svc'}}}},
    [{ns_rebalancer,rebalance,7,
        [{file,"src/ns_rebalancer.erl"},{line,456}]},
     {proc_lib,init_p_do_apply,3,
        [{file,"proc_lib.erl"},{line,240}]}]}.

Rebalance Operation Id = 275e6c370d3c4bac4f45fe2fc175764b
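
When the operator retriggers the rebalance repeatedly, it can help to check whether every failure names the same node as the one whose lease was lost. The following is a minimal sketch for pulling that node out of a failure message like the one above. The helper `parse_lease_lost` and the field labels `domain` and `name` are hypothetical names chosen for illustration; they are not part of ns_server or the operator, and the pattern only assumes the message shape shown in this report.

```python
import re

# Hypothetical helper, not part of ns_server or the operator.
# Matches the shape seen in this report:
#   {leader_activities_error, {default,rebalance},
#       {quorum_lost, {lease_lost, '<node>'}}}
# The labels "domain" and "name" for the {default,rebalance} pair are
# assumptions made for readability.
LEASE_LOST = re.compile(
    r"leader_activities_error,\s*"
    r"\{(?P<domain>\w+),\s*(?P<name>\w+)\},\s*"
    r"\{quorum_lost,\s*\{lease_lost,\s*'(?P<node>[^']+)'"
)


def parse_lease_lost(message):
    """Return the matched fields as a dict, or None if the message
    is not a lease_lost rebalance failure."""
    match = LEASE_LOST.search(message)
    return match.groupdict() if match else None


# The failure message from this report, copied verbatim.
sample = """Rebalance exited with reason {{badmatch,
{leader_activities_error,
{default,rebalance},
{quorum_lost,
{lease_lost,
'ns_1@cb-example-0001.cb-example.default.svc'}}}},
[{ns_rebalancer,rebalance,7,
[{file,"src/ns_rebalancer.erl"},{line,456}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,240}]}]}."""

result = parse_lease_lost(sample)
```

Running this over each failure entry in the cluster logs would show whether the lease is always lost to cb-example-0001 or to different nodes across retries.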


[Rebalance][K8S] : Cluster status changes from unbalanced to balanced suddenly post rebalance failure