Details
- Type: Bug
- Resolution: Fixed
- Priority: Major
- Version: 2.6.4
Description
Couchbase Cluster Description
- Set up the cluster as per the required specifications
- Each node is an m5.4xlarge instance (16 vCPUs and 64GB RAM).
- 6 Data Service nodes and 4 Index and Query Service nodes.
- 10 Buckets (with 1 replica), Full Eviction and Auto-failover set to 5s.
- ~210GB data per bucket → ~2TB data loaded onto cluster.
- 50 Primary Indexes with 1 Replica each. (Total 100 Indexes)
- DeltaRecovery upgrade of Couchbase Server from 7.2.5 to 7.6.1.
- Continuous data and query workload on all buckets during the update process.
- Interrupted upgrade by restarting a KV node.
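As a rough sanity check on the figures above, the following sketch works out the approximate data footprint and index count. It assumes that the ~210GB per bucket figure refers to active data only (replicas excluded) and that data is spread evenly across the 6 Data Service nodes; neither assumption is stated in this report, and the function name `cluster_footprint` is purely illustrative.

```python
def cluster_footprint(buckets, gb_per_bucket, replicas,
                      data_nodes, primary_indexes, index_replicas):
    """Approximate sizing, assuming an even spread across data nodes."""
    active_gb = buckets * gb_per_bucket          # active copies only
    total_gb = active_gb * (1 + replicas)        # active + replica copies
    return {
        "active_gb": active_gb,
        "total_gb_with_replicas": total_gb,
        "gb_per_data_node": total_gb / data_nodes,
        "index_instances": primary_indexes * (1 + index_replicas),
    }


# Values taken from the cluster description in this report.
footprint = cluster_footprint(
    buckets=10, gb_per_bucket=210, replicas=1,
    data_nodes=6, primary_indexes=50, index_replicas=1,
)
print(footprint)
```

Under these assumptions each data node holds far more data than its 64GB of RAM, which is consistent with the choice of Full Eviction and suggests that delta recovery and rebalance of a KV node are likely to be lengthy operations on this cluster.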
Rebooted cb-example-0001 while upgrading
10:10:55 AM 17 May, 2024
Node 'ns_1@cb-example-0004.cb-example.default.svc' saw that node 'ns_1@cb-example-0001.cb-example.default.svc' went down. Details: [{nodedown_reason, connection_closed}]
ns_node_disco 005 ns_1@cb-example-0004.cb-example.default.svc
|
10:46:27 AM 17 May, 2024
Rebalance exited with reason {service_rebalance_failed,n1ql, {{badmatch, {error, {bad_nodes,n1ql,get_agent, [{'ns_1@cb-example-0009.cb-example.default.svc', {exit, {{nodedown, 'ns_1@cb-example-0009.cb-example.default.svc'}, {gen_server,call, [{'service_agent-n1ql', 'ns_1@cb-example-0009.cb-example.default.svc'}, get_agent,infinity]}}}}]}}}, [{service_manager,wait_for_agents,1, [{file,"src/service_manager.erl"}, {line,165}]}, {service_manager,run_op,1, ...
ns_orchestrator 000 ns_1@cb-example-0001.cb-example.default.svc
|
10:49:14 AM 17 May, 2024
Starting rebalance, KeepNodes = ['ns_1@cb-example-0000.cb-example.default.svc', 'ns_1@cb-example-0001.cb-example.default.svc', 'ns_1@cb-example-0002.cb-example.default.svc', 'ns_1@cb-example-0003.cb-example.default.svc', 'ns_1@cb-example-0004.cb-example.default.svc', 'ns_1@cb-example-0005.cb-example.default.svc', 'ns_1@cb-example-0006.cb-example.default.svc', 'ns_1@cb-example-0007.cb-example.default.svc', 'ns_1@cb-example-0008.cb-example.default.svc', 'ns_1@cb-example-0009.cb-example.default.svc'], EjectNodes = [], Failed over and being ejected nodes = []; **Delta recovery nodes = ['ns_1@cb-example-0000.cb-example.default.svc']**, Delta recovery buckets = all;; Operation Id = 8a8ec0d43d1a1ae6586f5a ...
ns_orchestrator 000 ns_1@cb-example-0001.cb-example.default.svc
|
11:03:05 AM 17 May, 2024
Rebalance interrupted due to auto-failover of nodes ['ns_1@cb-example-0008.cb-example.default.svc']. Rebalance Operation Id = 8b558d90ef57438882ec1fbe6c0db75f
ns_orchestrator 000 ns_1@cb-example-0001.cb-example.default.svc
|
- Why did the N1QL service rebalance fail?
- Why did the operator treat the failed rebalance as a completed rebalance and proceed to upgrade the next node? Shouldn't the cluster be balanced before the operator upgrades the next node?
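To make the second question concrete, here is a minimal sketch of the kind of gate one might expect before the next node is upgraded. It is not the operator's actual logic. It assumes the cluster state is read from the Couchbase Server REST endpoint `GET /pools/default` and that the response carries `balanced`, `rebalanceStatus`, and per-node `clusterMembership` and `status` fields; these field names should be verified against the server version in use. The function name `upgrade_blockers` and the sample payload are hypothetical.

```python
from typing import Any, Dict, List


def upgrade_blockers(pool: Dict[str, Any]) -> List[str]:
    """Return the reasons why the next node should not be upgraded yet.

    `pool` is the parsed JSON body of GET /pools/default (assumed shape).
    An empty list means no blocker was found.
    """
    blockers: List[str] = []
    if pool.get("rebalanceStatus", "none") != "none":
        blockers.append("rebalance still running")
    if not pool.get("balanced", False):
        blockers.append("cluster reports balanced=false")
    for node in pool.get("nodes", []):
        name = node.get("hostname", "<unknown>")
        if node.get("clusterMembership") != "active":
            # e.g. a failed-over node that has not been recovered yet
            blockers.append(f"{name}: membership {node.get('clusterMembership')}")
        elif node.get("status") != "healthy":
            blockers.append(f"{name}: status {node.get('status')}")
    return blockers


# Hypothetical payload resembling the state after the auto-failover of
# cb-example-0008; the values are illustrative, not taken from the logs.
example = {
    "rebalanceStatus": "none",
    "balanced": False,
    "nodes": [
        {"hostname": "cb-example-0000.cb-example.default.svc",
         "clusterMembership": "active", "status": "healthy"},
        {"hostname": "cb-example-0008.cb-example.default.svc",
         "clusterMembership": "inactiveFailed", "status": "unhealthy"},
    ],
}
for reason in upgrade_blockers(example):
    print(reason)
```

With a gate of this shape, a rebalance that exits with an error would leave the cluster unbalanced and the upgrade of the next node would be held back until a retry succeeds.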
CB logs - http://supportal.couchbase.com/snapshot/d98f94df31477f6c622956049790e725::1
s3://cb-customers-secure/k8s-3485_delta_ugrade_before_rebalance_complete/2024-05-17/collectinfo-2024-05-17t110024-ns_1@cb-example-0000.cb-example.default.svc-d2b7a81f0e30fc2c.zip
s3://cb-customers-secure/k8s-3485_delta_ugrade_before_rebalance_complete/2024-05-17/collectinfo-2024-05-17t110024-ns_1@cb-example-0001.cb-example.default.svc-c289c28a1bc81eb9.zip
s3://cb-customers-secure/k8s-3485_delta_ugrade_before_rebalance_complete/2024-05-17/collectinfo-2024-05-17t110024-ns_1@cb-example-0002.cb-example.default.svc-b59a45b634abd3a7.zip
s3://cb-customers-secure/k8s-3485_delta_ugrade_before_rebalance_complete/2024-05-17/collectinfo-2024-05-17t110024-ns_1@cb-example-0003.cb-example.default.svc-fd4119ace950d119.zip
s3://cb-customers-secure/k8s-3485_delta_ugrade_before_rebalance_complete/2024-05-17/collectinfo-2024-05-17t110024-ns_1@cb-example-0004.cb-example.default.svc-506c0c19b7108d29.zip
s3://cb-customers-secure/k8s-3485_delta_ugrade_before_rebalance_complete/2024-05-17/collectinfo-2024-05-17t110024-ns_1@cb-example-0005.cb-example.default.svc-60bf9c0169440d44.zip
s3://cb-customers-secure/k8s-3485_delta_ugrade_before_rebalance_complete/2024-05-17/collectinfo-2024-05-17t110024-ns_1@cb-example-0006.cb-example.default.svc-d1cc5490d0deb786.zip
s3://cb-customers-secure/k8s-3485_delta_ugrade_before_rebalance_complete/2024-05-17/collectinfo-2024-05-17t110024-ns_1@cb-example-0007.cb-example.default.svc-a22dcb24701a5cfc.zip
s3://cb-customers-secure/k8s-3485_delta_ugrade_before_rebalance_complete/2024-05-17/collectinfo-2024-05-17t110024-ns_1@cb-example-0009.cb-example.default.svc-381d02376b7ab129.zip
K8s Operator console logs during the upgrade: operator_logs.txt
Operator logs during the upgrade: cbopinfo-20240517T163222+0530.tar.gz