Details
Description
Find the details of the Delta Recovery Upgrade here (with complete logs):
https://issues.couchbase.com/browse/K8S-3548?focusedId=784164&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-784164
Bug / Issue
The delta recovery upgrade of the index-query node cb-example-0007 has started, but its rebalance has not completed.
2024-07-04T15:46:17.675Z, menelaus_web_sup:1:info:web start ok(ns_1@cb-example-0007.cb-example.default.svc) - Couchbase Server has started on web port 8091 on node 'ns_1@cb-example-0007.cb-example.default.svc'. Version: "7.6.1-3200-enterprise". |
2024-07-04T15:46:39.757Z, ns_node_disco:4:info:node up(ns_1@cb-example-0003.cb-example.default.svc) - Node 'ns_1@cb-example-0003.cb-example.default.svc' saw that node 'ns_1@cb-example-0007.cb-example.default.svc' came up. Tags: [] (repeated 1 times, last seen 22.155195 secs ago) |
2024-07-04T15:46:48.278Z, ns_orchestrator:0:info:message(ns_1@cb-example-0006.cb-example.default.svc) - Starting rebalance, KeepNodes = ['ns_1@cb-example-0000.cb-example.default.svc', |
'ns_1@cb-example-0001.cb-example.default.svc', |
'ns_1@cb-example-0003.cb-example.default.svc', |
'ns_1@cb-example-0004.cb-example.default.svc', |
'ns_1@cb-example-0005.cb-example.default.svc', |
'ns_1@cb-example-0006.cb-example.default.svc', |
'ns_1@cb-example-0007.cb-example.default.svc', |
'ns_1@cb-example-0008.cb-example.default.svc', |
'ns_1@cb-example-0009.cb-example.default.svc', |
'ns_1@cb-example-0010.cb-example.default.svc'], EjectNodes = [], Failed over and being ejected nodes = []; Delta recovery nodes = ['ns_1@cb-example-0007.cb-example.default.svc'], Delta recovery buckets = all;; Operation Id = af7d1bec05eac9184c59f0dbfb0f2140 |
The rebalance fails, yet cb-example-0008 is failed over.
2024-07-04T15:48:00.047Z, ns_orchestrator:0:critical:message(ns_1@cb-example-0006.cb-example.default.svc) - Rebalance exited with reason {service_rebalance_failed,index, |
{agent_died,<0.1788.0>, |
{linked_process_died,<0.29724.0>, |
{'ns_1@cb-example-0006.cb-example.default.svc', |
{timeout,
{gen_server,call,
[<0.2865.0>, |
{call,"ServiceAPI.StartTopologyChange", |
#Fun<json_rpc_connection.0.36915653>, |
#{timeout => 60000}}, |
60000]}}}}}}. |
Rebalance Operation Id = af7d1bec05eac9184c59f0dbfb0f2140
2024-07-04T15:48:04.390Z, failover:0:info:message(ns_1@cb-example-0006.cb-example.default.svc) - Starting failing over ['ns_1@cb-example-0008.cb-example.default.svc'] |
2024-07-04T15:48:04.390Z, ns_orchestrator:0:info:message(ns_1@cb-example-0006.cb-example.default.svc) - Starting failover of nodes ['ns_1@cb-example-0008.cb-example.default.svc'] AllowUnsafe = false Operation Id = 794728626a4e024557c3d166aef5a38b |
2024-07-04T15:48:06.506Z, auto_failover:0:info:message(ns_1@cb-example-0006.cb-example.default.svc) - Could not automatically fail over nodes (['ns_1@cb-example-0006.cb-example.default.svc']). Failover is running. |
2024-07-04T15:48:07.505Z, auto_failover:0:info:message(ns_1@cb-example-0006.cb-example.default.svc) - Could not automatically fail over node ('ns_1@cb-example-0006.cb-example.default.svc') due to operation being unsafe for service index. Failing over nodes cb-example-0006.cb-example.default.svc:9102(d2684d8551e1ab69a3d890890dda3252) would lose the following indexes/partitions: bucket4._default._default.primary_idx_bucket4_1 0 bucket4._default._default.primary_idx_bucket4_2 0 |
2024-07-04T15:48:11.507Z, auto_failover:0:info:message(ns_1@cb-example-0006.cb-example.default.svc) - Could not automatically fail over node ('ns_1@cb-example-0006.cb-example.default.svc') due to operation being unsafe for service index. Failing over nodes cb-example-0006.cb-example.default.svc:9102(d2684d8551e1ab69a3d890890dda3252) would lose the following indexes/partitions: bucket4._default._default.primary_idx_bucket4_2 0 bucket4._default._default.primary_idx_bucket4_1 0 |
2024-07-04T15:49:00.131Z, failover:0:critical:message(ns_1@cb-example-0006.cb-example.default.svc) - Failed over ['ns_1@cb-example-0008.cb-example.default.svc']. Failover couldn't complete on some nodes: |
['ns_1@cb-example-0008.cb-example.default.svc'] |
2024-07-04T15:49:00.158Z, failover:0:info:message(ns_1@cb-example-0006.cb-example.default.svc) - Deactivating failed over nodes ['ns_1@cb-example-0008.cb-example.default.svc'] |
2024-07-04T15:49:00.323Z, ns_orchestrator:0:info:message(ns_1@cb-example-0006.cb-example.default.svc) - Failover completed successfully. |
Rebalance Operation Id = 794728626a4e024557c3d166aef5a38b
After this there is a series of rebalance failures:
2024-07-04T15:50:00.055Z, ns_orchestrator:0:critical:message(ns_1@cb-example-0006.cb-example.default.svc) - Rebalance exited with reason {{badmatch,failed}, |
[{ns_rebalancer,rebalance_body,7, |
[{file,"src/ns_rebalancer.erl"}, |
{line,500}]}, |
{async,'-async_init/4-fun-1-',3, |
[{file,"src/async.erl"},{line,199}]}]}. |
Rebalance Operation Id = 223c530b7bc62d61ee78e12ea0a8a460
...
|
...
|
2024-07-04T15:52:00.066Z, ns_orchestrator:0:critical:message(ns_1@cb-example-0006.cb-example.default.svc) - Rebalance exited with reason {{badmatch,failed}, |
[{ns_rebalancer,rebalance_body,7, |
[{file,"src/ns_rebalancer.erl"}, |
{line,500}]}, |
{async,'-async_init/4-fun-1-',3, |
[{file,"src/async.erl"},{line,199}]}]}. |
Rebalance Operation Id = 4a2e70a85ea938068221399caffd4a9a
Eventually the rebalance succeeds.
cb-example-0008 is then failed over again, comes up on 7.6.1, and is successfully rebalanced.
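For triage, the sequence above can be condensed into a short timeline of rebalance and failover events. The following is a minimal sketch, assuming the event lines keep the "timestamp, module:code:level:text(node) - message" layout seen in the excerpts in this ticket. The function name summarize_events and the event labels are hypothetical and are not part of any Couchbase tooling; the marker phrases are copied from the excerpts.

```python
import re

# Start of an event line as it appears in the excerpts in this ticket:
# "<ISO timestamp>, <module>:<code>:<level>:<short text>(<node>) - <message>"
EVENT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z), "
    r"(?P<module>\w+):\d+:(?P<level>\w+):[^(]*\((?P<node>[^)]+)\) - (?P<msg>.*)$"
)

# Phrases copied from the log excerpts; the labels are illustrative only.
MARKERS = [
    ("Starting rebalance", "rebalance_start"),
    ("Rebalance exited with reason", "rebalance_failed"),
    ("Starting failover of nodes", "failover_start"),
    ("Failover completed successfully", "failover_done"),
]


def summarize_events(lines):
    """Return (timestamp, label) pairs for recognised event lines.

    Continuation lines (node lists, Erlang stack frames) do not begin
    with a timestamp, so they fail the regex and are skipped.
    """
    events = []
    for line in lines:
        m = EVENT_RE.match(line.strip())
        if not m:
            continue
        for phrase, label in MARKERS:
            if m.group("msg").startswith(phrase):
                events.append((m.group("ts"), label))
                break
    return events
```

Passing the lines of an exported event log to summarize_events gives one entry per rebalance start, rebalance failure, failover start, and failover completion, which makes the repeated failures between 15:48 and 15:52 easier to line up against the Operation Ids.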
Logs
2024-07-04A_PostDeltaNodeRestart_cbopinfo-20240705T020256+0530.tar.gz