Details
Bug
Resolution: Unresolved
Critical
7.6.0
Operating System : Debian GNU/Linux 11 (bullseye)
Initial Version : Couchbase Enterprise Edition 7.1.0-2556
Upgrade Version : Couchbase Enterprise Edition 7.6.0-2107
Untriaged
Linux x86_64
0
Unknown
Description
Steps to reproduce
- Created a 5-node cluster on Couchbase Enterprise Edition 7.1.0-2556 with the following services per node:
- 172.23.121.27 - cbas
- 172.23.121.208 - index, kv, n1ql
- 172.23.123.44 - index, kv, n1ql
- 172.23.107.26 - cbas
- 172.23.122.107 - cbas
- Couchstore bucket "bucket-5" was created with 10000 items
- Created a few dataverses, datasets, links and synonyms
- 172.23.107.26 was failed over
- Couchbase Enterprise Edition 7.6.0-2107 was installed on the node
- The node was added back and a rebalance was attempted
The rebalance fails:
2024-02-08T22:49:59.264-08:00, ns_orchestrator:0:critical:message(ns_1@172.23.107.26) - Rebalance exited with reason {{badmatch,failed}, [{ns_rebalancer,rebalance_body,7, [{file,"src/ns_rebalancer.erl"}, {line,500}]}, {async,'-async_init/4-fun-1-',3, [{file,"src/async.erl"},{line,199}]}]}. Rebalance Operation Id = f76804fa78a68eecc2693126805e3344
The ns_server debug logs show a bad_nodes error for the cbas service: the get_agent call to ns_1@172.23.121.27 timed out.
[ns_server:error,2024-02-08T22:49:59.253-08:00,ns_1@172.23.107.26:service_manager-cbas<0.1560.0>:service_agent:process_bad_results:990]Service call get_agent (service cbas) failed on some nodes:[{'ns_1@172.23.121.27',timeout}]
[error_logger:error,2024-02-08T22:49:59.261-08:00,ns_1@172.23.107.26:service_manager-cbas<0.1560.0>:ale_error_logger_handler:do_log:101]=========================CRASH REPORT========================= crasher: initial call: misc:'-spawn_monitor/1-fun-0-'/0 pid: <0.1560.0> registered_name: 'service_manager-cbas' exception error: no match of right hand side value {error, {bad_nodes,cbas,get_agent, [{'ns_1@172.23.121.27',timeout}]}} in function service_manager:wait_for_agents/1 (src/service_manager.erl, line 165) in call from service_manager:run_op/1 (src/service_manager.erl, line 140) ancestors: [<0.1559.0>] message_queue_len: 0 messages: [] links: [] dictionary: [] trap_exit: false status: running heap_size: 2586 stack_size: 28 reductions: 5554 neighbours:
[ns_server:debug,2024-02-08T22:49:59.261-08:00,ns_1@172.23.107.26:<0.1559.0>:service_janitor:maybe_complete_pending_failover_body:149]Failed to complete service cbas failover: {error, {failover_failed,cbas, {{badmatch, {error, {bad_nodes,cbas,get_agent, [{'ns_1@172.23.121.27', timeout}]}}}, [{service_manager, wait_for_agents,1, [{file, "src/service_manager.erl"}, {line,165}]}, {service_manager,run_op,1, [{file, "src/service_manager.erl"}, {line,140}]}, {proc_lib,init_p,3, [{file,"proc_lib.erl"}, {line,225}]}]}}}
[error_logger:error,2024-02-08T22:49:59.262-08:00,ns_1@172.23.107.26:logger_proxy<0.71.0>:ale_error_logger_handler:do_log:101]Error in process <0.1492.0> on node 'ns_1@172.23.107.26' with exit value:{{badmatch,failed}, [{ns_rebalancer,rebalance_body,7,[{file,"src/ns_rebalancer.erl"},{line,500}]}, {async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,199}]}]}
[ns_server:info,2024-02-08T22:49:59.262-08:00,ns_1@172.23.107.26:rebalance_agent<0.850.0>:rebalance_agent:handle_down:290]Rebalancer process <0.1492.0> died (reason {{badmatch,failed}, [{ns_rebalancer,rebalance_body,7, [{file,"src/ns_rebalancer.erl"}, {line,500}]}, {async,'-async_init/4-fun-1-',3, [{file,"src/async.erl"}, {line,199}]}]}).
[error_logger:error,2024-02-08T22:49:59.262-08:00,ns_1@172.23.107.26:logger_proxy<0.71.0>:ale_error_logger_handler:do_log:101]Error in process <0.1490.0> on node 'ns_1@172.23.107.26' with exit value:{{badmatch,failed}, [{ns_rebalancer,rebalance_body,7,[{file,"src/ns_rebalancer.erl"},{line,500}]}, {async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,199}]}]}
[ns_server:debug,2024-02-08T22:49:59.262-08:00,ns_1@172.23.107.26:leader_activities<0.791.0>:leader_activities:handle_activity_down:457]Activity terminated with reason {shutdown, {async_died, {raised, {error, {badmatch,failed}, [{ns_rebalancer,rebalance_body,7, [{file,"src/ns_rebalancer.erl"}, {line,500}]}, {async,'-async_init/4-fun-1-',3, [{file,"src/async.erl"}, {line,199}]}]}}}}. Activity:{activity,<0.1491.0>,#Ref<0.4221202476.1589641217.115197>,default, <<"a449faee6282c2daf4fc1cb52bbcdf98">>, [rebalance], majority,[]}
[error_logger:error,2024-02-08T22:49:59.263-08:00,ns_1@172.23.107.26:<0.1488.0>:ale_error_logger_handler:do_log:101]=========================CRASH REPORT========================= crasher: initial call: erlang:apply/2 pid: <0.1488.0> registered_name: [] exception error: no match of right hand side value failed in function ns_rebalancer:rebalance_body/7 (src/ns_rebalancer.erl, line 500) in call from async:'-async_init/4-fun-1-'/3 (src/async.erl, line 199) ancestors: [<0.1400.0>,ns_orchestrator_child_sup,ns_orchestrator_sup, mb_master_sup,mb_master,leader_registry_sup, leader_services_sup,<0.788.0>,ns_server_sup, ns_server_nodes_sup,<0.301.0>,ns_server_cluster_sup, root_sup,<0.155.0>] message_queue_len: 0 messages: [] links: [<0.1400.0>] dictionary: [] trap_exit: false status: running heap_size: 17731 stack_size: 28 reductions: 3358 neighbours:
[user:error,2024-02-08T22:49:59.264-08:00,ns_1@172.23.107.26:<0.1400.0>:ns_orchestrator:log_rebalance_completion:1661]Rebalance exited with reason {{badmatch,failed}, [{ns_rebalancer,rebalance_body,7, [{file,"src/ns_rebalancer.erl"}, {line,500}]}, {async,'-async_init/4-fun-1-',3, [{file,"src/async.erl"},{line,199}]}]}. Rebalance Operation Id = f76804fa78a68eecc2693126805e3344
Retries of the same rebalance also fail.
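When triaging a collected log bundle, it can help to pull out which node and service call timed out rather than reading the crash reports by eye. The following is a minimal sketch, not part of ns_server or TAF: the helper name `find_bad_nodes` and both regular expressions are hypothetical and were written only against the "Service call ... failed on some nodes" line quoted above. It assumes each failure reason is a simple atom such as `timeout`; a reason that is itself a nested term would not be parsed.

```python
import re

# Hypothetical patterns, derived only from the log line quoted in this ticket:
#   Service call get_agent (service cbas) failed on some nodes:[{'ns_1@...',timeout}]
SERVICE_CALL_RE = re.compile(
    r"Service call (?P<call>\w+) \(service (?P<service>\w+)\) "
    r"failed on some nodes:\s*\[(?P<nodes>.*?)\]"
)
# One {'node',reason} pair inside the node list; reason is assumed to be a bare atom.
NODE_RE = re.compile(r"\{'(?P<node>[^']+)',\s*(?P<reason>\w+)\}")


def find_bad_nodes(log_text):
    """Return (service, call, node, reason) tuples for every failed service call."""
    results = []
    for match in SERVICE_CALL_RE.finditer(log_text):
        for node in NODE_RE.finditer(match.group("nodes")):
            results.append(
                (
                    match.group("service"),
                    match.group("call"),
                    node.group("node"),
                    node.group("reason"),
                )
            )
    return results


if __name__ == "__main__":
    # Sample taken from the ns_server debug log excerpt in this ticket.
    sample = (
        "[ns_server:error,2024-02-08T22:49:59.253-08:00,"
        "ns_1@172.23.107.26:service_manager-cbas<0.1560.0>:"
        "service_agent:process_bad_results:990]"
        "Service call get_agent (service cbas) failed on some nodes:"
        "[{'ns_1@172.23.121.27',timeout}]"
    )
    for service, call, node, reason in find_bad_nodes(sample):
        print(service, call, node, reason)
```

Run against the full debug log, this would list every node whose service agent did not answer, which makes it easier to confirm whether the timeout is always on the same cbas node across retries.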
TAF Script to reproduce
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /data/workspace/debian-p0-analytics-vset00-00-analytics_upgrade_from_7.1.0_with_collections/testexec.20372.ini -p GROUP=7_1_0,kv_quota_percent=70,bucket_storage=couchstore,key=test_collections,get-cbcollect-info=True,upgrade_version=7.6.0-2107,aws_access_key=<redacted>,aws_secret_key=<redacted>,sirius_url=http://172.23.120.103:4000 -t upgrade.cbas_upgrade.UpgradeTests.test_upgrade_with_failover,upgrade_chain=7.1.0,upgrade_type=failover_delta_recovery,update_nodes=kv;cbas,nodes_init=5,services_init=kv:index:n1ql-kv:index:n1ql-cbas-cbas-cbas,pre_update_no_of_dv=2,pre_update_ds_per_dv=4,pre_update_no_of_synonym=5,pre_update_no_of_index=3,replica_num=3,override_spec_params=num_buckets;num_scopes;num_collections;replicas;num_items,num_items=10000,num_buckets=3,num_scopes=5,num_collections=5,no_of_dv=10,ds_per_dv=3,no_of_synonym=10,no_of_index=5,GROUP=7_1_0'
Job name : debian-analytics_upgrade_from_7.1.0_with_collections
Job ref : http://qa.sc.couchbase.com/job/test_suite_executor-TAF/309910/console
Attachments
Issue Links
- is duplicated by MB-60860 [Upgrade] : Rebalance exited with reason {{badmatch,failed},[{ns_rebalancer,rebalance_body,7,[{file,"src/ns_rebalancer.erl"},{line,500}]},{async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,199}]}]}. (Closed)