Details
- Type: Bug
- Resolution: Duplicate
- Priority: Critical
- Version: 7.6.0
- Operating System: Debian GNU/Linux 11 (bullseye)
- Initial Version: Couchbase Enterprise Edition 7.1.0-2556
- Upgrade Version: Couchbase Enterprise Edition 7.6.0-2153
- Triage: Untriaged
- Platform: Linux x86_64
- 0
- Unknown
Description
Steps to reproduce
- Created a 5-node cluster on Couchbase Enterprise Edition 7.1.0-2556 with the following services:
  - 172.23.106.52 - cbas
  - 172.23.106.53 - index, kv, n1ql
  - 172.23.106.28 - index, kv, n1ql
  - 172.23.106.38 - cbas
  - 172.23.104.221 - cbas
- Created a Couchstore bucket "bucket-2" with 10000 items.
- Created a few dataverses, datasets, links, and synonyms.
- Failed over 172.23.104.221.
- Installed Couchbase Enterprise Edition 7.6.0-2153 on that node.
- Added the node back and started a rebalance. The rebalance succeeded.
- Gracefully failed over 172.23.106.28.
- Installed Couchbase Enterprise Edition 7.6.0-2153 on that node.
- Added the node back using delta recovery and started a rebalance. The rebalance failed:
2024-02-20T02:47:25.553-08:00, ns_orchestrator:0:critical:message(ns_1@172.23.104.221) - Rebalance exited with reason {{badmatch,failed}, [{ns_rebalancer,rebalance_body,7, [{file,"src/ns_rebalancer.erl"}, {line,500}]}, {async,'-async_init/4-fun-1-',3, [{file,"src/async.erl"},{line,199}]}]}.Rebalance Operation Id = d8c6f1989c995c7721ab2de31a429a3a
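For reference, the graceful-failover and delta-recovery part of the steps above can also be driven through the Couchbase Server REST API. The Python sketch below is a dry run that only builds the requests. It assumes the standard `/controller/startGracefulFailover`, `/controller/setRecoveryType`, and `/controller/rebalance` endpoints on port 8091. The helper names `delta_recovery_plan` and `send` are hypothetical. The failing run itself used testrunner, not this script. The binary upgrade and waiting for the failover to finish (for example by polling `/pools/default/tasks`) are left out.

```python
import base64
import urllib.request
from urllib.parse import urlencode


def delta_recovery_plan(host, otp_node, known_nodes):
    """Build (method, url, form_body) tuples for the delta-recovery sequence.

    Hypothetical helper: it only describes the requests; nothing is sent.
    """
    base = f"http://{host}:8091"
    return [
        # Step 1: graceful failover of the kv/index/n1ql node (runs asynchronously on the server).
        ("POST", f"{base}/controller/startGracefulFailover",
         urlencode({"otpNode": otp_node})),
        # Step 2 (out of band, not modelled): wait for the failover, then upgrade the node to 7.6.0.
        # Step 3: add the node back by setting its recovery type to delta (assumed equivalent to "add back").
        ("POST", f"{base}/controller/setRecoveryType",
         urlencode({"otpNode": otp_node, "recoveryType": "delta"})),
        # Step 4: rebalance, keeping every node and ejecting none.
        ("POST", f"{base}/controller/rebalance",
         urlencode({"knownNodes": ",".join(known_nodes), "ejectedNodes": ""})),
    ]


def send(request, user, password):
    """Send one planned request with HTTP basic auth (not called in the dry run)."""
    method, url, body = request
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        url,
        data=body.encode(),
        method=method,
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status, resp.read().decode()


if __name__ == "__main__":
    nodes = [f"ns_1@{ip}" for ip in (
        "172.23.106.52", "172.23.106.53", "172.23.106.28",
        "172.23.106.38", "172.23.104.221")]
    # Print the planned requests for the node that failed to rebalance in this report.
    for method, url, body in delta_recovery_plan(
            "172.23.104.221", "ns_1@172.23.106.28", nodes):
        print(method, url, body)
```

Running it as a script only prints the three requests. Each one would be passed to `send` after the previous step has completed, since graceful failover and rebalance are both asynchronous on the server.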
A few CRASH reports were observed in ns_server.debug.log:
[error_logger:error,2024-02-20T02:47:25.551-08:00,ns_1@172.23.104.221:service_manager-cbas<0.32055.0>:ale_error_logger_handler:do_log:101]=========================CRASH REPORT========================= crasher: initial call: misc:'-spawn_monitor/1-fun-0-'/0 pid: <0.32055.0> registered_name: 'service_manager-cbas' exception error: no match of right hand side value {error, {bad_nodes,cbas,get_agent, [{'ns_1@172.23.106.52',timeout}]}} in function service_manager:wait_for_agents/1 (src/service_manager.erl, line 165) in call from service_manager:run_op/1 (src/service_manager.erl, line 140) ancestors: [<0.32054.0>] message_queue_len: 0 messages: [] links: [] dictionary: [] trap_exit: false status: running heap_size: 2586 stack_size: 28 reductions: 5527 neighbours:
[ns_server:debug,2024-02-20T02:47:25.552-08:00,ns_1@172.23.104.221:<0.32054.0>:service_janitor:maybe_complete_pending_failover_body:149]Failed to complete service cbas failover: {error, {failover_failed,cbas, {{badmatch, {error, {bad_nodes,cbas,get_agent, [{'ns_1@172.23.106.52', timeout}]}}}, [{service_manager, wait_for_agents,1, [{file, "src/service_manager.erl"}, {line,165}]}, {service_manager,run_op,1, [{file, "src/service_manager.erl"}, {line,140}]}, {proc_lib,init_p,3, [{file,"proc_lib.erl"}, {line,225}]}]}}}
[ns_server:info,2024-02-20T02:47:25.552-08:00,ns_1@172.23.104.221:rebalance_agent<0.861.0>:rebalance_agent:handle_down:290]Rebalancer process <0.31611.0> died (reason {{badmatch,failed}, [{ns_rebalancer,rebalance_body, 7, [{file,"src/ns_rebalancer.erl"}, {line,500}]}, {async,'-async_init/4-fun-1-', 3, [{file,"src/async.erl"}, {line,199}]}]}).
[ns_server:debug,2024-02-20T02:47:25.552-08:00,ns_1@172.23.104.221:leader_activities<0.805.0>:leader_activities:handle_activity_down:457]Activity terminated with reason {shutdown, {async_died, {raised, {error, {badmatch,failed}, [{ns_rebalancer,rebalance_body,7, [{file,"src/ns_rebalancer.erl"}, {line,500}]}, {async,'-async_init/4-fun-1-',3, [{file,"src/async.erl"}, {line,199}]}]}}}}. Activity:{activity,<0.31610.0>,#Ref<0.667242623.859308034.51879>,default, <<"fa12363556367d7bac493127ed7814a0">>, [rebalance], majority,[]}
[error_logger:error,2024-02-20T02:47:25.552-08:00,ns_1@172.23.104.221:logger_proxy<0.71.0>:ale_error_logger_handler:do_log:101]Error in process <0.31611.0> on node 'ns_1@172.23.104.221' with exit value:{{badmatch,failed}, [{ns_rebalancer,rebalance_body,7,[{file,"src/ns_rebalancer.erl"},{line,500}]}, {async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,199}]}]}
[error_logger:error,2024-02-20T02:47:25.553-08:00,ns_1@172.23.104.221:logger_proxy<0.71.0>:ale_error_logger_handler:do_log:101]Error in process <0.31609.0> on node 'ns_1@172.23.104.221' with exit value:{{badmatch,failed}, [{ns_rebalancer,rebalance_body,7,[{file,"src/ns_rebalancer.erl"},{line,500}]}, {async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,199}]}]}
[error_logger:error,2024-02-20T02:47:25.553-08:00,ns_1@172.23.104.221:<0.31607.0>:ale_error_logger_handler:do_log:101]=========================CRASH REPORT========================= crasher: initial call: erlang:apply/2 pid: <0.31607.0> registered_name: [] exception error: no match of right hand side value failed in function ns_rebalancer:rebalance_body/7 (src/ns_rebalancer.erl, line 500) in call from async:'-async_init/4-fun-1-'/3 (src/async.erl, line 199) ancestors: [<0.1414.0>,ns_orchestrator_child_sup,ns_orchestrator_sup, mb_master_sup,mb_master,leader_registry_sup, leader_services_sup,<0.784.0>,ns_server_sup, ns_server_nodes_sup,<0.307.0>,ns_server_cluster_sup, root_sup,<0.155.0>] message_queue_len: 0 messages: [] links: [<0.1414.0>] dictionary: [] trap_exit: false status: running heap_size: 318187 stack_size: 28 reductions: 1005968 neighbours:
[user:error,2024-02-20T02:47:25.553-08:00,ns_1@172.23.104.221:<0.1414.0>:ns_orchestrator:log_rebalance_completion:1661]Rebalance exited with reason {{badmatch,failed}, [{ns_rebalancer,rebalance_body,7, [{file,"src/ns_rebalancer.erl"}, {line,500}]}, {async,'-async_init/4-fun-1-',3, [{file,"src/async.erl"},{line,199}]}]}.Rebalance Operation Id = d8c6f1989c995c7721ab2de31a429a3a
[ns_server:debug,2024-02-20T02:47:25.554-08:00,ns_1@172.23.104.221:<0.1414.0>:auto_rebalance:retry_rebalance:58]Retry rebalance is not enabled. Failed Rebalance with Id d8c6f1989c995c7721ab2de31a429a3a will not be retried.
[ns_server:debug,2024-02-20T02:47:25.581-08:00,ns_1@172.23.104.221:chronicle_kv_log<0.486.0>:chronicle_kv_log:log:59]update (key: counters, rev: {<<"8de2e575e6a7c8485bc574fa34295ecb">>,330})[{rebalance_fail,{1708426045,1}}, {rebalance_start,{1708426029,4}}, {graceful_failover_success,{1708425937,1}}, {failover,{1708425937,2}}, {failover_complete,{1708425936,1}}, {graceful_failover_start,{1708425916,1}}, {failover_success,{1708425794,1}}, {failover_incomplete,{1708425794,1}}, {failover_start,{1708425776,1}}, {rebalance_success,{1708425756,2}}]
[ns_server:debug,2024-02-20T02:47:25.585-08:00,ns_1@172.23.104.221:ns_config_rep<0.557.0>:ns_config_rep:do_push_keys:385]Replicating some config keys ([rebalance_reports]..)
[ns_server:debug,2024-02-20T02:47:25.585-08:00,ns_1@172.23.104.221:ns_config_log<0.301.0>:ns_config_log:log_common:290]config change:rebalance_reports ->[{'_vclock',[{<<"3367ca413abb7153adb2fbed8b8d981e">>,{3,63875644994}}, {<<"8a2c57ed10d26f8a64efa12f893a03b8">>,{3,63875645245}}]}, {<<"4262c16563bc4faa44d119bae4d6d8dd">>, [{node,'ns_1@172.23.104.221'}, {filename,"rebalance_report_20240220T104725.json"}]}, {<<"7666af4423b0621e64ce91bec4975d99">>, [{node,'ns_1@172.23.104.221'}, {filename,"rebalance_report_20240220T104537.json"}]}, {<<"7ac7a99bc5f4497bec59df51b43e35aa">>, [{node,'ns_1@172.23.104.221'}, {filename,"rebalance_report_20240220T104515.json"}]}, {<<"6803f221d6ffff19a031c5791fdeb13e">>, [{node,'ns_1@172.23.106.28'}, {filename,"rebalance_report_20240220T104314.json"}]}, {<<"556fc9d6874c48c864dca883f8ee0d8e">>, [{node,'ns_1@172.23.106.28'}, {filename,"rebalance_report_20240220T104236.json"}]}]
[ns_server:debug,2024-02-20T02:47:25.589-08:00,ns_1@172.23.104.221:<0.724.0>:terse_cluster_info_uploader:handle_info:53]Refreshing terse cluster info with <<"{\"rev\":462,\"nodesExt\":[{\"services\":{\"mgmt\":8091,\"mgmtSSL\":18091},\"thisNode\":true,\"hostname\":\"172.23.104.221\"},{\"services\":{\"capi\":8092,\"capiSSL\":18092,\"kv\":11210,\"kvSSL\":11207,\"mgmt\":8091,\"mgmtSSL\":18091,\"projector\":9999},\"hostname\":\"172.23.106.28\"},{\"services\":{\"cbas\":8095,\"cbasSSL\":18095,\"mgmt\":8091,\"mgmtSSL\":18091},\"hostname\":\"172.23.106.38\"},{\"services\":{\"cbas\":8095,\"cbasSSL\":18095,\"mgmt\":8091,\"mgmtSSL\":18091},\"hostname\":\"172.23.106.52\"},{\"services\":{\"capi\":8092,\"capiSSL\":18092,\"indexAdmin\":9100,\"indexHttp\":9102,\"indexHttps\":19102,\"indexScan\":9101,\"indexStreamCatchup\":9104,\"indexStreamInit\":9103,\"indexStreamMaint\":9105,\"kv\":11210,\"kvSSL\":11207,\"mgmt\":8091,\"mgmtSSL\":18091,\"n1ql\":8093,\"n1qlSSL\":18093,\"projector\":9999},\"hostname\":\"172.23.106.53\"}],\"revEpoch\":1,\"clusterCapabilitiesVer\":[1,0],\"clusterCapabilities\":{\"n1ql\":[\"costBasedOptimizer\",\"indexAdvisor\",\"javaScriptFunctions\",\"inlineFunctions\",\"enhancedPreparedStatements\"]}}">>
[ns_server:debug,2024-02-20T02:47:25.610-08:00,ns_1@172.23.104.221:chronicle_kv_log<0.486.0>:chronicle_kv_log:log:59]update (key: rebalance_status, rev: {<<"8de2e575e6a7c8485bc574fa34295ecb">>, 331}){none,<<"Rebalance failed. See logs for detailed reason. You can try again.">>}
[ns_server:debug,2024-02-20T02:47:25.610-08:00,ns_1@172.23.104.221:chronicle_kv_log<0.486.0>:chronicle_kv_log:log:59]update (key: rebalance_status_uuid, rev: {<<"8de2e575e6a7c8485bc574fa34295ecb">>, 331})<<"edc37f1928897df9a2503d21c3f1b6bb">>
[ns_server:info,2024-02-20T02:47:25.610-08:00,ns_1@172.23.104.221:leader_registry<0.815.0>:leader_registry:handle_down:286]Process <0.31605.0> registered as 'ns_rebalance_observer' terminated.
[ns_server:debug,2024-02-20T02:47:25.610-08:00,ns_1@172.23.104.221:chronicle_kv_log<0.486.0>:chronicle_kv_log:log:59]update (key: rebalancer_pid, rev: {<<"8de2e575e6a7c8485bc574fa34295ecb">>,331})undefined
[ns_server:debug,2024-02-20T02:47:25.610-08:00,ns_1@172.23.104.221:chronicle_kv_log<0.486.0>:chronicle_kv_log:log:59]update (key: rebalance_type, rev: {<<"8de2e575e6a7c8485bc574fa34295ecb">>,331})rebalance
[ns_server:debug,2024-02-20T02:47:25.621-08:00,ns_1@172.23.104.221:<0.32641.0>:service_janitor:maybe_complete_pending_failover_body:142]Found unfinished failover for service cbas
[ns_server:debug,2024-02-20T02:47:25.621-08:00,ns_1@172.23.104.221:service_manager-cbas<0.32642.0>:service_agent:wait_for_agents:74]Waiting for the service agents for service cbas to come up on nodes:['ns_1@172.23.106.38','ns_1@172.23.106.52']
[ns_server:error,2024-02-20T02:47:28.095-08:00,ns_1@172.23.104.221:service_manager-cbas<0.32642.0>:service_agent:process_bad_results:990]Service call get_agent (service cbas) failed on some nodes:[{'ns_1@172.23.106.52', {exit, {{{case_clause,{error,{unknown_error,<<"failed_to_cancel">>}}}, [{service_agent,cancel_task,2, [{file,"src/service_agent.erl"},{line,469}]}, {lists,foreach,2,[{file,"lists.erl"},{line,1342}]}, {service_agent,cleanup_service,1, [{file,"src/service_agent.erl"},{line,497}]}, {service_agent,do_handle_connection,2, [{file,"src/service_agent.erl"},{line,326}]}, {service_agent,handle_connection,2, [{file,"src/service_agent.erl"},{line,305}]}, {service_agent,handle_cast,2, [{file,"src/service_agent.erl"},{line,191}]}, {gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,695}]}, {gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,771}]}]}, {gen_server,call, [{'service_agent-cbas','ns_1@172.23.106.52'}, get_agent,infinity]}}}}]
[error_logger:error,2024-02-20T02:47:28.096-08:00,ns_1@172.23.104.221:service_manager-cbas<0.32642.0>:ale_error_logger_handler:do_log:101]=========================CRASH REPORT========================= crasher: initial call: misc:'-spawn_monitor/1-fun-0-'/0 pid: <0.32642.0> registered_name: 'service_manager-cbas' exception error: no match of right hand side value {error, {bad_nodes,cbas,get_agent, [{'ns_1@172.23.106.52', {exit, {{{case_clause, {error,{unknown_error,<<"failed_to_cancel">>}}}, [{service_agent,cancel_task,2, [{file,"src/service_agent.erl"},{line,469}]}, {lists,foreach,2, [{file,"lists.erl"},{line,1342}]}, {service_agent,cleanup_service,1, [{file,"src/service_agent.erl"},{line,497}]}, {service_agent,do_handle_connection,2, [{file,"src/service_agent.erl"},{line,326}]}, {service_agent,handle_connection,2, [{file,"src/service_agent.erl"},{line,305}]}, {service_agent,handle_cast,2, [{file,"src/service_agent.erl"},{line,191}]}, {gen_server,try_dispatch,4, [{file,"gen_server.erl"},{line,695}]}, {gen_server,handle_msg,6, [{file,"gen_server.erl"},{line,771}]}]}, {gen_server,call, [{'service_agent-cbas','ns_1@172.23.106.52'}, get_agent,infinity]}}}}]}} in function service_manager:wait_for_agents/1 (src/service_manager.erl, line 165) in call from service_manager:run_op/1 (src/service_manager.erl, line 140) ancestors: [<0.32641.0>] message_queue_len: 0 messages: [] links: [] dictionary: [] trap_exit: false status: running heap_size: 6772 stack_size: 28 reductions: 12597 neighbours:
[ns_server:debug,2024-02-20T02:47:28.096-08:00,ns_1@172.23.104.221:<0.32641.0>:service_janitor:maybe_complete_pending_failover_body:149]Failed to complete service cbas failover: {error, {failover_failed,cbas, {{badmatch, {error, {bad_nodes,cbas,get_agent, [{'ns_1@172.23.106.52', {exit, {{{case_clause, {error, {unknown_error, <<"failed_to_cancel">>}}}, [{service_agent, cancel_task,2, [{file, "src/service_agent.erl"}, {line,469}]}, {lists,foreach,2, [{file,"lists.erl"}, {line,1342}]}, {service_agent, cleanup_service,1, [{file, "src/service_agent.erl"}, {line,497}]}, {service_agent, do_handle_connection,2, [{file, "src/service_agent.erl"}, {line,326}]}, {service_agent, handle_connection,2, [{file, "src/service_agent.erl"}, {line,305}]}, {service_agent, handle_cast,2, [{file, "src/service_agent.erl"}, {line,191}]}, {gen_server, try_dispatch,4, [{file, "gen_server.erl"}, {line,695}]}, {gen_server,handle_msg, 6, [{file, "gen_server.erl"}, {line,771}]}]}, {gen_server,call, [{'service_agent-cbas', 'ns_1@172.23.106.52'}, get_agent, infinity]}}}}]}}}, [{service_manager, wait_for_agents,1, [{file, "src/service_manager.erl"}, {line,165}]}, {service_manager,run_op,1, [{file, "src/service_manager.erl"}, {line,140}]}, {proc_lib,init_p,3, [{file,"proc_lib.erl"}, {line,225}]}]}}}
The stack trace exactly matches the one in MB-60743. There, the rebalance failed for a cbas node; here, it fails for a kv, index, n1ql node. The two issues could be related.
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /data/workspace/debian-p0-analytics-vset00-00-analytics_upgrade_with_failover_from_7.1.0_with_collections/testexec.7143.ini -p GROUP=7_1_0;failover_upgrade,kv_quota_percent=70,bucket_storage=couchstore,key=test_collections,get-cbcollect-info=True,upgrade_version=7.6.0-2153,aws_access_key=<redacted>,aws_secret_key=<redacted>,sirius_url=http://172.23.120.103:4000 -t upgrade.cbas_upgrade.UpgradeTests.test_upgrade_with_failover,upgrade_chain=7.1.0,upgrade_type=failover_delta_recovery,update_nodes=kv;cbas,nodes_init=5,services_init=kv:index:n1ql-kv:index:n1ql-cbas-cbas-cbas,pre_update_no_of_dv=2,pre_update_ds_per_dv=4,pre_update_no_of_synonym=5,pre_update_no_of_index=3,replica_num=3,override_spec_params=num_buckets;num_scopes;num_collections;replicas;num_items,num_items=10000,num_buckets=3,num_scopes=5,num_collections=5,no_of_dv=10,ds_per_dv=3,no_of_synonym=10,no_of_index=5,GROUP=7_1_0;failover_upgrade,cbas_cc_node_upgrade_sequence=first'
Job name: debian-analytics-analytics_upgrade_with_failover_from_7.1.0_with_collections
Issue Links
- duplicates: MB-60743 [Upgrade] : Rebalance exited with reason {{badmatch,failed},[{ns_rebalancer,rebalance_body,7,[{file,"src/ns_rebalancer.erl"},{line,500}]},{async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,199}]}]}. (Status: Open)