Couchbase Server / MB-60860

[Upgrade] : Rebalance exited with reason {{badmatch,failed},[{ns_rebalancer,rebalance_body,7,[{file,"src/ns_rebalancer.erl"},{line,500}]},{async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,199}]}]}.


Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • 7.6.0
    • 7.6.0
    • analytics
    • Operating System : Debian GNU/Linux 11 (bullseye)
      Initial Version : Couchbase Enterprise Edition 7.1.0-2556
      Upgrade Version : Couchbase Enterprise Edition 7.6.0-2153

    Description

      Steps to reproduce

      1. Created a 5-node cluster on Couchbase Enterprise Edition 7.1.0-2556 with the following services:
        1. 172.23.106.52 - cbas
        2. 172.23.106.53 - index, kv, n1ql
        3. 172.23.106.28 - index, kv, n1ql
        4. 172.23.106.38 - cbas
        5. 172.23.104.221 - cbas
      2. Couchstore bucket "bucket-2" was created with 10000 items
      3. Created a few dataverses, datasets, links and synonyms
      4. 172.23.104.221 was failed over
      5. Couchbase Enterprise Edition 7.6.0-2153 was installed on the node
      6. The node was added back and a rebalance was attempted. The rebalance succeeded.
      7. 172.23.106.28 was gracefully failed over
      8. Couchbase Enterprise Edition 7.6.0-2153 was installed on the node
      9. The node was added back using delta recovery and a rebalance was attempted. The rebalance failed.
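
      For anyone reproducing steps 7 to 9 by hand rather than through TAF, the sequence can be approximated with couchbase-cli. This is only a sketch: the function name, the choice of 172.23.106.53:8091 as the cluster entry point, and the Administrator/password credentials are placeholders and are not taken from the test run. The package install in step 8 happens outside couchbase-cli and is not shown.

```python
import shlex
from typing import List


def delta_recovery_upgrade_commands(
    cluster: str,
    node: str,
    user: str = "Administrator",   # placeholder credentials, not from the ticket
    password: str = "password",
) -> List[List[str]]:
    """Build the couchbase-cli calls for: graceful failover, delta recovery, rebalance."""
    auth = ["-c", cluster, "-u", user, "-p", password]
    return [
        # Step 7: failover without --hard is a graceful failover.
        ["couchbase-cli", "failover", *auth, "--server-failover", node],
        # Step 8 (installing 7.6.0-2153 on the node) is done here, outside couchbase-cli.
        # Step 9a: mark the node for delta recovery.
        ["couchbase-cli", "recovery", *auth,
         "--server-recovery", node, "--recovery-type", "delta"],
        # Step 9b: the rebalance that failed in this ticket.
        ["couchbase-cli", "rebalance", *auth],
    ]


if __name__ == "__main__":
    for cmd in delta_recovery_upgrade_commands("172.23.106.53:8091", "172.23.106.28:8091"):
        print(shlex.join(cmd))
```

      Running the module only prints the commands; nothing is executed against a cluster.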

      2024-02-20T02:47:25.553-08:00, ns_orchestrator:0:critical:message(ns_1@172.23.104.221) - Rebalance exited with reason {{badmatch,failed},[{ns_rebalancer,rebalance_body,7,[{file,"src/ns_rebalancer.erl"},{line,500}]},{async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,199}]}]}.
      Rebalance Operation Id = d8c6f1989c995c7721ab2de31a429a3a

      A few crash reports appear in ns_server.debug.log:

      [error_logger:error,2024-02-20T02:47:25.551-08:00,ns_1@172.23.104.221:service_manager-cbas<0.32055.0>:ale_error_logger_handler:do_log:101]
      =========================CRASH REPORT=========================
        crasher:
          initial call: misc:'-spawn_monitor/1-fun-0-'/0
          pid: <0.32055.0>
          registered_name: 'service_manager-cbas'
          exception error: no match of right hand side value
                           {error,{bad_nodes,cbas,get_agent,[{'ns_1@172.23.106.52',timeout}]}}
            in function  service_manager:wait_for_agents/1 (src/service_manager.erl, line 165)
            in call from service_manager:run_op/1 (src/service_manager.erl, line 140)
          ancestors: [<0.32054.0>]
          message_queue_len: 0
          messages: []
          links: []
          dictionary: []
          trap_exit: false
          status: running
          heap_size: 2586
          stack_size: 28
          reductions: 5527
        neighbours:
      [ns_server:debug,2024-02-20T02:47:25.552-08:00,ns_1@172.23.104.221:<0.32054.0>:service_janitor:maybe_complete_pending_failover_body:149]Failed to complete service cbas failover: {error,{failover_failed,cbas,{{badmatch,{error,{bad_nodes,cbas,get_agent,[{'ns_1@172.23.106.52',timeout}]}}},[{service_manager,wait_for_agents,1,[{file,"src/service_manager.erl"},{line,165}]},{service_manager,run_op,1,[{file,"src/service_manager.erl"},{line,140}]},{proc_lib,init_p,3,[{file,"proc_lib.erl"},{line,225}]}]}}}
      [ns_server:info,2024-02-20T02:47:25.552-08:00,ns_1@172.23.104.221:rebalance_agent<0.861.0>:rebalance_agent:handle_down:290]Rebalancer process <0.31611.0> died (reason {{badmatch,failed},[{ns_rebalancer,rebalance_body,7,[{file,"src/ns_rebalancer.erl"},{line,500}]},{async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,199}]}]}).
      [ns_server:debug,2024-02-20T02:47:25.552-08:00,ns_1@172.23.104.221:leader_activities<0.805.0>:leader_activities:handle_activity_down:457]Activity terminated with reason {shutdown,{async_died,{raised,{error,{badmatch,failed},[{ns_rebalancer,rebalance_body,7,[{file,"src/ns_rebalancer.erl"},{line,500}]},{async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,199}]}]}}}}. Activity:{activity,<0.31610.0>,#Ref<0.667242623.859308034.51879>,default,<<"fa12363556367d7bac493127ed7814a0">>,[rebalance],majority,[]}
      [error_logger:error,2024-02-20T02:47:25.552-08:00,ns_1@172.23.104.221:logger_proxy<0.71.0>:ale_error_logger_handler:do_log:101]Error in process <0.31611.0> on node 'ns_1@172.23.104.221' with exit value:{{badmatch,failed},[{ns_rebalancer,rebalance_body,7,[{file,"src/ns_rebalancer.erl"},{line,500}]},{async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,199}]}]}
      [error_logger:error,2024-02-20T02:47:25.553-08:00,ns_1@172.23.104.221:logger_proxy<0.71.0>:ale_error_logger_handler:do_log:101]Error in process <0.31609.0> on node 'ns_1@172.23.104.221' with exit value:{{badmatch,failed}, [{ns_rebalancer,rebalance_body,7,[{file,"src/ns_rebalancer.erl"},{line,500}]},  {async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,199}]}]}
      [error_logger:error,2024-02-20T02:47:25.553-08:00,ns_1@172.23.104.221:<0.31607.0>:ale_error_logger_handler:do_log:101]
      =========================CRASH REPORT=========================
        crasher:
          initial call: erlang:apply/2
          pid: <0.31607.0>
          registered_name: []
          exception error: no match of right hand side value failed
            in function  ns_rebalancer:rebalance_body/7 (src/ns_rebalancer.erl, line 500)
            in call from async:'-async_init/4-fun-1-'/3 (src/async.erl, line 199)
          ancestors: [<0.1414.0>,ns_orchestrator_child_sup,ns_orchestrator_sup,
                        mb_master_sup,mb_master,leader_registry_sup,
                        leader_services_sup,<0.784.0>,ns_server_sup,
                        ns_server_nodes_sup,<0.307.0>,ns_server_cluster_sup,
                        root_sup,<0.155.0>]
          message_queue_len: 0
          messages: []
          links: [<0.1414.0>]
          dictionary: []
          trap_exit: false
          status: running
          heap_size: 318187
          stack_size: 28
          reductions: 1005968
        neighbours:
      [user:error,2024-02-20T02:47:25.553-08:00,ns_1@172.23.104.221:<0.1414.0>:ns_orchestrator:log_rebalance_completion:1661]Rebalance exited with reason {{badmatch,failed},[{ns_rebalancer,rebalance_body,7,[{file,"src/ns_rebalancer.erl"},{line,500}]},{async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,199}]}]}.
      Rebalance Operation Id = d8c6f1989c995c7721ab2de31a429a3a
      [ns_server:debug,2024-02-20T02:47:25.554-08:00,ns_1@172.23.104.221:<0.1414.0>:auto_rebalance:retry_rebalance:58]Retry rebalance is not enabled. Failed Rebalance with Id d8c6f1989c995c7721ab2de31a429a3a will not be retried.
      [ns_server:debug,2024-02-20T02:47:25.581-08:00,ns_1@172.23.104.221:chronicle_kv_log<0.486.0>:chronicle_kv_log:log:59]update (key: counters, rev: {<<"8de2e575e6a7c8485bc574fa34295ecb">>,330})
      [{rebalance_fail,{1708426045,1}}, {rebalance_start,{1708426029,4}}, {graceful_failover_success,{1708425937,1}}, {failover,{1708425937,2}}, {failover_complete,{1708425936,1}}, {graceful_failover_start,{1708425916,1}}, {failover_success,{1708425794,1}}, {failover_incomplete,{1708425794,1}}, {failover_start,{1708425776,1}}, {rebalance_success,{1708425756,2}}]
      [ns_server:debug,2024-02-20T02:47:25.585-08:00,ns_1@172.23.104.221:ns_config_rep<0.557.0>:ns_config_rep:do_push_keys:385]Replicating some config keys ([rebalance_reports]..)
      [ns_server:debug,2024-02-20T02:47:25.585-08:00,ns_1@172.23.104.221:ns_config_log<0.301.0>:ns_config_log:log_common:290]config change:
      rebalance_reports ->
      [{'_vclock',[{<<"3367ca413abb7153adb2fbed8b8d981e">>,{3,63875644994}},{<<"8a2c57ed10d26f8a64efa12f893a03b8">>,{3,63875645245}}]},
       {<<"4262c16563bc4faa44d119bae4d6d8dd">>,[{node,'ns_1@172.23.104.221'},{filename,"rebalance_report_20240220T104725.json"}]},
       {<<"7666af4423b0621e64ce91bec4975d99">>,[{node,'ns_1@172.23.104.221'},{filename,"rebalance_report_20240220T104537.json"}]},
       {<<"7ac7a99bc5f4497bec59df51b43e35aa">>,[{node,'ns_1@172.23.104.221'},{filename,"rebalance_report_20240220T104515.json"}]},
       {<<"6803f221d6ffff19a031c5791fdeb13e">>,[{node,'ns_1@172.23.106.28'},{filename,"rebalance_report_20240220T104314.json"}]},
       {<<"556fc9d6874c48c864dca883f8ee0d8e">>,[{node,'ns_1@172.23.106.28'},{filename,"rebalance_report_20240220T104236.json"}]}]
      [ns_server:debug,2024-02-20T02:47:25.589-08:00,ns_1@172.23.104.221:<0.724.0>:terse_cluster_info_uploader:handle_info:53]Refreshing terse cluster info with <<"{\"rev\":462,\"nodesExt\":[{\"services\":{\"mgmt\":8091,\"mgmtSSL\":18091},\"thisNode\":true,\"hostname\":\"172.23.104.221\"},{\"services\":{\"capi\":8092,\"capiSSL\":18092,\"kv\":11210,\"kvSSL\":11207,\"mgmt\":8091,\"mgmtSSL\":18091,\"projector\":9999},\"hostname\":\"172.23.106.28\"},{\"services\":{\"cbas\":8095,\"cbasSSL\":18095,\"mgmt\":8091,\"mgmtSSL\":18091},\"hostname\":\"172.23.106.38\"},{\"services\":{\"cbas\":8095,\"cbasSSL\":18095,\"mgmt\":8091,\"mgmtSSL\":18091},\"hostname\":\"172.23.106.52\"},{\"services\":{\"capi\":8092,\"capiSSL\":18092,\"indexAdmin\":9100,\"indexHttp\":9102,\"indexHttps\":19102,\"indexScan\":9101,\"indexStreamCatchup\":9104,\"indexStreamInit\":9103,\"indexStreamMaint\":9105,\"kv\":11210,\"kvSSL\":11207,\"mgmt\":8091,\"mgmtSSL\":18091,\"n1ql\":8093,\"n1qlSSL\":18093,\"projector\":9999},\"hostname\":\"172.23.106.53\"}],\"revEpoch\":1,\"clusterCapabilitiesVer\":[1,0],\"clusterCapabilities\":{\"n1ql\":[\"costBasedOptimizer\",\"indexAdvisor\",\"javaScriptFunctions\",\"inlineFunctions\",\"enhancedPreparedStatements\"]}}">>
      [ns_server:debug,2024-02-20T02:47:25.610-08:00,ns_1@172.23.104.221:chronicle_kv_log<0.486.0>:chronicle_kv_log:log:59]update (key: rebalance_status, rev: {<<"8de2e575e6a7c8485bc574fa34295ecb">>,331})
      {none,<<"Rebalance failed. See logs for detailed reason. You can try again.">>}
      [ns_server:debug,2024-02-20T02:47:25.610-08:00,ns_1@172.23.104.221:chronicle_kv_log<0.486.0>:chronicle_kv_log:log:59]update (key: rebalance_status_uuid, rev: {<<"8de2e575e6a7c8485bc574fa34295ecb">>,331})
      <<"edc37f1928897df9a2503d21c3f1b6bb">>
      [ns_server:info,2024-02-20T02:47:25.610-08:00,ns_1@172.23.104.221:leader_registry<0.815.0>:leader_registry:handle_down:286]Process <0.31605.0> registered as 'ns_rebalance_observer' terminated.
      [ns_server:debug,2024-02-20T02:47:25.610-08:00,ns_1@172.23.104.221:chronicle_kv_log<0.486.0>:chronicle_kv_log:log:59]update (key: rebalancer_pid, rev: {<<"8de2e575e6a7c8485bc574fa34295ecb">>,331})
      undefined
      [ns_server:debug,2024-02-20T02:47:25.610-08:00,ns_1@172.23.104.221:chronicle_kv_log<0.486.0>:chronicle_kv_log:log:59]update (key: rebalance_type, rev: {<<"8de2e575e6a7c8485bc574fa34295ecb">>,331})
      rebalance
      [ns_server:debug,2024-02-20T02:47:25.621-08:00,ns_1@172.23.104.221:<0.32641.0>:service_janitor:maybe_complete_pending_failover_body:142]Found unfinished failover for service cbas
      [ns_server:debug,2024-02-20T02:47:25.621-08:00,ns_1@172.23.104.221:service_manager-cbas<0.32642.0>:service_agent:wait_for_agents:74]Waiting for the service agents for service cbas to come up on nodes:
      ['ns_1@172.23.106.38','ns_1@172.23.106.52']
      [ns_server:error,2024-02-20T02:47:28.095-08:00,ns_1@172.23.104.221:service_manager-cbas<0.32642.0>:service_agent:process_bad_results:990]Service call get_agent (service cbas) failed on some nodes:
      [{'ns_1@172.23.106.52',{exit,{{{case_clause,{error,{unknown_error,<<"failed_to_cancel">>}}},[{service_agent,cancel_task,2,[{file,"src/service_agent.erl"},{line,469}]},{lists,foreach,2,[{file,"lists.erl"},{line,1342}]},{service_agent,cleanup_service,1,[{file,"src/service_agent.erl"},{line,497}]},{service_agent,do_handle_connection,2,[{file,"src/service_agent.erl"},{line,326}]},{service_agent,handle_connection,2,[{file,"src/service_agent.erl"},{line,305}]},{service_agent,handle_cast,2,[{file,"src/service_agent.erl"},{line,191}]},{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,695}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,771}]}]},{gen_server,call,[{'service_agent-cbas','ns_1@172.23.106.52'},get_agent,infinity]}}}}]
      [error_logger:error,2024-02-20T02:47:28.096-08:00,ns_1@172.23.104.221:service_manager-cbas<0.32642.0>:ale_error_logger_handler:do_log:101]
      =========================CRASH REPORT=========================
        crasher:
          initial call: misc:'-spawn_monitor/1-fun-0-'/0
          pid: <0.32642.0>
          registered_name: 'service_manager-cbas'
          exception error: no match of right hand side value
                           {error,{bad_nodes,cbas,get_agent,[{'ns_1@172.23.106.52',{exit,{{{case_clause,{error,{unknown_error,<<"failed_to_cancel">>}}},[{service_agent,cancel_task,2,[{file,"src/service_agent.erl"},{line,469}]},{lists,foreach,2,[{file,"lists.erl"},{line,1342}]},{service_agent,cleanup_service,1,[{file,"src/service_agent.erl"},{line,497}]},{service_agent,do_handle_connection,2,[{file,"src/service_agent.erl"},{line,326}]},{service_agent,handle_connection,2,[{file,"src/service_agent.erl"},{line,305}]},{service_agent,handle_cast,2,[{file,"src/service_agent.erl"},{line,191}]},{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,695}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,771}]}]},{gen_server,call,[{'service_agent-cbas','ns_1@172.23.106.52'},get_agent,infinity]}}}}]}}
            in function  service_manager:wait_for_agents/1 (src/service_manager.erl, line 165)
            in call from service_manager:run_op/1 (src/service_manager.erl, line 140)
          ancestors: [<0.32641.0>]
          message_queue_len: 0
          messages: []
          links: []
          dictionary: []
          trap_exit: false
          status: running
          heap_size: 6772
          stack_size: 28
          reductions: 12597
        neighbours:
      [ns_server:debug,2024-02-20T02:47:28.096-08:00,ns_1@172.23.104.221:<0.32641.0>:service_janitor:maybe_complete_pending_failover_body:149]Failed to complete service cbas failover: {error,{failover_failed,cbas,{{badmatch,{error,{bad_nodes,cbas,get_agent,[{'ns_1@172.23.106.52',{exit,{{{case_clause,{error,{unknown_error,<<"failed_to_cancel">>}}},[{service_agent,cancel_task,2,[{file,"src/service_agent.erl"},{line,469}]},{lists,foreach,2,[{file,"lists.erl"},{line,1342}]},{service_agent,cleanup_service,1,[{file,"src/service_agent.erl"},{line,497}]},{service_agent,do_handle_connection,2,[{file,"src/service_agent.erl"},{line,326}]},{service_agent,handle_connection,2,[{file,"src/service_agent.erl"},{line,305}]},{service_agent,handle_cast,2,[{file,"src/service_agent.erl"},{line,191}]},{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,695}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,771}]}]},{gen_server,call,[{'service_agent-cbas','ns_1@172.23.106.52'},get_agent,infinity]}}}}]}}},[{service_manager,wait_for_agents,1,[{file,"src/service_manager.erl"},{line,165}]},{service_manager,run_op,1,[{file,"src/service_manager.erl"},{line,140}]},{proc_lib,init_p,3,[{file,"proc_lib.erl"},{line,225}]}]}}}

      The stack trace exactly matches the one in MB-60743. The rebalance there failed for cbas, whereas here it fails for a kv, index, n1ql node. The two issues could be related.
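
      When comparing against MB-60743, it can help to pull the failing node out of the debug log mechanically. The following is a rough sketch and not part of ns_server or TAF: the function name and regex are my own, and it only understands the `{bad_nodes,Service,Call,[{Node,Reason...` shape seen in the excerpts above.

```python
import re
from typing import List, Tuple

# Matches e.g. {bad_nodes,cbas,get_agent,[{'ns_1@172.23.106.52',timeout}]}
# and the variant where the reason is a tuple such as {exit,...}.
BAD_NODES_RE = re.compile(
    r"\{bad_nodes,\s*(\w+),\s*(\w+),\s*\[\{'([^']+)',\s*\{?(\w+)"
)


def find_bad_nodes(log_text: str) -> List[Tuple[str, str, str, str]]:
    """Return unique (service, call, node, reason) tuples in order of first appearance."""
    seen: List[Tuple[str, str, str, str]] = []
    for match in BAD_NODES_RE.finditer(log_text):
        item = match.groups()
        if item not in seen:
            seen.append(item)
    return seen


if __name__ == "__main__":
    sample = (
        "{error, {bad_nodes,cbas,get_agent, [{'ns_1@172.23.106.52',timeout}]}} "
        "{error, {bad_nodes,cbas,get_agent, [{'ns_1@172.23.106.52', {exit, {case_clause}}}]}}"
    )
    for service, call, node, reason in find_bad_nodes(sample):
        print(f"{service}: {call} failed on {node} ({reason})")
```

      On the excerpts in this ticket it reports ns_1@172.23.106.52 twice: once with reason timeout and once with reason exit (the failed_to_cancel case_clause).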


       

      TAF Script to reproduce


      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /data/workspace/debian-p0-analytics-vset00-00-analytics_upgrade_with_failover_from_7.1.0_with_collections/testexec.7143.ini -p GROUP=7_1_0;failover_upgrade,kv_quota_percent=70,bucket_storage=couchstore,key=test_collections,get-cbcollect-info=True,upgrade_version=7.6.0-2153,aws_access_key=REDACTED,aws_secret_key=REDACTED,sirius_url=http://172.23.120.103:4000 -t upgrade.cbas_upgrade.UpgradeTests.test_upgrade_with_failover,upgrade_chain=7.1.0,upgrade_type=failover_delta_recovery,update_nodes=kv;cbas,nodes_init=5,services_init=kv:index:n1ql-kv:index:n1ql-cbas-cbas-cbas,pre_update_no_of_dv=2,pre_update_ds_per_dv=4,pre_update_no_of_synonym=5,pre_update_no_of_index=3,replica_num=3,override_spec_params=num_buckets;num_scopes;num_collections;replicas;num_items,num_items=10000,num_buckets=3,num_scopes=5,num_collections=5,no_of_dv=10,ds_per_dv=3,no_of_synonym=10,no_of_index=5,GROUP=7_1_0;failover_upgrade,cbas_cc_node_upgrade_sequence=first'

      Job name: debian-analytics-analytics_upgrade_with_failover_from_7.1.0_with_collections

      Job ref: http://cb-logs-qe.s3-website-us-west-2.amazonaws.com/7.6.0-2153/jenkins_logs/test_suite_executor-TAF/313465/

            People

              raghav.sk Raghav S K
              raghav.sk Raghav S K