Couchbase Server / MB-60018

[Rebalance][ipv6] : Rebalance exited with reason {service_rebalance_failed,backup,{{badmatch,{error,{bad_nodes,backup,set_service_manager,[{'ns_1@s10501-ip6.qe.couchbase.com',{exit,{{linked_process_died


Details

    Description

Steps to reproduce

1. Create a single-node cluster on IPv6 with the backup, fts, index, kv, and n1ql services.
2. Add a second IPv6 node with the same set of services.

Rebalance fails with:

2023-12-05T06:07:33.632-08:00, ns_orchestrator:0:critical:message(ns_1@s10505-ip6.qe.couchbase.com) - Rebalance exited with reason
{service_rebalance_failed,backup,
 {{badmatch,
   {error,
    {bad_nodes,backup,set_service_manager,
     [{'ns_1@s10501-ip6.qe.couchbase.com',
       {exit,
        {{linked_process_died,<35160.3685.0>,
          {'ns_1@s10501-ip6.qe.couchbase.com',
           {no_connection,"backup-service_api"}}},
         {gen_server,call,
          [{'service_agent-backup',
            'ns_1@s10501-ip6.qe.couchbase.com'},
           {set_service_manager,<0.3996.0>},
           infinity]}}}}]}}},
  [{service_manager,set_service_manager,1,
    [{file,"src/service_manager.erl"},{line,188}]},
   {service_manager,run_op,1,
    [{file,"src/service_manager.erl"},{line,146}]},
   {proc_lib,init_p,3,
    [{file,"proc_lib.erl"},{line,225}]}]}}.
Rebalance Operation Id = e0896f4384930db698c12d2f5b1a20c2

Many backup service shutdowns and restarts are observed in ns_server.debug.log:

[ns_server:debug,2023-12-05T06:07:22.400-08:00,ns_1@s10501-ip6.qe.couchbase.com:json_rpc_connection-backup-cbauth<0.5742.0>:json_rpc_connection:init:71]Observed revrpc connection: label "backup-cbauth", handling process <0.5742.0>

[ns_server:debug,2023-12-05T06:07:22.400-08:00,ns_1@s10501-ip6.qe.couchbase.com:menelaus_cbauth<0.591.0>:menelaus_cbauth:handle_cast:201]Observed json rpc process {"backup-cbauth",[{internal,true}],<0.5742.0>} started

[ns_server:debug,2023-12-05T06:07:29.907-08:00,ns_1@s10501-ip6.qe.couchbase.com:json_rpc_connection-backup-cbauth<0.5742.0>:json_rpc_connection:handle_info:142]Socket closed

[ns_server:debug,2023-12-05T06:07:29.908-08:00,ns_1@s10501-ip6.qe.couchbase.com:<0.5744.0>:ns_pubsub:do_subscribe_link_continue:150]Parent process of subscription {chronicle_compat_event_manager,<0.5742.0>} exited with reason shutdown

[ns_server:debug,2023-12-05T06:07:29.909-08:00,ns_1@s10501-ip6.qe.couchbase.com:menelaus_cbauth<0.591.0>:menelaus_cbauth:handle_info:254]Observed json rpc process {rpc_process,"backup-cbauth",internal,
                              #Ref<0.641258218.2764832771.222624>,undefined,
                              -576460632908} died with reason shutdown

[user:info,2023-12-05T06:07:29.909-08:00,ns_1@s10501-ip6.qe.couchbase.com:<0.431.0>:ns_log:consume_log:76]Service 'backup' exited with status 1. Restarting. Messages:
2023-12-05T06:07:22.404-08:00 DEBUG (REST) (Attempt 1) (GET) (200) Received response from 'http://[::1]:8091/pools/default/nodeServices'
2023-12-05T06:07:24.905-08:00 WARN (REST) (Attempt 8891712) Failed to get credentials due to error: address ::1:8091: too many colons in address
2023-12-05T06:07:27.406-08:00 WARN (REST) (Attempt 8891712) Failed to get credentials due to error: address ::1:8091: too many colons in address
2023-12-05T06:07:29.906-08:00 WARN (REST) (Attempt 1) (GET) Request to endpoint '/pools' failed due to error: failed to prepare request: failed to set auth headers: exhausted retry count after 3 attempts: address ::1:8091: too many colons in address
2023-12-05T06:07:29.906-08:00 ERROR (Main) Failed to run node {"err": "could not create REST client: failed to get cluster information: failed to get cluster metadata: failed to execute request: failed to execute request: exhausted retry count after 3 retries, last error: failed to set auth headers: exhausted retry count after 3 attempts: address ::1:8091: too many colons in address"}
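The repeated credential failures above look like an address-parsing problem rather than a credentials problem: "address ::1:8091: too many colons in address" is the error Go's net.SplitHostPort returns when an IPv6 literal is joined with a port without brackets, which suggests the backup service is building host:port strings from the unbracketed loopback address. A minimal, standalone Go sketch (not the actual backup-service code) illustrating the failure and the bracketed form that parses correctly:

package main

import (
	"fmt"
	"net"
)

func main() {
	// Unbracketed IPv6 loopback plus port, as seen in the log ("::1:8091").
	if _, _, err := net.SplitHostPort("::1:8091"); err != nil {
		fmt.Println(err) // address ::1:8091: too many colons in address
	}

	// net.JoinHostPort brackets IPv6 literals, producing a parseable address.
	addr := net.JoinHostPort("::1", "8091")
	fmt.Println(addr) // [::1]:8091

	host, port, err := net.SplitHostPort(addr)
	fmt.Println(host, port, err) // ::1 8091 <nil>
}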
       

CRASH reports for the backup service are observed in ns_server.debug.log at the time of the failure:

[ns_server:debug,2023-12-05T06:07:30.141-08:00,ns_1@s10501-ip6.qe.couchbase.com:json_rpc_connection-backup-cbauth<0.6058.0>:json_rpc_connection:init:71]Observed revrpc connection: label "backup-cbauth", handling process <0.6058.0>

[ns_server:debug,2023-12-05T06:07:30.141-08:00,ns_1@s10501-ip6.qe.couchbase.com:menelaus_cbauth<0.591.0>:menelaus_cbauth:handle_cast:201]Observed json rpc process {"backup-cbauth",[{internal,true}],<0.6058.0>} started

[ns_server:debug,2023-12-05T06:07:32.225-08:00,ns_1@s10501-ip6.qe.couchbase.com:compaction_daemon<0.680.0>:compaction_daemon:process_scheduler_message:1316]No buckets to compact for compact_kv. Rescheduling compaction.

[ns_server:debug,2023-12-05T06:07:32.225-08:00,ns_1@s10501-ip6.qe.couchbase.com:compaction_daemon<0.680.0>:compaction_scheduler:schedule_next:51]Finished compaction for compact_kv too soon. Next run will be in 30s

[ns_server:debug,2023-12-05T06:07:32.225-08:00,ns_1@s10501-ip6.qe.couchbase.com:compaction_daemon<0.680.0>:compaction_daemon:process_scheduler_message:1316]No buckets to compact for compact_views. Rescheduling compaction.

[ns_server:debug,2023-12-05T06:07:32.225-08:00,ns_1@s10501-ip6.qe.couchbase.com:compaction_daemon<0.680.0>:compaction_scheduler:schedule_next:51]Finished compaction for compact_views too soon. Next run will be in 30s

[ns_server:error,2023-12-05T06:07:33.629-08:00,ns_1@s10501-ip6.qe.couchbase.com:<0.3685.0>:service_agent:wait_for_connection_loop:387]No connection with label "backup-service_api" after 60000ms. Exiting.

[error_logger:error,2023-12-05T06:07:33.629-08:00,ns_1@s10501-ip6.qe.couchbase.com:<0.3685.0>:ale_error_logger_handler:do_log:101]
=========================CRASH REPORT=========================
  crasher:
    initial call: service_agent:'-spawn_connection_waiter/2-fun-0-'/0
    pid: <0.3685.0>
    registered_name: []
    exception exit: {no_connection,"backup-service_api"}
      in function  service_agent:wait_for_connection_loop/3 (src/service_agent.erl, line 389)
    ancestors: ['service_agent-backup',service_agent_children_sup,
                  service_agent_sup,ns_server_sup,ns_server_nodes_sup,
                  <0.289.0>,ns_server_cluster_sup,root_sup,<0.155.0>]
    message_queue_len: 0
    messages: []
    links: [<0.3684.0>,<0.3687.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 987
    stack_size: 28
    reductions: 2854
  neighbours:

[ns_server:error,2023-12-05T06:07:33.629-08:00,ns_1@s10501-ip6.qe.couchbase.com:service_agent-backup<0.3684.0>:service_agent:handle_info:320]Linked process <0.3685.0> died with reason {no_connection,"backup-service_api"}. Terminating

[ns_server:debug,2023-12-05T06:07:33.629-08:00,ns_1@s10501-ip6.qe.couchbase.com:<0.3687.0>:ns_pubsub:do_subscribe_link_continue:150]Parent process of subscription {json_rpc_events,<0.3685.0>} exited with reason {no_connection,"backup-service_api"}

[ns_server:error,2023-12-05T06:07:33.629-08:00,ns_1@s10501-ip6.qe.couchbase.com:service_agent-backup<0.3684.0>:service_agent:terminate:350]Terminating abnormally

[error_logger:error,2023-12-05T06:07:33.630-08:00,ns_1@s10501-ip6.qe.couchbase.com:service_agent-backup<0.3684.0>:ale_error_logger_handler:do_log:101]
=========================ERROR REPORT=========================
** Generic server 'service_agent-backup' terminating
** Last message in was {'EXIT',<0.3685.0>,
                        {no_connection,"backup-service_api"}}
** When Server state == {state,backup,
                         {dict,4,16,16,8,80,48,
                          {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                          {{[],[],
                            [[{uuid,<<"b2920bb0d6ee8cbde3ea5d0bfde3b54f">>}|
                              'ns_1@s10505-ip6.qe.couchbase.com']],
                            [],[],[],
                            [[{node,'ns_1@s10501-ip6.qe.couchbase.com'}|
                              <<"37945d5df8d57072eff2147d32728ad3">>],
                             [{node,'ns_1@s10505-ip6.qe.couchbase.com'}|
                              <<"b2920bb0d6ee8cbde3ea5d0bfde3b54f">>]],
                            [],[],[],[],[],[],[],[],
                            [[{uuid,
                               <<"37945d5df8d57072eff2147d32728ad3">>}|
                              'ns_1@s10501-ip6.qe.couchbase.com']]}}},
                         undefined,undefined,<35530.3996.0>,
                         #Ref<0.641258218.2764832771.219692>,<0.3750.0>,
                         {[{<35530.4004.0>,
                            [alias|#Ref<35530.2355886961.2765422594.254650>]}],
                          []},
                         undefined,undefined,undefined,undefined,undefined,
                         undefined}
** Reason for termination ==
** {linked_process_died,<0.3685.0>,
       {'ns_1@s10501-ip6.qe.couchbase.com',
           {no_connection,"backup-service_api"}}}

[error_logger:error,2023-12-05T06:07:33.630-08:00,ns_1@s10501-ip6.qe.couchbase.com:service_agent-backup<0.3684.0>:ale_error_logger_handler:do_log:101]
=========================CRASH REPORT=========================
  crasher:
    initial call: service_agent:init/1
    pid: <0.3684.0>
    registered_name: 'service_agent-backup'
    exception exit: {linked_process_died,<0.3685.0>,
                        {'ns_1@s10501-ip6.qe.couchbase.com',
                            {no_connection,"backup-service_api"}}}
      in function  gen_server:handle_common_reply/8 (gen_server.erl, line 1241)
    ancestors: [service_agent_children_sup,service_agent_sup,ns_server_sup,
                  ns_server_nodes_sup,<0.289.0>,ns_server_cluster_sup,
                  root_sup,<0.155.0>]
    message_queue_len: 1
    messages: [{'EXIT',<0.3750.0>,
                      {linked_process_died,<0.3685.0>,
                          {'ns_1@s10501-ip6.qe.couchbase.com',
                              {no_connection,"backup-service_api"}}}}]
    links: [<0.3686.0>,<0.604.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 10958
    stack_size: 28
    reductions: 30699
  neighbours:
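Reading the two traces together (inferred from the logs in this ticket): on ns_1@s10501 the backup service repeatedly exits with status 1 because its REST client cannot parse the unbracketed ::1:8091 address, so it never establishes the "backup-service_api" revrpc connection. service_agent:wait_for_connection_loop gives up after 60000ms, the linked 'service_agent-backup' gen_server terminates with linked_process_died, and the orchestrator reports the service_rebalance_failed error shown above.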


       

      Testrunner script to reproduce

      ./testrunner -i /data/workspace/debian-p0-ipv6-vset00-00-sanity-mix/testexec.17578.ini -p get-cbcollect-info=False,enable_ipv6=True,get-cbcollect-info=True,get-cbcollect-info=True,sirius_url=http://172.23.120.103:4000 -t xdcr.uniXDCR.unidirectional.load_with_ops,items=5000,ctopology=chain,rdirection=unidirection,update=C1,delete=C1

Job name: debian-ipv6_sanity-mix

Job ref link: http://cb-logs-qe.s3-website-us-west-2.amazonaws.com/7.6.0-1878/jenkins_logs/test_suite_executor/649150/
