Couchbase Server / MB-45789

[Volume Test] [MOI 10K test] Rebalance to remove a failed-over node from the cluster failed - linked_process_died, ServiceAPI.GetTaskList


Details

    Description

      Build : 7.0.0-4960
      Test : -test tests/2i/cheshirecat/test_idx_cc_vol_10K_moi_tmp.yml -scope tests/2i/cheshirecat/scope_idx_cc_vol_10K_moi.yml (MOI 10K indexes test)
      Scale : 5

      In the volume test, immediately after the step that failed due to MB-45788, there is a step that fails over another node and then removes it from the cluster. The rebalance for that removal failed in under 1 minute with the following error:

      Rebalance exited with reason {service_rebalance_failed,index,
      {agent_died,<30220.30871.0>,
      {linked_process_died,<30220.27121.980>,
      {timeout,
      {gen_server,call,
      [<30220.31505.0>,
      {call,"ServiceAPI.GetTaskList",
      #Fun<json_rpc_connection.0.77329884>},
      60000]}}}}}.
      Rebalance Operation Id = 1756a35750f5f478727f39508eeebdf1
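
      The nested reason tuple above is hard to scan by eye. As a minimal sketch (not part of the test framework; the helper name `summarize_rebalance_error` and its regular expressions are illustrative assumptions that only cover the tuple shape seen in this report), the failing service, the timed-out RPC, and the `gen_server:call` timeout can be pulled out of the error text like this:

```python
import re

# Error text copied from the rebalance failure reported above.
REBALANCE_ERROR = """Rebalance exited with reason {service_rebalance_failed,index,
{agent_died,<30220.30871.0>,
{linked_process_died,<30220.27121.980>,
{timeout,
{gen_server,call,
[<30220.31505.0>,
{call,"ServiceAPI.GetTaskList",
#Fun<json_rpc_connection.0.77329884>},
60000]}}}}}."""


def summarize_rebalance_error(text):
    """Extract key fields from a rebalance failure reason.

    Hypothetical helper: the patterns are inferred from this one error only.
    """
    service = re.search(r"\{service_rebalance_failed,\s*(\w+)", text)
    rpc = re.search(r'\{call,\s*"([^"]+)"', text)
    # The gen_server:call argument list ends with the timeout in milliseconds.
    timeout = re.search(r"(\d+)\]", text)
    return {
        "service": service.group(1) if service else None,
        "rpc": rpc.group(1) if rpc else None,
        "timeout_ms": int(timeout.group(1)) if timeout else None,
        # Erlang pids in the order they appear: agent, worker, RPC connection.
        "pids": re.findall(r"<\d+\.\d+\.\d+>", text),
    }


summary = summarize_rebalance_error(REBALANCE_ERROR)
print(summary)
```

      For this error the summary points at the index service, the `ServiceAPI.GetTaskList` call, and a 60000 ms timeout, which matches the crash reports that follow.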
      

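      On another indexer node, 172.23.96.254, the debug logs show the long-poll worker timing out after 60000 ms on ServiceAPI.GetTaskList, which then terminates the linked service_agent-index process: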

      [error_logger:error,2021-04-19T12:47:49.516-07:00,ns_1@172.23.96.254:<0.27121.980>:ale_error_logger_handler:do_log:101]
      =========================CRASH REPORT=========================
        crasher:
          initial call: service_agent:'-start_long_poll_worker/4-fun-0-'/0
          pid: <0.27121.980>
          registered_name: []
          exception exit: {timeout,
                              {gen_server,call,
                                  [<0.31505.0>,
                                   {call,"ServiceAPI.GetTaskList",
                                       #Fun<json_rpc_connection.0.77329884>},
                                   60000]}}
            in function  gen_server:call/3 (gen_server.erl, line 223)
            in call from service_api:perform_call/3 (src/service_api.erl, line 49)
            in call from service_agent:grab_tasks/2 (src/service_agent.erl, line 568)
            in call from service_agent:long_poll_worker_loop/5 (src/service_agent.erl, line 649)
          ancestors: ['service_agent-index',service_agent_children_sup,
                        service_agent_sup,ns_server_sup,ns_server_nodes_sup,
                        <0.25023.0>,ns_server_cluster_sup,root_sup,<0.140.0>]
          message_queue_len: 0
          messages: []
          links: [<0.30871.0>]
          dictionary: []
          trap_exit: false
          status: running
          heap_size: 610
          stack_size: 27
          reductions: 238
        neighbours:
       
      [ns_server:error,2021-04-19T12:47:49.517-07:00,ns_1@172.23.96.254:service_agent-index<0.30871.0>:service_agent:handle_info:277]Linked process <0.27121.980> died with reason {timeout,
                                                     {gen_server,call,
                                                      [<0.31505.0>,
                                                       {call,
                                                        "ServiceAPI.GetTaskList",
                                                        #Fun<json_rpc_connection.0.77329884>},
                                                       60000]}}. Terminating
      [ns_server:error,2021-04-19T12:47:49.518-07:00,ns_1@172.23.96.254:service_agent-index<0.30871.0>:service_agent:terminate:306]Terminating abnormally
      [ns_server:error,2021-04-19T12:47:49.518-07:00,ns_1@172.23.96.254:service_agent-index<0.30871.0>:service_agent:terminate:311]Terminating json rpc connection for index: <0.31505.0>
      ...
      ...
      [error_logger:error,2021-04-19T12:47:49.518-07:00,ns_1@172.23.96.254:service_agent-index<0.30871.0>:ale_error_logger_handler:do_log:101]
      =========================ERROR REPORT=========================
      ** Generic server 'service_agent-index' terminating
      ** Last message in was {'EXIT',<0.27121.980>,
                              {timeout,
                               {gen_server,call,
                                [<0.31505.0>,
                                 {call,"ServiceAPI.GetTaskList",
                                  #Fun<json_rpc_connection.0.77329884>},
                                 60000]}}}
      ** When Server state == {state,index,
                               {dict,58,16,16,8,80,48,
                                {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                                {{[[{uuid,<<"f024244e895bf88d497f9962abf63562">>}|
                                    'ns_1@172.23.106.136']],
                                  [[{uuid,<<"49fb081b12e2b0e57401b211cedc2d60">>}|
                                    'ns_1@172.23.120.75']],
                                  [[{node,'ns_1@172.23.120.81'}|
                                    <<"0ca6f6ade5b881dbcf323697e57bd50f">>],
      ...
      ...
      ** Reason for termination ==
      ** {linked_process_died,<0.27121.980>,
             {timeout,
                 {gen_server,call,
                     [<0.31505.0>,
                      {call,"ServiceAPI.GetTaskList",
                          #Fun<json_rpc_connection.0.77329884>},
                      60000]}}}
       
      [error_logger:error,2021-04-19T12:47:49.521-07:00,ns_1@172.23.96.254:service_agent-index<0.30871.0>:ale_error_logger_handler:do_log:101]
      =========================CRASH REPORT=========================
        crasher:
          initial call: service_agent:init/1
          pid: <0.30871.0>
          registered_name: 'service_agent-index'
          exception exit: {linked_process_died,<0.27121.980>,
                              {timeout,
                                  {gen_server,call,
                                      [<0.31505.0>,
                                       {call,"ServiceAPI.GetTaskList",
                                           #Fun<json_rpc_connection.0.77329884>},
                                       60000]}}}
            in function  gen_server:handle_common_reply/8 (gen_server.erl, line 751)
          ancestors: [service_agent_children_sup,service_agent_sup,ns_server_sup,
                        ns_server_nodes_sup,<0.25023.0>,ns_server_cluster_sup,
                        root_sup,<0.140.0>]
          message_queue_len: 3
          messages: [{'EXIT',<0.27969.980>,
                         {linked_process_died,<0.27121.980>,
                          {timeout,
                           {gen_server,call,
                            [<0.31505.0>,
                             {call,"ServiceAPI.GetTaskList",
                              #Fun<json_rpc_connection.0.77329884>},
                             60000]}}}},
                        {'EXIT',<0.26730.980>,
                         {linked_process_died,<0.27121.980>,
                          {timeout,
                           {gen_server,call,
                            [<0.31505.0>,
                             {call,"ServiceAPI.GetTaskList",
                              #Fun<json_rpc_connection.0.77329884>},
                             60000]}}}},
                        {'DOWN',#Ref<0.3749731112.838336520.226783>,process,
                         <0.31505.0>,
                         {service_agent_died,
                          {linked_process_died,<0.27121.980>,
                           {timeout,
                            {gen_server,call,
                             [<0.31505.0>,
                              {call,"ServiceAPI.GetTaskList",
                               #Fun<json_rpc_connection.0.77329884>},
                              60000]}}}}}]
          links: [<0.30873.0>,<0.25556.0>]
          dictionary: []
          trap_exit: true
          status: running
          heap_size: 46422
          stack_size: 27
          reductions: 3791543
        neighbours:
       
      [ns_server:debug,2021-04-19T12:47:49.522-07:00,ns_1@172.23.96.254:<0.30873.0>:ns_pubsub:do_subscribe_link_continue:152]Parent process of subscription {ns_config_events,<0.30871.0>} exited with reason {linked_process_died,
                                                                                        <0.27121.980>,
                                                                                        {timeout,
                                                                                         {gen_server,
                                                                                          call,
                                                                                          [<0.31505.0>,
                                                                                           {call,
                                                                                            "ServiceAPI.GetTaskList",
                                                                                            #Fun<json_rpc_connection.0.77329884>},
                                                                                           60000]}}}
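
      To follow the order of events in the excerpt above (worker crash, agent termination, subscription cleanup), the log headers can be reduced to a short timeline. This is a rough sketch: the `LOG_HEADER` pattern and the `header_events` helper are hypothetical, and the header layout is inferred only from the lines quoted in this report.

```python
import re

# Assumed header layout, inferred from the log lines in this report:
# [logger:level,timestamp,node:registered_name<pid>:module:function:line]
LOG_HEADER = re.compile(
    r"^\[(?P<logger>\w+):(?P<level>\w+),"
    r"(?P<timestamp>[^,]+),"
    r"(?P<node>[^:]+):"
    r"(?P<name>[^:<]*)(?P<pid><\d+\.\d+\.\d+>):"
    r"(?P<module>\w+):(?P<function>\w+):(?P<line>\d+)\]"
)

# Header lines copied from the debug log excerpt above.
SAMPLE_LINES = [
    "[error_logger:error,2021-04-19T12:47:49.516-07:00,ns_1@172.23.96.254:<0.27121.980>:ale_error_logger_handler:do_log:101]",
    "[ns_server:error,2021-04-19T12:47:49.518-07:00,ns_1@172.23.96.254:service_agent-index<0.30871.0>:service_agent:terminate:306]Terminating abnormally",
    "[ns_server:error,2021-04-19T12:47:49.518-07:00,ns_1@172.23.96.254:service_agent-index<0.30871.0>:service_agent:terminate:311]Terminating json rpc connection for index: <0.31505.0>",
    "[ns_server:debug,2021-04-19T12:47:49.522-07:00,ns_1@172.23.96.254:<0.30873.0>:ns_pubsub:do_subscribe_link_continue:152]Parent process of subscription {ns_config_events,<0.30871.0>} exited with reason {linked_process_died,",
]


def header_events(lines):
    """Return one dict per line that starts with a recognizable log header."""
    events = []
    for raw in lines:
        match = LOG_HEADER.match(raw.strip())
        if match:
            events.append(match.groupdict())
    # Timestamps share one UTC offset here, so string order is time order.
    return sorted(events, key=lambda event: event["timestamp"])


events = header_events(SAMPLE_LINES)
for event in events:
    who = event["name"] or "(unregistered)"
    print(event["timestamp"], who, event["pid"], event["module"], event["function"])
```

      Continuation lines of multi-line terms do not match the header pattern and are skipped, so only one entry per log record appears in the timeline.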
      

      Indexer nodes : 172.23.106.136, 172.23.120.58, 172.23.120.74, 172.23.120.75, 172.23.120.77, 172.23.120.81, 172.23.120.86, 172.23.123.31, 172.23.123.32, 172.23.123.33, 172.23.96.243, 172.23.96.254, 172.23.97.105, 172.23.97.110, 172.23.97.112

          People

            mihir.kamdar Mihir Kamdar (Inactive)
            mihir.kamdar Mihir Kamdar (Inactive)