Details
Bug
Resolution: Fixed
Critical
Cheshire-Cat
Untriaged
1
Unknown
Description
Build : 7.0.0-4960
Test : -test tests/2i/cheshirecat/test_idx_cc_vol_10K_moi_tmp.yml -scope tests/2i/cheshirecat/scope_idx_cc_vol_10K_moi.yml (MOI 10K indexes test)
Scale : 5
In the volume test, just after the step that failed due to MB-45788, there is a step that fails over another node and then removes it from the cluster. This rebalance operation failed in under 1 minute with the following error:
Rebalance exited with reason {service_rebalance_failed,index,
    {agent_died,<30220.30871.0>,
        {linked_process_died,<30220.27121.980>,
            {timeout,
                {gen_server,call,
                    [<30220.31505.0>,
                     {call,"ServiceAPI.GetTaskList",
                      #Fun<json_rpc_connection.0.77329884>},
                     60000]}}}}}.
Rebalance Operation Id = 1756a35750f5f478727f39508eeebdf1
On another indexer node, 172.23.96.254, the debug logs show the following:
[error_logger:error,2021-04-19T12:47:49.516-07:00,ns_1@172.23.96.254:<0.27121.980>:ale_error_logger_handler:do_log:101]
=========================CRASH REPORT=========================
  crasher:
    initial call: service_agent:'-start_long_poll_worker/4-fun-0-'/0
    pid: <0.27121.980>
    registered_name: []
    exception exit: {timeout,
                        {gen_server,call,
                            [<0.31505.0>,
                             {call,"ServiceAPI.GetTaskList",
                              #Fun<json_rpc_connection.0.77329884>},
                             60000]}}
      in function gen_server:call/3 (gen_server.erl, line 223)
      in call from service_api:perform_call/3 (src/service_api.erl, line 49)
      in call from service_agent:grab_tasks/2 (src/service_agent.erl, line 568)
      in call from service_agent:long_poll_worker_loop/5 (src/service_agent.erl, line 649)
    ancestors: ['service_agent-index',service_agent_children_sup,
                  service_agent_sup,ns_server_sup,ns_server_nodes_sup,
                  <0.25023.0>,ns_server_cluster_sup,root_sup,<0.140.0>]
    message_queue_len: 0
    messages: []
    links: [<0.30871.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 610
    stack_size: 27
    reductions: 238
  neighbours:
[ns_server:error,2021-04-19T12:47:49.517-07:00,ns_1@172.23.96.254:service_agent-index<0.30871.0>:service_agent:handle_info:277]Linked process <0.27121.980> died with reason {timeout,
    {gen_server,call,
        [<0.31505.0>,
         {call,
          "ServiceAPI.GetTaskList",
          #Fun<json_rpc_connection.0.77329884>},
         60000]}}. Terminating
[ns_server:error,2021-04-19T12:47:49.518-07:00,ns_1@172.23.96.254:service_agent-index<0.30871.0>:service_agent:terminate:306]Terminating abnormally
[ns_server:error,2021-04-19T12:47:49.518-07:00,ns_1@172.23.96.254:service_agent-index<0.30871.0>:service_agent:terminate:311]Terminating json rpc connection for index: <0.31505.0>
...
...
[error_logger:error,2021-04-19T12:47:49.518-07:00,ns_1@172.23.96.254:service_agent-index<0.30871.0>:ale_error_logger_handler:do_log:101]
=========================ERROR REPORT=========================
** Generic server 'service_agent-index' terminating
** Last message in was {'EXIT',<0.27121.980>,
                           {timeout,
                               {gen_server,call,
                                   [<0.31505.0>,
                                    {call,"ServiceAPI.GetTaskList",
                                     #Fun<json_rpc_connection.0.77329884>},
                                    60000]}}}
** When Server state == {state,index,
    {dict,58,16,16,8,80,48,
        {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
        {{[[{uuid,<<"f024244e895bf88d497f9962abf63562">>}|
            'ns_1@172.23.106.136']],
          [[{uuid,<<"49fb081b12e2b0e57401b211cedc2d60">>}|
            'ns_1@172.23.120.75']],
          [[{node,'ns_1@172.23.120.81'}|
            <<"0ca6f6ade5b881dbcf323697e57bd50f">>],
...
...
** Reason for termination ==
** {linked_process_died,<0.27121.980>,
       {timeout,
           {gen_server,call,
               [<0.31505.0>,
                {call,"ServiceAPI.GetTaskList",
                 #Fun<json_rpc_connection.0.77329884>},
                60000]}}}
[error_logger:error,2021-04-19T12:47:49.521-07:00,ns_1@172.23.96.254:service_agent-index<0.30871.0>:ale_error_logger_handler:do_log:101]
=========================CRASH REPORT=========================
  crasher:
    initial call: service_agent:init/1
    pid: <0.30871.0>
    registered_name: 'service_agent-index'
    exception exit: {linked_process_died,<0.27121.980>,
                        {timeout,
                            {gen_server,call,
                                [<0.31505.0>,
                                 {call,"ServiceAPI.GetTaskList",
                                  #Fun<json_rpc_connection.0.77329884>},
                                 60000]}}}
      in function gen_server:handle_common_reply/8 (gen_server.erl, line 751)
    ancestors: [service_agent_children_sup,service_agent_sup,ns_server_sup,
                  ns_server_nodes_sup,<0.25023.0>,ns_server_cluster_sup,
                  root_sup,<0.140.0>]
    message_queue_len: 3
    messages: [{'EXIT',<0.27969.980>,
                  {linked_process_died,<0.27121.980>,
                      {timeout,
                          {gen_server,call,
                              [<0.31505.0>,
                               {call,"ServiceAPI.GetTaskList",
                                #Fun<json_rpc_connection.0.77329884>},
                               60000]}}}},
              {'EXIT',<0.26730.980>,
                  {linked_process_died,<0.27121.980>,
                      {timeout,
                          {gen_server,call,
                              [<0.31505.0>,
                               {call,"ServiceAPI.GetTaskList",
                                #Fun<json_rpc_connection.0.77329884>},
                               60000]}}}},
              {'DOWN',#Ref<0.3749731112.838336520.226783>,process,
                  <0.31505.0>,
                  {service_agent_died,
                      {linked_process_died,<0.27121.980>,
                          {timeout,
                              {gen_server,call,
                                  [<0.31505.0>,
                                   {call,"ServiceAPI.GetTaskList",
                                    #Fun<json_rpc_connection.0.77329884>},
                                   60000]}}}}}]
    links: [<0.30873.0>,<0.25556.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 46422
    stack_size: 27
    reductions: 3791543
  neighbours:
[ns_server:debug,2021-04-19T12:47:49.522-07:00,ns_1@172.23.96.254:<0.30873.0>:ns_pubsub:do_subscribe_link_continue:152]Parent process of subscription {ns_config_events,<0.30871.0>} exited with reason {linked_process_died,
    <0.27121.980>,
    {timeout,
        {gen_server,
         call,
         [<0.31505.0>,
          {call,
           "ServiceAPI.GetTaskList",
           #Fun<json_rpc_connection.0.77329884>},
          60000]}}}
Indexer nodes : 172.23.106.136, 172.23.120.58, 172.23.120.74, 172.23.120.75, 172.23.120.77, 172.23.120.81, 172.23.120.86, 172.23.123.31, 172.23.123.32, 172.23.123.33, 172.23.96.243, 172.23.96.254, 172.23.97.105, 172.23.97.110, 172.23.97.112
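
To check whether other indexer nodes hit the same ServiceAPI.GetTaskList timeout, something like the following Python sketch could be run over the collected ns_server debug logs. It is only an illustration: the function name `find_call_timeouts` and both regular expressions are hypothetical, and it assumes the raw log entries look like the excerpts quoted in this report (a bracketed `[logger:level,timestamp,node:...]` header followed by a multi-line Erlang term).

```python
import re
import sys

# Header of one log entry, e.g.
# [ns_server:error,2021-04-19T12:47:49.517-07:00,ns_1@172.23.96.254:...]
# (pattern is an assumption based on the excerpts in this report)
HEADER_RE = re.compile(
    r"^\[(?P<logger>\w+):(?P<level>\w+),(?P<ts>[^,\]]+),(?P<node>[^:\]]+):",
    re.MULTILINE,
)

# A gen_server:call timeout term wrapping a json rpc call, e.g.
# {timeout,{gen_server,call,[<0.31505.0>,{call,"ServiceAPI.GetTaskList",#Fun<...>},60000]}}
TIMEOUT_RE = re.compile(
    r'\{timeout,\s*\{gen_server,\s*call,\s*\[<[\d.]+>,\s*'
    r'\{call,\s*"(?P<rpc>[\w.]+)",\s*#Fun<[^>]*>\},\s*(?P<ms>\d+)\]'
)


def find_call_timeouts(log_text):
    """Return (node, timestamp, rpc, timeout_ms) for every timed-out call found."""
    results = []
    headers = list(HEADER_RE.finditer(log_text))
    for i, header in enumerate(headers):
        # An entry runs from its header to the start of the next header.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(log_text)
        entry = log_text[header.start():end]
        for hit in TIMEOUT_RE.finditer(entry):
            results.append(
                (header.group("node"), header.group("ts"),
                 hit.group("rpc"), int(hit.group("ms")))
            )
    return results


if __name__ == "__main__":
    # Usage (hypothetical): python find_timeouts.py <log file> [<log file> ...]
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for node, ts, rpc, ms in find_call_timeouts(fh.read()):
                print(f"{path}: {node} {ts} {rpc} timed out after {ms} ms")
```

A single failure can appear several times per node, because the same timeout term is repeated in the crash report, the error report, and the queued 'EXIT' messages, so the hits are better read per node and timestamp than as a raw count.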