Details
- Bug
- Resolution: Fixed
- Critical
- 5.5.0
- Untriaged
- Unknown
Description
Build: 5.5.0-2671
Test: 2i Component System Test
A rebalance-in operation for 2 indexer nodes fails with the following error:
Rebalance exited with reason {service_rebalance_failed,index,
{linked_process_died,<21305.4870.13>,
{timeout,
{gen_server,call,
[<21305.26062.5>,
,
60000]}}}}
The following error is seen on the faulting node, 172.23.104.23. CPU utilization on this index node was consistently near 100%, while the other indexer nodes were noticeably less loaded. This persisted even after we reduced the query load and optimized the aggregate queries to operate on a smaller dataset rather than the entire dataset.
[error_logger:error,2018-05-08T01:38:05.240-07:00,ns_1@172.23.104.23:error_logger<0.6.0>:ale_error_logger_handler:do_log:203]
=========================CRASH REPORT=========================
crasher:
initial call: service_agent:-start_long_poll_worker/4-fun-0-/0
pid: <0.4870.13>
registered_name: []
exception exit: {timeout,
{gen_server,call,
[<0.26062.5>,
{call,"ServiceAPI.GetCurrentTopology",
#Fun<json_rpc_connection.0.125340786>},
60000]}}
in function gen_server:call/3 (gen_server.erl, line 188)
in call from service_api:perform_call/3 (src/service_api.erl, line 55)
in call from service_agent:grab_topology/2 (src/service_agent.erl, line 540)
in call from service_agent:long_poll_worker_loop/5 (src/service_agent.erl, line 605)
ancestors: ['service_agent-index',service_agent_children_sup,
service_agent_sup,ns_server_sup,ns_server_nodes_sup,
<0.23266.5>,ns_server_cluster_sup,<0.89.0>]
messages: []
links: [<0.25945.5>]
dictionary: []
trap_exit: false
status: running
heap_size: 1598
stack_size: 27
reductions: 952
neighbours:

[error_logger:error,2018-05-08T01:38:05.240-07:00,ns_1@172.23.104.23:error_logger<0.6.0>:ale_error_logger_handler:do_log:203]** Generic server 'service_agent-index' terminating
** Last message in was {'EXIT',<0.4870.13>,
{timeout,
{gen_server,call,
[<0.26062.5>,
{call,"ServiceAPI.GetCurrentTopology",
#Fun<json_rpc_connection.0.125340786>},
60000]}}}
** When Server state == {state,index,
{dict,16,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
{{[[{uuid,<<"c565729afd3adb2b04cda9b4405cc37b">>}|
'ns_1@172.23.104.25']],
[],
[[{uuid,<<"20b1aa2e171f9669692e5b664929fb24">>}|
'ns_1@172.23.104.16'],
[{node,'ns_1@172.23.104.21'}|
<<"5d39a577abd0b19c55099900169e71ad">>]],
[],
[[{uuid,<<"822eb22d0f7782be67291c0a174d7c61">>}|
'ns_1@172.23.104.19'],
[{node,'ns_1@172.23.104.17'}|
<<"e8bbe21bba2b8c8e326ba5f3db7c9340">>]],
[],[],[],
[[{uuid,<<"5d39a577abd0b19c55099900169e71ad">>}|
'ns_1@172.23.104.21'],
[{node,'ns_1@172.23.104.23'}|
<<"fd83264aff691821f94290908495e4df">>],
[{node,'ns_1@172.23.104.93'}|
<<"fb1b7960c046c12c9d331bfe85a48784">>]],
[[{node,'ns_1@172.23.104.16'}|
<<"20b1aa2e171f9669692e5b664929fb24">>],
[{uuid,<<"fb1b7960c046c12c9d331bfe85a48784">>}|
'ns_1@172.23.104.93']],
[[{node,'ns_1@172.23.104.19'}|
<<"822eb22d0f7782be67291c0a174d7c61">>]],
[],
[[{uuid,<<"06b812efcf760a83a49633a4cd8779d5">>}|
'ns_1@172.23.104.18']],
[],
[[{uuid,<<"e8bbe21bba2b8c8e326ba5f3db7c9340">>}|
'ns_1@172.23.104.17'],
[{node,'ns_1@172.23.104.25'}|
<<"c565729afd3adb2b04cda9b4405cc37b">>]],
[[{node,'ns_1@172.23.104.18'}|
<<"06b812efcf760a83a49633a4cd8779d5">>],
[{uuid,
<<"fd83264aff691821f94290908495e4df">>}|
'ns_1@172.23.104.23']]}}},
<0.26062.5>,#Ref<0.0.3.204287>,<20830.10136.45>,
#Ref<0.0.9.116687>,<0.23593.13>,
{[{<20830.10177.45>,#Ref<20830.0.29.240169>}],[]},
undefined,
{<<"AAAAAAAAAkA=">>,
[[{<<"rev">>,<<"AAAAAAAAAAA=">>},
{<<"id">>,
<<"prepare/afe5ddbbb7d3906f9e5089d2e1c96aa9">>},
{<<"type">>,<<"task-prepared">>},
{<<"status">>,<<"task-running">>},
{<<"isCancelable">>,true},
{<<"progress">>,0},
{<<"extra">>,
{[{<<"rebalanceId">>,
<<"afe5ddbbb7d3906f9e5089d2e1c96aa9">>}]}}]]},
{<<"AAAAAAAAAkA=">>,
{topology,
['ns_1@172.23.104.23','ns_1@172.23.104.25',
'ns_1@172.23.104.93'],
[<<"fd83264aff691821f94290908495e4df">>,
<<"c565729afd3adb2b04cda9b4405cc37b">>,
<<"fb1b7960c046c12c9d331bfe85a48784">>],
true,[]}},
<0.4869.13>,<0.4870.13>}
** Reason for termination ==
** {linked_process_died,<0.4870.13>,
{timeout,
{gen_server,call,
[<0.26062.5>,
{call,"ServiceAPI.GetCurrentTopology",
#Fun<json_rpc_connection.0.125340786>},
60000]}}}

[error_logger:error,2018-05-08T01:38:05.242-07:00,ns_1@172.23.104.23:error_logger<0.6.0>:ale_error_logger_handler:do_log:203]
=========================CRASH REPORT=========================
crasher:
initial call: service_agent:init/1
pid: <0.25945.5>
registered_name: 'service_agent-index'
exception exit: {linked_process_died,<0.4870.13>,
{timeout,
{gen_server,call,
[<0.26062.5>,
{call,"ServiceAPI.GetCurrentTopology",
#Fun<json_rpc_connection.0.125340786>},
60000]}}}
in function gen_server:terminate/6 (gen_server.erl, line 744)
ancestors: [service_agent_children_sup,service_agent_sup,ns_server_sup,
ns_server_nodes_sup,<0.23266.5>,ns_server_cluster_sup,
<0.89.0>]
messages: [{'EXIT',<0.4869.13>,
{linked_process_died,<0.4870.13>,
{timeout,
{gen_server,call,
[<0.26062.5>,
{call,"ServiceAPI.GetCurrentTopology",
#Fun<json_rpc_connection.0.125340786>},
60000]}}}},
{'EXIT',<0.23593.13>,
{linked_process_died,<0.4870.13>,
{timeout,
{gen_server,call,
[<0.26062.5>,
{call,"ServiceAPI.GetCurrentTopology",
#Fun<json_rpc_connection.0.125340786>},
60000]}}}}]
links: [<0.25947.5>,<0.23490.5>]
dictionary: []
trap_exit: true
status: running
heap_size: 28690
stack_size: 27
reductions: 85173
neighbours:

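For triage, the timed-out RPC name and its timeout value can be pulled out of crash reports like the ones above with a short script. The following is only a minimal sketch and is not part of ns_server: the function name `timed_out_calls` and the regular expression are assumptions made for illustration, based solely on the shape of the `{call,"...",#Fun<...>},60000]` term in this log. It ignores lines consisting only of a `|` separator, which some log exports add.

```python
import re

# Matches the tail of a gen_server:call timeout term as it appears in this log, e.g.
#   {call,"ServiceAPI.GetCurrentTopology", #Fun<...>}, 60000]
# The pattern is an assumption derived from the log text above, not an ns_server format spec.
TIMEOUT_CALL = re.compile(r'\{call,"([^"]+)",\s*#Fun<[^>]*>\},\s*(\d+)\]')


def timed_out_calls(log_text):
    """Return a sorted list of unique (rpc_name, timeout_ms) pairs found in log_text."""
    # Drop lines that contain only a "|" separator so wrapped terms become contiguous.
    lines = [ln for ln in log_text.splitlines() if ln.strip() != "|"]
    joined = "\n".join(lines)
    return sorted({(name, int(ms)) for name, ms in TIMEOUT_CALL.findall(joined)})


# Excerpt copied from the crash report above.
sample = """exception exit: {timeout,
|
{gen_server,call,
|
[<0.26062.5>,
|
{call,"ServiceAPI.GetCurrentTopology",
|
#Fun<json_rpc_connection.0.125340786>},
|
60000]}}
"""

if __name__ == "__main__":
    for name, ms in timed_out_calls(sample):
        print(f"{name} timed out after {ms} ms")
```

Run against the full log from 172.23.104.23, this should report a single distinct call, since every crash report here refers to the same `ServiceAPI.GetCurrentTopology` request with a 60000 ms timeout.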
Issue Links
- depends on MB-29636 System test : Indexer node rebalance hung (Closed)