Details
-
Bug
-
Resolution: Incomplete
-
Critical
-
Cheshire-Cat
-
Untriaged
-
-
1
-
Unknown
Description
Build : 7.0.0-5161
Test : -test tests/2i/cheshirecat/test_idx_cc_vol_10K.yml -scope tests/2i/cheshirecat/scope_idx_cc_vol_10K.yml
Scale : 5
Iteration : 1st
Rebalance out of KV node 172.23.96.14 failed due to the following error :
[ns_server:error,2021-05-14T21:28:08.652-07:00,ns_1@172.23.120.58:service_rebalancer-index<0.23744.459>:service_rebalancer:run_rebalance_worker:130]Agent terminated during the rebalance: {'DOWN',
|
#Ref<0.772317819.1189609476.126010>,
|
process,<29859.890.6>,
|
{linked_process_died,
|
<29859.14034.446>,
|
{'ns_1@172.23.123.31',
|
{timeout,
|
{gen_server,call,
|
[<29859.1100.6>,
|
{call,"ServiceAPI.GetTaskList",
|
#Fun<json_rpc_connection.0.77329884>},
|
60000]}}}}}
|
[ns_server:error,2021-05-14T21:28:08.654-07:00,ns_1@172.23.120.58:service_agent-index<0.30446.446>:service_agent:handle_info:281]Rebalancer <0.23744.459> died unexpectedly: {agent_died,<29859.890.6>,
|
{linked_process_died,
|
<29859.14034.446>,
|
{'ns_1@172.23.123.31',
|
{timeout,
|
{gen_server,call,
|
[<29859.1100.6>,
|
{call,
|
"ServiceAPI.GetTaskList",
|
#Fun<json_rpc_connection.0.77329884>},
|
60000]}}}}}
|
[user:error,2021-05-14T21:28:08.834-07:00,ns_1@172.23.120.58:<0.29127.446>:ns_orchestrator:log_rebalance_completion:1416]Rebalance exited with reason {service_rebalance_failed,index,
|
{agent_died,<29859.890.6>,
|
{linked_process_died,<29859.14034.446>,
|
{'ns_1@172.23.123.31',
|
{timeout,
|
{gen_server,call,
|
[<29859.1100.6>,
|
{call,"ServiceAPI.GetTaskList",
|
#Fun<json_rpc_connection.0.77329884>},
|
60000]}}}}}}.
|
Rebalance Operation Id = a21ff0e40e2a32d9787d17966efd45ca
|
On 172.23.123.31, following is seen in the debug logs :
[ns_server:warn,2021-05-14T21:27:55.955-07:00,ns_1@172.23.123.31:ns_server_monitor<0.25192.462>:health_monitor:send_heartbeat_inner:157]ns_server_monitor send heartbeat timed out
|
|
[ns_server:error,2021-05-14T21:28:00.212-07:00,ns_1@172.23.123.31:service_status_keeper_worker<0.27076.5>:rest_utils:get_json:57]Request to (indexer) getIndexStatus with headers [{"If-None-Match",
|
"4cec15fd7844af82"}] failed: {error,
|
timeout}
|
[ns_server:error,2021-05-14T21:28:00.212-07:00,ns_1@172.23.123.31:service_status_keeper-index<0.27077.5>:service_status_keeper:handle_cast:103]Service service_index returned incorrect status
|
[ns_server:debug,2021-05-14T21:28:00.213-07:00,ns_1@172.23.123.31:compaction_daemon<0.27091.5>:compaction_daemon:process_scheduler_message:1302]No buckets to compact for compact_kv. Rescheduling compaction.
|
[ns_server:debug,2021-05-14T21:28:00.213-07:00,ns_1@172.23.123.31:compaction_daemon<0.27091.5>:compaction_scheduler:schedule_next:51]Finished compaction for compact_kv too soon. Next run will be in 30s
|
[ns_server:debug,2021-05-14T21:28:02.284-07:00,ns_1@172.23.123.31:compaction_daemon<0.27091.5>:compaction_daemon:process_scheduler_message:1302]No buckets to compact for compact_views. Rescheduling compaction.
|
[ns_server:debug,2021-05-14T21:28:02.284-07:00,ns_1@172.23.123.31:compaction_daemon<0.27091.5>:compaction_scheduler:schedule_next:51]Finished compaction for compact_views too soon. Next run will be in 30s
|
[ns_server:debug,2021-05-14T21:28:02.896-07:00,ns_1@172.23.123.31:ldap_auth_cache<0.26693.5>:active_cache:cleanup:225]Cache ldap_auth_cache cleanup: 0/0 records deleted
|
[ns_server:info,2021-05-14T21:28:05.013-07:00,ns_1@172.23.123.31:ns_config_rep<0.26830.5>:ns_config_rep:pull_one_node:377]Pulling config from: 'ns_1@172.23.97.241'
|
[error_logger:error,2021-05-14T21:28:07.440-07:00,ns_1@172.23.123.31:<0.14034.446>:ale_error_logger_handler:do_log:101]
|
=========================CRASH REPORT=========================
|
crasher:
|
initial call: service_agent:'-start_long_poll_worker/4-fun-0-'/0
|
pid: <0.14034.446>
|
registered_name: []
|
exception exit: {timeout,
|
{gen_server,call,
|
[<0.1100.6>,
|
{call,"ServiceAPI.GetTaskList",
|
#Fun<json_rpc_connection.0.77329884>},
|
60000]}}
|
in function gen_server:call/3 (gen_server.erl, line 223)
|
in call from service_api:perform_call/3 (src/service_api.erl, line 49)
|
in call from service_agent:grab_tasks/2 (src/service_agent.erl, line 568)
|
in call from service_agent:long_poll_worker_loop/5 (src/service_agent.erl, line 649)
|
ancestors: ['service_agent-index',service_agent_children_sup,
|
service_agent_sup,ns_server_sup,ns_server_nodes_sup,
|
<0.26637.5>,ns_server_cluster_sup,root_sup,<0.139.0>]
|
message_queue_len: 0
|
messages: []
|
links: [<0.890.6>]
|
dictionary: []
|
trap_exit: false
|
status: running
|
heap_size: 1598
|
stack_size: 27
|
reductions: 10982
|
neighbours:
|
|
[ns_server:error,2021-05-14T21:28:07.440-07:00,ns_1@172.23.123.31:service_agent-index<0.890.6>:service_agent:handle_info:277]Linked process <0.14034.446> died with reason {timeout,
|
{gen_server,call,
|
[<0.1100.6>,
|
{call,
|
"ServiceAPI.GetTaskList",
|
#Fun<json_rpc_connection.0.77329884>},
|
60000]}}. Terminating
|
[chronicle:debug,2021-05-14T21:28:08.615-07:00,ns_1@172.23.123.31:chronicle_leader<0.26593.5>:chronicle_leader:handle_state_timeout:562]State timeout when state is: {follower,'ns_1@172.23.97.74',
|
<<"9fdde511ed879690b2e04aba53cba39c">>,
|
{4,'ns_1@172.23.97.74'},
|
established}
|
[ns_server:error,2021-05-14T21:28:08.647-07:00,ns_1@172.23.123.31:service_agent-index<0.890.6>:service_agent:terminate:306]Terminating abnormally
|
[ns_server:error,2021-05-14T21:28:08.648-07:00,ns_1@172.23.123.31:service_agent-index<0.890.6>:service_agent:terminate:311]Terminating json rpc connection for index: <0.1100.6>
|
[error_logger:error,2021-05-14T21:28:08.649-07:00,ns_1@172.23.123.31:service_agent-index<0.890.6>:ale_error_logger_handler:do_log:101]
|
=========================ERROR REPORT=========================
|
** Generic server 'service_agent-index' terminating
|
** Last message in was {'EXIT',<0.14034.446>,
|
{timeout,
|
{gen_server,call,
|
[<0.1100.6>,
|
{call,"ServiceAPI.GetTaskList",
|
#Fun<json_rpc_connection.0.77329884>},
|
60000]}}}
|
** When Server state == {state,index,
|
{dict,60,16,16,8,80,48,
|
...
|
...
|
[error_logger:error,2021-05-14T21:28:08.652-07:00,ns_1@172.23.123.31:service_agent-index<0.890.6>:ale_error_logger_handler:do_log:101]
|
=========================CRASH REPORT=========================
|
crasher:
|
initial call: service_agent:init/1
|
pid: <0.890.6>
|
registered_name: 'service_agent-index'
|
exception exit: {linked_process_died,<0.14034.446>,
|
{'ns_1@172.23.123.31',
|
{timeout,
|
{gen_server,call,
|
[<0.1100.6>,
|
{call,"ServiceAPI.GetTaskList",
|
#Fun<json_rpc_connection.0.77329884>},
|
60000]}}}}
|
in function gen_server:handle_common_reply/8 (gen_server.erl, line 751)
|
ancestors: [service_agent_children_sup,service_agent_sup,ns_server_sup,
|
ns_server_nodes_sup,<0.26637.5>,ns_server_cluster_sup,
|
root_sup,<0.139.0>]
|
message_queue_len: 3
|
Nodes with index service : 172.23.106.136, 172.23.120.58, 172.23.120.74, 172.23.120.75, 172.23.120.77, 172.23.120.81, 172.23.120.86, 172.23.123.26, 172.23.123.31, 172.23.123.32, 172.23.123.33, 172.23.96.243, 172.23.96.254, 172.23.97.105, 172.23.97.110, 172.23.97.112
Nodes with kv service : 172.23.120.73, 172.23.121.77, 172.23.123.24, 172.23.96.122, 172.23.96.14, 172.23.96.48, 172.23.97.241, 172.23.97.74
[Kevin Cherkauer added: Earlier logs for node 123.31 from same test linked in MB-46318:
Collected at around 2021-05-14 9 PM : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1621051816/collectinfo-2021-05-15T041026-ns_1%40172.23.123.31.zip
Collected at around 2021-05-14 7 PM : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1621044363/collectinfo-2021-05-15T020611-ns_1%40172.23.123.31.zip
]