Details
-
Bug
-
Resolution: Fixed
-
Test Blocker
-
Cheshire-Cat
-
Untriaged
-
-
1
-
No
Description
Build : 7.0.0-4554
Test : -test tests/2i/cheshirecat/test_idx_clusterops_cheshire_cat.yml -scope tests/2i/cheshirecat/scope_idx_cheshire_cat.yml
Scale : 2
Iteration : 4th
This could be similar to MB-44627, but I don't see the same symptoms described in that ticket. Also, this issue has been seen only once so far, in a test that has run for 40 hrs.
Rebalance operation to remove indexer node 172.23.105.186 failed at 2021-03-03T04:55:06.
[2021-03-03T04:34:26-08:00, sequoiatools/couchbase-cli:7.0:e78c8a] rebalance -c 172.23.106.253:8091 --server-remove 172.23.105.186 -u Administrator -p password
→ Error occurred on container - sequoiatools/couchbase-cli:7.0:[rebalance -c 172.23.106.253:8091 --server-remove 172.23.105.186 -u Administrator -p password]
docker logs e78c8a
docker start e78c8a
Unable to display progress bar on this os
ERROR: Rebalance failed. See logs for detailed reason. You can try again.
The error on the master node is -
[ns_server:error,2021-03-03T04:55:06.461-08:00,ns_1@172.23.106.253:service_rebalancer-index<0.6629.933>:service_rebalancer:run_rebalance_worker:136]Agent terminated during the rebalance: {'DOWN',
    #Ref<0.3071195499.1647837191.242955>,
    process,<29971.3312.233>,
    {linked_process_died,
     <29971.4242.282>,
     {timeout,
      {gen_server,call,
       [<29971.4413.233>,
        {call,
         "ServiceAPI.GetCurrentTopology",
         #Fun<json_rpc_connection.0.44122352>},
        60000]}}}}
[user:error,2021-03-03T04:55:06.464-08:00,ns_1@172.23.106.253:<0.9291.0>:ns_orchestrator:log_rebalance_completion:1407]Rebalance exited with reason {service_rebalance_failed,index,
    {agent_died,<29971.3312.233>,
     {linked_process_died,<29971.4242.282>,
      {timeout,
       {gen_server,call,
        [<29971.4413.233>,
         {call,"ServiceAPI.GetCurrentTopology",
          #Fun<json_rpc_connection.0.44122352>},
         60000]}}}}}.
Rebalance Operation Id = 1530e97a9762d95dd9e3aa32d540c851
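For readers less used to Erlang exit reasons, the nested term above can be read mechanically: the innermost `{timeout,{gen_server,call,[Pid,{call,Method,Fun},Timeout]}}` names the RPC method that did not answer and the `gen_server:call/3` timeout in milliseconds. The following is a minimal Python sketch of that reading. The regular expression only covers the shape seen in this ticket, and the helper name `describe_timeout` is hypothetical, not part of any Couchbase tooling.

```python
import re

# Matches the {timeout,{gen_server,call,[Pid,{call,"Method",Fun},Timeout]}}
# shape seen in this ticket. Whitespace between tokens is tolerated so the
# term can be pasted as one line or joined from several log lines.
TIMEOUT_TERM = re.compile(
    r'\{timeout,\s*\{gen_server,\s*call,\s*\[\s*(?P<pid><[\d.]+>),\s*'
    r'\{call,\s*"(?P<method>[^"]+)",\s*#Fun<[^>]+>\},\s*'
    r'(?P<timeout_ms>\d+)\]\}\}'
)


def describe_timeout(reason):
    """Return the callee pid, RPC method and timeout (seconds), or None."""
    m = TIMEOUT_TERM.search(reason)
    if m is None:
        return None
    return {
        "pid": m["pid"],
        "method": m["method"],
        "timeout_s": int(m["timeout_ms"]) / 1000,
    }


# Exit reason from the ns_orchestrator line above, joined onto one line.
reason = (
    '{service_rebalance_failed,index, {agent_died,<29971.3312.233>, '
    '{linked_process_died,<29971.4242.282>, {timeout, {gen_server,call, '
    '[<29971.4413.233>, {call,"ServiceAPI.GetCurrentTopology", '
    '#Fun<json_rpc_connection.0.44122352>}, 60000]}}}}}'
)

info = describe_timeout(reason)
print(info["method"], info["timeout_s"])
```

Here the call to `ServiceAPI.GetCurrentTopology` got no reply within 60 seconds, which is the trigger for the rest of the failure chain in the logs.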
Upon checking all the indexer nodes, it seems like the node 172.23.106.255 had an issue. This is from the debug.log of 172.23.106.255 -
[error_logger:error,2021-03-03T04:55:06.457-08:00,ns_1@172.23.106.255:<0.4242.282>:ale_error_logger_handler:do_log:107]
=========================CRASH REPORT=========================
  crasher:
    initial call: service_agent:'-start_long_poll_worker/4-fun-0-'/0
    pid: <0.4242.282>
    registered_name: []
    exception exit: {timeout,
                     {gen_server,call,
                      [<0.4413.233>,
                       {call,"ServiceAPI.GetCurrentTopology",
                        #Fun<json_rpc_connection.0.44122352>},
                       60000]}}
      in function gen_server:call/3 (gen_server.erl, line 223)
      in call from service_api:perform_call/3 (src/service_api.erl, line 55)
      in call from service_agent:grab_topology/2 (src/service_agent.erl, line 590)
      in call from service_agent:long_poll_worker_loop/5 (src/service_agent.erl, line 655)
    ancestors: ['service_agent-index',service_agent_children_sup,
                service_agent_sup,ns_server_sup,ns_server_nodes_sup,
                <0.7883.0>,ns_server_cluster_sup,root_sup,<0.138.0>]
    message_queue_len: 0
    messages: []
    links: [<0.3312.233>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 1598
    stack_size: 27
    reductions: 20262
  neighbours:
[ns_server:error,2021-03-03T04:55:06.458-08:00,ns_1@172.23.106.255:service_agent-index<0.3312.233>:service_agent:handle_info:283]Linked process <0.4242.282> died with reason {timeout,
    {gen_server,call,
     [<0.4413.233>,
      {call,
       "ServiceAPI.GetCurrentTopology",
       #Fun<json_rpc_connection.0.44122352>},
      60000]}}. Terminating
[ns_server:error,2021-03-03T04:55:06.458-08:00,ns_1@172.23.106.255:service_agent-index<0.3312.233>:service_agent:terminate:312]Terminating abnormally
[ns_server:error,2021-03-03T04:55:06.458-08:00,ns_1@172.23.106.255:service_agent-index<0.3312.233>:service_agent:terminate:317]Terminating json rpc connection for index: <0.4413.233>
[error_logger:error,2021-03-03T04:55:06.458-08:00,ns_1@172.23.106.255:service_agent-index<0.3312.233>:ale_error_logger_handler:do_log:107]
=========================ERROR REPORT=========================
** Generic server 'service_agent-index' terminating
** Last message in was {'EXIT',<0.4242.282>,
                        {timeout,
                         {gen_server,call,
                          [<0.4413.233>,
                           {call,"ServiceAPI.GetCurrentTopology", #Fun<json_rpc_connection.0.44122352>},
                           60000]}}}
** When Server state == {state,index,
    {dict,24,16,16,8,80,48,
     {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
     {{[[{uuid,<<"e2c2bfa8d14931a5560f871b2a042546">>}|
         'ns_1@172.23.107.89'],
        [{uuid,<<"ad14f6d539c028084e7b002f6d3deacf">>}|
         'ns_1@172.23.97.213']],
       [],[],
       [[{node,'ns_1@172.23.97.214'}|
         <<"8c6944b13aca3fd1c0cbf52fd269eeef">>],
        [{node,'ns_1@172.23.106.154'}|
         <<"f803a0f259225e0f1ead3adc0f7b9e49">>]],
       [],
       [[{uuid,<<"b9e888e6147fea51b3e5f081cbb1a64e">>}|
         'ns_1@172.23.105.185']],
       [[{uuid,<<"a3641e2d6b8632676af5030ad2000433">>}|
         'ns_1@172.23.106.242']],
       [[{uuid,<<"3f7f791a1133d2219c15b421fd081794">>}|
         'ns_1@172.23.106.243'],
        [{uuid,<<"5797bb88a535aea59fa01ffdba22a0b6">>}|
         'ns_1@172.23.106.255'],
...
...
...
** Reason for termination ==
** {linked_process_died,<0.4242.282>,
    {timeout,
     {gen_server,call,
      [<0.4413.233>,
       {call,"ServiceAPI.GetCurrentTopology",
        #Fun<json_rpc_connection.0.44122352>},
       60000]}}}
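The "check all the indexer nodes" step described earlier in this ticket can be approximated with a small script that groups log lines into entries and reports the earliest entry mentioning the failed call. This is only a sketch: the header shape `[logger:level,timestamp,node:process:module:function:line]` is inferred from the excerpts in this ticket, the function names `parse_entries` and `first_mention` are hypothetical, and the sample lines are abbreviated from the logs quoted in this ticket.

```python
import re
from datetime import datetime

# Header shape inferred from the excerpts in this ticket (an assumption):
# [logger:level,timestamp,node:process:module:function:line]message
HEADER = re.compile(
    r"^\[(?P<logger>\w+):(?P<level>\w+),"
    r"(?P<ts>[^,]+),"
    r"(?P<node>[^:]+):(?P<proc>[^:]*):"
    r"(?P<module>\w+):(?P<func>\w+):(?P<line>\d+)\]"
    r"(?P<msg>.*)$"
)


def parse_entries(lines):
    """Group lines into entries: one header line plus its continuation lines."""
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        m = HEADER.match(line)
        if m:
            entry = m.groupdict()
            entry["time"] = datetime.fromisoformat(entry["ts"])
            entry["body"] = [entry.pop("msg")]
            entries.append(entry)
        elif entries:
            # Anything that is not a header belongs to the previous entry.
            entries[-1]["body"].append(line.strip())
    return entries


def first_mention(entries, needle):
    """Return the earliest entry whose text contains needle, or None."""
    hits = [e for e in entries if any(needle in part for part in e["body"])]
    return min(hits, key=lambda e: e["time"]) if hits else None


# Abbreviated sample built from the log excerpts in this ticket.
sample = [
    "[ns_server:error,2021-03-03T04:55:06.461-08:00,ns_1@172.23.106.253:"
    "service_rebalancer-index<0.6629.933>:service_rebalancer:"
    "run_rebalance_worker:136]Agent terminated during the rebalance: {'DOWN',",
    '    {call,"ServiceAPI.GetCurrentTopology",',
    "[error_logger:error,2021-03-03T04:55:06.457-08:00,ns_1@172.23.106.255:"
    "<0.4242.282>:ale_error_logger_handler:do_log:107]",
    '    {call,"ServiceAPI.GetCurrentTopology",',
]

hit = first_mention(parse_entries(sample), "ServiceAPI.GetCurrentTopology")
print(hit["node"], hit["ts"])
# prints: ns_1@172.23.106.255 2021-03-03T04:55:06.457-08:00
```

On this sample the earliest mention is on 172.23.106.255, a few milliseconds before the master node logs the agent termination, which matches the observation that the failure originated on that indexer node.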