Description
One or two nodes restart every time I run the tests, usually with the same error. Loading the data and the initial indexing complete without problems. Why does this happen?
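Every crash below exits with the same reason, {timeout,{gen_server,call,...}}: a gen_server:call that was not answered within the default 5-second timeout. As background, here is a minimal sketch of that mechanism (a hypothetical timeout_demo module, not ns_server source):

-module(timeout_demo).
-behaviour(gen_server).
-export([start_link/0, nodes_wanted/0]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Caller side: gen_server:call/2 waits at most 5000 ms for a reply.
%% If the server cannot answer in time, the *caller* exits with
%% {timeout,{gen_server,call,[timeout_demo,nodes_wanted]}} -- the same
%% shape as the mb_master and disksup exits in the log below.
nodes_wanted() ->
    gen_server:call(?MODULE, nodes_wanted).

init([]) ->
    {ok, []}.

%% Server side: simulate a server stalled for longer than the caller's
%% timeout, e.g. by a starved scheduler or slow disk I/O.
handle_call(nodes_wanted, _From, State) ->
    timer:sleep(10000),
    {reply, [], State}.

handle_cast(_Msg, State) ->
    {noreply, State}.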
[ns_server:info,2012-11-06T12:43:31.635,ns_1@10.2.3.31:mb_master<0.18558.13>:mb_master:terminate:288]Synchronously shutting down child mb_master_sup
[error_logger:error,2012-11-06T12:43:32.745,ns_1@10.2.3.31:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
=========================SUPERVISOR REPORT=========================
Supervisor: {local,mb_master_sup}
Context: shutdown_error
Reason: killed
Offender: [...,
           {mfargs,{ns_orchestrator,start_link,[]}},
           {shutdown,20},
           {child_type,worker}]
[stats:warn,2012-11-06T12:43:32.651,ns_1@10.2.3.31:system_stats_collector<0.478.0>:system_stats_collector:handle_info:133]lost 7 ticks
[ns_server:debug,2012-11-06T12:43:33.495,ns_1@10.2.3.31:<0.18559.13>:ns_pubsub:do_subscribe_link:132]Parent process of subscription {ns_config_events,<0.18558.13>} exited with reason {timeout,{gen_server,call,[ns_node_disco,nodes_wanted]}}
[error_logger:error,2012-11-06T12:43:34.073,ns_1@10.2.3.31:error_logger<0.5.0>:ale_error_logger_handler:log_msg:76]** State machine mb_master terminating
** Last message in was send_heartbeat
** When State == master
** Data == {state,<0.19814.13>,'ns_1@10.2.3.31',
['ns_1@10.2.3.31','ns_1@10.2.3.33','ns_1@10.2.3.34',
'ns_1@10.2.3.35'],
{1352,234605,120106}}
** Reason for termination =
** {timeout,{gen_server,call,[ns_node_disco,nodes_wanted]}}
[ns_server:debug,2012-11-06T12:43:35.166,ns_1@10.2.3.31:ns_server_sup<0.385.0>:mb_master:check_master_takeover_needed:144]Sending master node question to the following nodes: ['ns_1@10.2.3.35',
'ns_1@10.2.3.34',
'ns_1@10.2.3.33']
[error_logger:error,2012-11-06T12:43:35.276,ns_1@10.2.3.31:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: mb_master:init/1
pid: <0.18558.13>
registered_name: mb_master
exception exit: {timeout,{gen_server,call,[ns_node_disco,nodes_wanted]}}
in function gen_fsm:terminate/7
ancestors: [ns_server_sup,ns_server_cluster_sup,<0.66.0>]
messages: [send_heartbeat,send_heartbeat,send_heartbeat,send_heartbeat,
{#Ref<0.0.372.79904>, ['ns_1@10.2.3.31','ns_1@10.2.3.33','ns_1@10.2.3.34', 'ns_1@10.2.3.35']}]
links: [<0.385.0>,<0.18559.13>,<0.63.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 377
stack_size: 24
reductions: 147300
neighbours:
[error_logger:error,2012-11-06T12:43:35.307,ns_1@10.2.3.31:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
=========================SUPERVISOR REPORT=========================
Supervisor: {local,ns_server_sup}
Context: child_terminated
Reason: {timeout,{gen_server,call,[ns_node_disco,nodes_wanted]}}
Offender: [{pid,<0.18558.13>},
           {name,mb_master},
           {mfargs,{mb_master,start_link,[]}},
           {restart_type,permanent},
           ...]
[ns_server:error,2012-11-06T12:43:35.323,ns_1@10.2.3.31:<0.788.0>:ns_memcached:verify_report_long_call:297]call {stats,<<>>} took too long: 10203000 us
[couchdb:error,2012-11-06T12:43:41.588,ns_1@10.2.3.31:<0.24345.2>:couch_log:error:42]Uncaught error in HTTP request: {exit,{timeout,...}}
Stacktrace: [{diag_handler,diagnosing_timeouts,1},
             ...]
[error_logger:error,2012-11-06T12:43:41.604,ns_1@10.2.3.31:error_logger<0.5.0>:ale_error_logger_handler:log_msg:76]** Generic server disksup terminating
** Last message in was timeout
** When Server state == [{data,[{"OS",{win32,nt}},
                                {"Timeout",60000},
                                {"Threshold",80},
                                {"DiskData",[{"C:\\",52324348,51},
                                             {"E:\\",268432380,14}]}]}]
** Reason for termination ==
** {timeout,{gen_server,call,[os_mon_sysinfo,get_disk_info]}}
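On why these crashes surface as restarts rather than node failures: the report above lists mb_master with {restart_type,permanent}, and a permanent child is restarted by its supervisor after every termination. A minimal sketch of a child spec with the same semantics (a hypothetical demo_sup, reusing the timeout_demo module from the sketch above):

-module(demo_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% A permanent worker is restarted whenever it terminates, whatever
    %% the reason -- so each gen_server:call timeout in the child yields
    %% one crash report plus one restart, matching the log above.
    Child = {timeout_demo,                    % child id
             {timeout_demo, start_link, []},  % cf. {mfargs,...} in the report
             permanent,                       % cf. {restart_type,permanent}
             20,                              % cf. {shutdown,20}: ms allowed to exit
                                              % before a brutal kill (see the
                                              % "Reason: killed" report above)
             worker,                          % cf. {child_type,worker}
             [timeout_demo]},
    {ok, {{one_for_one, 3, 10}, [Child]}}.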