Couchbase Server
MB-7113

Windows: constant restarts of mb_master during small-scale performance tests


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version: 2.0
    • Affects Version: 2.0-beta-2
    • Component: ns_server
    • Security Level: Public
    • Environment: VMs, Windows 64-bit, 4 nodes, HDD, 4 cores, 24 GB RAM; build 1940

    Description

      Restarts occur on one or two nodes every time I run the tests, usually with the same error.
      There are no problems with data loading or initial indexing. Why does this happen?
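      For context on the {timeout,{gen_server,call,[ns_node_disco,nodes_wanted]}} exits in the logs below: gen_server:call/2 uses a default timeout of 5000 ms, and when the callee does not reply in time the caller exits with exactly this {timeout,{gen_server,call,...}} reason. A minimal sketch (hypothetical module name, not ns_server code) of a server blocked on slow I/O, as a disk-bound Windows node might be:

      ```erlang
      %% Hypothetical illustration: a gen_server whose handle_call blocks
      %% longer than the caller's default 5000 ms call timeout.
      -module(slow_server).
      -behaviour(gen_server).
      -export([start_link/0, ask/0]).
      -export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

      start_link() ->
          gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

      init([]) -> {ok, #{}}.

      %% Simulate a request stuck behind slow disk I/O.
      handle_call(nodes_wanted, _From, State) ->
          timer:sleep(10000),    % longer than the 5000 ms default timeout
          {reply, [], State}.

      handle_cast(_Msg, State) -> {noreply, State}.
      handle_info(_Msg, State) -> {noreply, State}.

      %% The calling process exits with
      %% {timeout,{gen_server,call,[slow_server,nodes_wanted]}},
      %% the same shape of reason that kills mb_master in the logs.
      ask() ->
          gen_server:call(?MODULE, nodes_wanted).
      ```

      Because mb_master traps exits and runs under ns_server_sup with restart_type permanent, each such timeout crashes it and the supervisor restarts it, which matches the repeated restarts reported here.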

      [ns_server:info,2012-11-06T12:43:31.635,ns_1@10.2.3.31:mb_master<0.18558.13>:mb_master:terminate:288]Synchronously shutting down child mb_master_sup
      [error_logger:error,2012-11-06T12:43:32.745,ns_1@10.2.3.31:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
      =========================SUPERVISOR REPORT=========================
      Supervisor: {local,mb_master_sup}
      Context: shutdown_error
      Reason: killed
      Offender: [{pid,<0.19815.13>},
                 {name,ns_orchestrator},
                 {mfargs,{ns_orchestrator,start_link,[]}},
                 {restart_type,permanent},
                 {shutdown,20},
                 {child_type,worker}]


      [stats:warn,2012-11-06T12:43:32.651,ns_1@10.2.3.31:system_stats_collector<0.478.0>:system_stats_collector:handle_info:133]lost 7 ticks
      [ns_server:debug,2012-11-06T12:43:33.495,ns_1@10.2.3.31:<0.18559.13>:ns_pubsub:do_subscribe_link:132]Parent process of subscription {ns_config_events,<0.18558.13>} exited with reason {timeout,{gen_server,call,[ns_node_disco,nodes_wanted]}}
      [error_logger:error,2012-11-06T12:43:34.073,ns_1@10.2.3.31:error_logger<0.5.0>:ale_error_logger_handler:log_msg:76]** State machine mb_master terminating
      ** Last message in was send_heartbeat
      ** When State == master
      ** Data == {state,<0.19814.13>,'ns_1@10.2.3.31',
                  ['ns_1@10.2.3.31','ns_1@10.2.3.33','ns_1@10.2.3.34','ns_1@10.2.3.35'],
                  {1352,234605,120106}}
      ** Reason for termination =
      ** {timeout,{gen_server,call,[ns_node_disco,nodes_wanted]}}

      [ns_server:debug,2012-11-06T12:43:35.166,ns_1@10.2.3.31:ns_server_sup<0.385.0>:mb_master:check_master_takeover_needed:144]Sending master node question to the following nodes: ['ns_1@10.2.3.35','ns_1@10.2.3.34','ns_1@10.2.3.33']
      [error_logger:error,2012-11-06T12:43:35.276,ns_1@10.2.3.31:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: mb_master:init/1
      pid: <0.18558.13>
      registered_name: mb_master
      exception exit: {timeout,{gen_server,call,[ns_node_disco,nodes_wanted]}}
      in function gen_fsm:terminate/7
      ancestors: [ns_server_sup,ns_server_cluster_sup,<0.66.0>]
      messages: [send_heartbeat,send_heartbeat,send_heartbeat,send_heartbeat,
      {#Ref<0.0.372.79904>, ['ns_1@10.2.3.31','ns_1@10.2.3.33','ns_1@10.2.3.34', 'ns_1@10.2.3.35']}]
      links: [<0.385.0>,<0.18559.13>,<0.63.0>]
      dictionary: []
      trap_exit: true
      status: running
      heap_size: 377
      stack_size: 24
      reductions: 147300
      neighbours:

      [error_logger:error,2012-11-06T12:43:35.307,ns_1@10.2.3.31:error_logger<0.5.0>:ale_error_logger_handler:log_report:72]
      =========================SUPERVISOR REPORT=========================
      Supervisor: {local,ns_server_sup}
      Context: child_terminated
      Reason: {timeout,{gen_server,call,[ns_node_disco,nodes_wanted]}}
      Offender: [{pid,<0.18558.13>},
                 {name,mb_master},
                 {mfargs,{mb_master,start_link,[]}},
                 {restart_type,permanent},
                 {shutdown,infinity},
                 {child_type,supervisor}]

      [ns_server:error,2012-11-06T12:43:35.323,ns_1@10.2.3.31:<0.788.0>:ns_memcached:verify_report_long_call:297]call {stats,<<>>} took too long: 10203000 us
      [couchdb:error,2012-11-06T12:43:41.588,ns_1@10.2.3.31:<0.24345.2>:couch_log:error:42]Uncaught error in HTTP request: {exit,{timeout,{gen_server,call,[ns_config,get]}}}

      Stacktrace: [{diag_handler,diagnosing_timeouts,1},
                   {menelaus_auth,check_auth,1},
                   {menelaus_auth,bucket_auth_fun,1},
                   {menelaus_auth,is_bucket_accessible,2},
                   {capi_frontend,do_db_req,2},
                   {couch_httpd,handle_request,6},
                   {mochiweb_http,headers,5},
                   {proc_lib,init_p_do_apply,3}]
      [error_logger:error,2012-11-06T12:43:41.604,ns_1@10.2.3.31:error_logger<0.5.0>:ale_error_logger_handler:log_msg:76]** Generic server disksup terminating
      ** Last message in was timeout
      ** When Server state == [{data,[{"OS",{win32,nt}},
                                      {"Timeout",60000},
                                      {"Threshold",80},
                                      {"DiskData",[{"C:\\",52324348,51},
                                                   {"E:\\",268432380,14}]}]}]
      ** Reason for termination ==
      ** {timeout,{gen_server,call,[os_mon_sysinfo,get_disk_info]}}


          People

            Assignee: Bin Cui (bcui)
            Reporter: Pavel Paulau (pavelpaulau)
            Votes: 0
            Watchers: 2
