Couchbase Server / MB-6315

service fails to start sometimes [was: cluster is broken when reboot all nodes at the same time]


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version: 2.0
    • Affects Version: 2.0-beta
    • Component: ns_server
    • Security Level: Public
    • Labels: None

    Description

      build-705
      Steps:
      1. 3 nodes in a cluster with 1 sasl bucket and 10M items (10.3.121.112, 10.3.121.113, 10.3.121.114)
      2. reboot all nodes at the same time

      Result:
      10.3.121.112 and 10.3.121.113 are in pending state; 10.3.121.114 is down with this error in the logs:

      [error_logger:error,2012-08-19T20:31:39.066,ns_1@10.3.121.114:error_logger:ale_error_logger_handler:log_report:72]
      =========================SUPERVISOR REPORT=========================
      Supervisor: {local,menelaus_sup}
      Context: child_terminated
      Reason: {noproc,
               {gen_server,call,
                [{'stats_reader-sasl','ns_1@10.3.121.114'},
                 {latest,minute,1}]}}
      Offender: [{pid,<0.4312.0>},
                 {name,menelaus_web_alerts_srv},
                 {mfargs,{menelaus_web_alerts_srv,start_link,[]}},
                 {restart_type,permanent},
                 {shutdown,5000},
                 {child_type,worker}]


      [error_logger:error,2012-08-19T20:40:14.856,ns_1@10.3.121.114:error_logger:ale_error_logger_handler:log_msg:76]** Node 'ns_1@10.3.121.112' not responding **
      ** Removing (timedout) connection **

      [ns_server:error,2012-08-19T20:40:56.438,ns_1@10.3.121.114:ns_doctor:ns_doctor:update_status:203]The following buckets became not ready on node 'ns_1@10.3.121.112': ["sasl"], those of them are active []
      [error_logger:error,2012-08-19T20:42:34.008,ns_1@10.3.121.114:error_logger:ale_error_logger_handler:log_report:72]
      =========================SUPERVISOR REPORT=========================
      Supervisor: {local,'ns_vbm_new_sup-sasl'}
      Context: child_terminated
      Reason: normal
      Offender: [{pid,<0.7117.0>},
                 {name,
                  {new_child_id,
                   [171,172,173,174,175,176,177,178,179,180,181,182,
                    183,184,185,186,187,188,189,190,191,192,193,194,
                    195,196,197,198,199,200,201,202,203,204,205,206,
                    207,208,209,210,211,212,213,214,215,216,217,218,
                    219,220,221,222,223,224,225,226,227,228,229,230,
                    231,232,233,234,235,236,237,238,239,240,241,242,
                    243,244,245,246,247,248,249,250,251,252,253,254,
                    255,256,257,258,259,260,261,262,263,264,265,266,
                    267,268,269,270,271,272,273,274,275,276,277,278,
                    279,280,281,282,283,284,285,286,287,288,289,290,
                    291,292,293,294,295,296,297,298,299,300,301,302,
                    303,304,305,306,307,308,309,310,311,312,313,314,
                    315,316,317,318,319,320,321,322,323,324,325,326,
                    327,328,329,330,331,332,333,334,335,336,337,338,
                    339,340,341],
                   'ns_1@10.3.121.112'}},
                 {mfargs,
                  {ebucketmigrator_srv,start_link,
                   [{"10.3.121.112",11209},
                    {"10.3.121.114",11209},
                    [{username,"sasl"},
                     {password,"sasl"},
                     {vbuckets,
                      [171,172,173,174,175,176,177,178,179,180,181,182,
                       183,184,185,186,187,188,189,190,191,192,193,194,
                       195,196,197,198,199,200,201,202,203,204,205,206,
                       207,208,209,210,211,212,213,214,215,216,217,218,
                       219,220,221,222,223,224,225,226,227,228,229,230,
                       231,232,233,234,235,236,237,238,239,240,241,242,
                       243,244,245,246,247,248,249,250,251,252,253,254,
                       255,256,257,258,259,260,261,262,263,264,265,266,
                       267,268,269,270,271,272,273,274,275,276,277,278,
                       279,280,281,282,283,284,285,286,287,288,289,290,
                       291,292,293,294,295,296,297,298,299,300,301,302,
                       303,304,305,306,307,308,309,310,311,312,313,314,
                       315,316,317,318,319,320,321,322,323,324,325,326,
                       327,328,329,330,331,332,333,334,335,336,337,338,
                       339,340,341]},
                     {takeover,false},
                     {suffix,"ns_1@10.3.121.114"}]]}},
                 {restart_type,permanent},
                 {shutdown,60000},
                 {child_type,worker}]

      So, 10.3.121.114 could not find the orchestrator after restarting and never came back up.
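      The noproc in the first supervisor report is the generic Erlang failure mode of calling a registered gen_server that has not been (re)started yet. A minimal sketch for illustration only (the module and stub below are invented, not ns_server code), showing how a caller such as menelaus_web_alerts_srv dies while 'stats_reader-sasl' is still down:

```erlang
%% Minimal sketch, not actual ns_server code: a gen_server:call to a
%% registered name that is not running exits with noproc, matching the
%% Reason in the menelaus_sup report above.
-module(noproc_demo).
-export([run/0]).

run() ->
    try
        %% 'stats_reader-sasl' has not been started on this node yet,
        %% so gen_server:call exits with {noproc, {gen_server,call,...}}
        gen_server:call({'stats_reader-sasl', node()}, {latest, minute, 1})
    catch
        exit:{noproc, _MFA} ->
            %% in the real system the caller is not wrapped in try/catch,
            %% so its supervisor logs child_terminated instead
            noproc
    end.
```

      Until the per-bucket stats_reader child is running, every such call from the web-alerts worker terminates it, which is why menelaus_sup keeps restarting it during startup.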

      Error from the orchestrator, which hangs in the pending state:

      [ns_server:warn,2012-08-19T20:54:16.524,ns_1@10.3.121.112:'capi_ddoc_replication_srv-sasl':cb_generic_replication_srv:handle_info:140]Remote server node {'capi_ddoc_replication_srv-sasl','ns_1@10.3.121.114'} process down: noconnection
      [error_logger:error,2012-08-19T20:54:16.525,ns_1@10.3.121.112:error_logger:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: ns_memcached:init/1
      pid: <0.655.0>
      registered_name: []
      exception exit: {{badmatch,{error,timeout}},
                       [{mc_client_binary,cmd_binary_vocal_recv,5},
                        {mc_client_binary,create_bucket,4},
                        {ns_memcached,ensure_bucket,2},
                        {ns_memcached,init,1},
                        {gen_server,init_it,6},
                        {proc_lib,init_p_do_apply,3}]}
      in function gen_server:init_it/6
      ancestors: ['ns_memcached_sup-sasl','single_bucket_sup-sasl',<0.552.0>]
      messages: [check_started,check_started,check_started,check_started,
                 check_started,check_started,check_started,
                 {'$gen_call',{<0.462.0>,#Ref<0.0.0.5515>},connected},
                 check_started,check_started,
                 {'$gen_call',{<0.731.0>,#Ref<0.0.0.5674>},topkeys},
                 check_started,check_started,check_started,check_started,
                 check_started,check_started,check_started,check_started,
                 {'$gen_call',{<0.462.0>,#Ref<0.0.0.5963>},connected},
                 check_started,check_started,check_started,check_started,
                 check_started,check_started,check_started,check_started,
                 check_started,check_started,
                 {'$gen_call',{<0.462.0>,#Ref<0.0.0.6549>},connected},
                 check_started,check_started,check_started,check_started,
                 check_started,check_started,check_started,check_started,
                 check_started,check_started,
                 {'$gen_call',{<0.462.0>,#Ref<0.0.0.6898>},connected},
                 check_started,check_started,check_started,check_started,
                 check_started,check_started,check_started,check_started,
                 check_started,check_started,
                 {'$gen_call',{<0.462.0>,#Ref<0.0.0.7388>},connected},
                 check_started,check_started,check_started,check_started,
                 check_started,check_started,check_started,check_started,
                 check_started,check_started,
                 {'$gen_call',{<0.462.0>,#Ref<0.0.0.7797>},connected},
                 check_started,check_started,check_started]
      links: [<0.60.0>,<0.648.0>,#Port<0.7311>]
      dictionary: []
      trap_exit: true
      status: running
      heap_size: 75025
      stack_size: 24
      reductions: 6393
      neighbours:
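      The crash report above has the classic badmatch shape: ns_memcached appears to pattern-match the mc_client_binary:create_bucket reply against ok, so a timeout reply kills the gen_server during init. A hedged sketch of that mechanism (the module and the stub function are invented for illustration, not the actual ns_memcached source):

```erlang
%% Sketch only: create_bucket_stub/0 stands in for the timed-out
%% mc_client_binary:create_bucket/4 call seen in the stack trace.
-module(badmatch_demo).
-export([run/0]).

create_bucket_stub() ->
    {error, timeout}.          %% memcached did not answer in time

ensure_bucket() ->
    %% matching the reply against ok turns an error reply into a
    %% {badmatch, {error,timeout}} exit, as in the crash report
    ok = create_bucket_stub().

run() ->
    try ensure_bucket()
    catch error:{badmatch, Why} -> {badmatch_exit, Why}
    end.
```

      In the real system nothing catches the badmatch, so gen_server:init_it/6 converts it into the exit reason logged above and the bucket worker is restarted by its supervisor.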

      Attachments

        1. logs12.tar.gz
          9.13 MB
        2. logs14.tar.gz
          8.84 MB
          People

            Assignee: Farshid Ghods (Inactive)
            Reporter: Andrei Baranouski
