Details
- Type: Bug
- Resolution: Incomplete
- Priority: Major
- Version: 2.0
- Security Level: Public
- Environment: CentOS 6.2 64-bit, build 2.0.0-1832
Description
Cluster information:
- 8 CentOS 6.2 64-bit servers with 4 CPU cores each
- Each server has 32 GB RAM and a 400 GB SSD disk
- 24.8 GB RAM allocated to couchbase server on each node
- SSD disk formatted ext4, mounted at /data
- Each server has its own SSD drive; no disk sharing with other servers
- Created a cluster of 6 nodes with couchbase server 2.0.0-1832 installed
- Cluster has 2 buckets: default (12 GB) and saslbucket (12 GB)
- Each bucket has one design doc with 2 views per doc (default: d1, saslbucket: d11)
- Consistent views disabled on the cluster
- Manifest of build 1832: http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_2.0.0-1832-rel.rpm.manifest.xml
10.6.2.37
10.6.2.38
10.6.2.44
10.6.2.45
10.6.2.42
10.6.2.43
- Loaded 14 million items into both buckets; each key has a size of 512 to 1024 bytes
- Queried all 4 views from the 2 design docs
- Mutated the 14 million items with key sizes from 1500 to 1024 bytes
- Swap rebalance: added nodes 44 and 45, removed nodes 40 and 42
- Rebalance failed because node 45 went down (bug MB-6638). Restarted couchbase server on node 45 and continued the rebalance. The log page then showed the error: "Got error while trying to send close confirmation: {error,enotconn}"
- In the diag of node 37, there are more errors around that timestamp:
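For context on the error term: enotconn is the POSIX ENOTCONN errno ("transport endpoint is not connected"), which the migrator hits when it tries to confirm the close on a socket whose connection is already gone. A minimal Python sketch of the same errno (using an unconnected socket, since the exact peer-teardown timing in the cluster is hard to reproduce locally):

```python
import errno
import socket

# A TCP socket that was never connected (or whose connection is gone)
# reports ENOTCONN for operations that require an established peer.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    s.getpeername()  # no connection established -> ENOTCONN
except OSError as e:
    print(errno.errorcode[e.errno])  # ENOTCONN
finally:
    s.close()
```

This is only an illustration of the errno itself, not of the ebucketmigrator teardown path.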
[rebalance:error,2012-10-11T10:52:10.090,ns_1@10.6.2.37:<0.27697.630>:ebucketmigrator_srv:confirm_sent_messages:679]Got error while trying to send close confirmation: {error,enotconn}
[error_logger:error,2012-10-11T10:52:10.091,ns_1@10.6.2.37:error_logger:ale_error_logger_handler:log_msg:76]** Generic server <0.27697.630> terminating
-
- Last message in was {tcp,#Port<0.12815191>, <<128,68,0,0,8,0,0,0,0,0,0,12,0,0,0,2,0,0,0,0,0,0, 0,0,0,4,0,1,255,0,0,0,0,0,0,0,128,68,0,0,8,0,0, 0,0,0,0,12,0,0,0,4,0,0,0,0,0,0,0,0,0,4,0,1,255, 0,0,0,0,0,0,2,128,68,0,0,8,0,0,69,0,0,0,12,0,0, 0,6,0,0,0,0,0,0,0,0,0,4,0,1,255,0,0,0,0,0,0,1, 128,68,0,0,8,0,0,70,0,0,0,12,0,0,0,8,0,0,0,0,0, 0,0,0,0,4,0,1,255,0,0,0,0,0,0,1,128,68,0,0,8,0, 0,171,0,0,0,12,0,0,0,10,0,0,0,0,0,0,0,0,0,4,0,1, 255,0,0,0,0,0,0,1,128,68,0,0,8,0,1,120,0,0,0,12, 0,0,0,12,0,0,0,0,0,0,0,0,0,4,0,1,255,0,0,0,0,0, 0,1>>}
- When Server state == {state,#Port<0.12815191>,#Port<0.12815183>,
                        #Port<0.12815192>,#Port<0.12815184>,
                        <0.27698.630>,<<>>,<<>>,
                        {set,0,16,16,8,80,48,
                             {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                             {1349,977930,61618},
                             {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
                        -1,false,false,0,
                        not_started,undefined,
                        <<"replication_ns_1@10.6.2.37">>,<0.27697.630>,
                        {had_backfill,false,undefined,[]}}
- Reason for termination ==
- {{badmatch,{error,closed}},
   [{ebucketmigrator_srv,process_upstream,2},
    {ebucketmigrator_srv,process_data,3},
    {ebucketmigrator_srv,process_data,4},
    {ebucketmigrator_srv,handle_info,2},
    {gen_server,handle_msg,5},
    {proc_lib,init_p_do_apply,3}]}
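The badmatch with {error,closed} means ebucketmigrator_srv pattern-matched for a success result from its upstream socket read while the connection had already been closed, so the unexpected {error,closed} tuple failed the match and killed the gen_server. A rough Python analogue of that assert-style match (recv() here is a hypothetical stand-in for the actual socket call, not the real API):

```python
def recv():
    # Hypothetical transport call: the peer closed the connection,
    # so it returns an error tuple instead of ("ok", data).
    return ("error", "closed")

def process_upstream():
    # Erlang's `{ok, Data} = recv()` asserts success; any other shape
    # raises badmatch and terminates the process, as in the log above.
    result = recv()
    if result[0] != "ok":
        raise RuntimeError(f"badmatch: {result}")
    return result[1]

try:
    process_upstream()
except RuntimeError as e:
    print(e)  # badmatch: ('error', 'closed')
```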
[error_logger:error,2012-10-11T10:52:10.093,ns_1@10.6.2.37:error_logger:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: ebucketmigrator_srv:init/1
pid: <0.27697.630>
registered_name: []
exception exit: {{badmatch,{error,closed}},
[{ebucketmigrator_srv,process_upstream,2},
{ebucketmigrator_srv,process_data,3},
{ebucketmigrator_srv,process_data,4},
{ebucketmigrator_srv,handle_info,2},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}
in function gen_server:terminate/6
ancestors: ['ns_vbm_new_sup-saslbucket','single_bucket_sup-saslbucket',
<0.3624.5>]
messages: [{tcp,#Port<0.12815192>, <<129,68,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0, 129,68,0,0,0,0,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0>>},
{'EXIT',<0.27698.630>,killed}]
links: [#Port<0.12815192>,<0.3664.5>,#Port<0.12815184>,
#Port<0.12815183>]
dictionary: []
trap_exit: true
status: running
heap_size: 4181
stack_size: 24
reductions: 19296
neighbours:
[error_logger:error,2012-10-11T10:52:10.094,ns_1@10.6.2.37:error_logger:ale_error_logger_handler:log_report:72]
=========================SUPERVISOR REPORT=========================
Supervisor: {local,'ns_vbm_new_sup-saslbucket'}
Context: child_terminated
Reason: {{badmatch,{error,closed}},
         [{ebucketmigrator_srv,process_upstream,2},
          {ebucketmigrator_srv,process_data,3},
          {ebucketmigrator_srv,process_data,4},
          {ebucketmigrator_srv,handle_info,2},
          {gen_server,handle_msg,5},
          {proc_lib,init_p_do_apply,3}]}
Offender: [{pid,<0.27697.630>},
           {name,{new_child_id,"#$%&'()*+,-./0123456789:;<=>?@ABCD",
                  'ns_1@10.6.2.45'}},
           {mfargs,{ebucketmigrator_srv,start_link,undefined}},
           {restart_type,temporary},
           {shutdown,60000},
           {child_type,worker}]
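When correlating these entries across the diags of several nodes, the ns_server log header carries a millisecond timestamp (e.g. 2012-10-11T10:52:10.090) as its second comma-separated field, which parses directly with strptime. A small sketch, with the sample header abbreviated:

```python
from datetime import datetime

header = "[rebalance:error,2012-10-11T10:52:10.090,ns_1@10.6.2.37:..."
# The timestamp is the second comma-separated field of the header.
stamp = header.split(",")[1]
t = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f")
print(t.isoformat(timespec="milliseconds"))  # 2012-10-11T10:52:10.090
```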
Link to collect info (will add soon)