

Details

Type: Bug
Resolution: Cannot Reproduce
Priority: Critical
Fix Version/s: 2.5.0
Affects Version/s: 2.5.0
Component/s: couchbase-bucket
Security Level: Public
Labels:
- 2.5.0-dp1
- system-test
Environment:
Windows 2008 R2 64-bit, CentOS 6.2 64-bit

Operating System:
Windows 64-bit

Description

Environment:
9 Windows Server 2008 R2 64-bit servers, each with 8 GB RAM and SSD storage:
10.3.4.127
10.3.4.132
10.3.4.133
10.3.4.134
10.3.4.135
10.3.4.136
10.3.4.137
10.3.4.138
10.3.4.139

Cluster setup:
7-node cluster running Couchbase Server 2.5.0-915:
10.3.4.127
10.3.4.132
10.3.4.133
10.3.4.134
10.3.4.135
10.3.4.136
10.3.4.137

2 buckets:
sasl-1 with 1 replica and 1 design doc containing 1 public view
sasl-2 with 2 replicas and 1 design doc containing 1 public view

Test steps:
Load 7M items, 128 to 512 bytes in size, into each bucket. Then continue loading items into the buckets until the active resident ratio drops to 80%.
Change the load setup so that it performs sets, gets, deletes, updates, and expirations for a few hours.
While the load is running, add node 10.3.4.138 to the cluster and rebalance. About 30 minutes after the rebalance started, the rebalance of the second bucket (sasl-1) exited with the following error:

Haven't heard from a higher priority node or a master, so I'm taking over. mb_master000 ns_1@10.3.4.132 23:33:52 - Mon Nov 18, 2013
Rebalance exited with reason {{{badmatch,[{<0.6316.374>,noproc}]},
[{misc,sync_shutdown_many_i_am_trapping_exits,1},
{misc,try_with_maybe_ignorant_after,2},
{gen_server,terminate,6},
{proc_lib,init_p_do_apply,3}]},
{gen_server,call,
[<0.6315.374>,{shutdown_replicator,'ns_1@10.3.4.127'},infinity]}}
ns_orchestrator002 ns_1@10.3.4.127 20:37:37 - Mon Nov 18, 2013
<0.6297.374> exited with {{{badmatch,[{<0.6316.374>,noproc}]},
[{misc,sync_shutdown_many_i_am_trapping_exits,1},
{misc,try_with_maybe_ignorant_after,2},
{gen_server,terminate,6},
{proc_lib,init_p_do_apply,3}]},
{gen_server,call,
[<0.6315.374>,{shutdown_replicator,'ns_1@10.3.4.127'},infinity]}}
ns_vbucket_mover000 ns_1@10.3.4.127 20:37:37 - Mon Nov 18, 2013
Bucket "sasl-1" rebalance does not seem to be swap rebalance ns_vbucket_mover000 ns_1@10.3.4.127 20:32:02 - Mon Nov 18, 2013
Bucket "sasl-1" loaded on node 'ns_1@10.3.4.138' in 0 seconds. ns_memcached001 ns_1@10.3.4.138 20:31:58 - Mon Nov 18, 2013
Started rebalancing bucket sasl-1 ns_rebalancer000 ns_1@10.3.4.127 20:31:58 - Mon Nov 18, 2013
Bucket "sasl-2" rebalance does not seem to be swap rebalance ns_vbucket_mover000 ns_1@10.3.4.127 20:09:10 - Mon Nov 18, 2013
Bucket "sasl-2" loaded on node 'ns_1@10.3.4.138' in 0 seconds. ns_memcached001 ns_1@10.3.4.138 20:09:06 - Mon Nov 18, 2013
Started rebalancing bucket sasl-2 ns_rebalancer000 ns_1@10.3.4.127 20:09:05 - Mon Nov 18, 2013
Starting rebalance, KeepNodes = ['ns_1@10.3.4.134','ns_1@10.3.4.136',
'ns_1@10.3.4.132','ns_1@10.3.4.137',
'ns_1@10.3.4.133','ns_1@10.3.4.127',
'ns_1@10.3.4.135','ns_1@10.3.4.138'], EjectNodes = []
ns_orchestrator004 ns_1@10.3.4.127 20:09:04 - Mon Nov 18, 2013

Attachments


10.3.4.127-11192013-1046-diag.zip (18.59 MB, 19/Nov/13 11:32 AM)
10.3.4.132-11192013-1048-diag.zip (16.79 MB, 19/Nov/13 11:32 AM)
10.3.4.133-11192013-1049-diag.zip (16.77 MB, 19/Nov/13 11:32 AM)
10.3.4.134-11192013-1051-diag.zip (16.69 MB, 19/Nov/13 11:32 AM)
10.3.4.135-11192013-1053-diag.zip (16.64 MB, 19/Nov/13 11:32 AM)
10.3.4.136-11192013-1055-diag.zip (16.75 MB, 19/Nov/13 11:32 AM)
10.3.4.137-11192013-1056-diag.zip (16.68 MB, 19/Nov/13 11:32 AM)
10.3.4.138-11192013-1058-diag.zip (14.85 MB, 19/Nov/13 11:32 AM)

Issue Links

relates to

MB-7739 [windows] memcached connection is lost and rebalance failed with reason {{bulk_set_vbucket_state_failed

Closed

MB-7943 Memcached drops tap connections without warning

Closed

MB-9390 Tap producer closes the connection because it didn't receive any ACK for 10K TAP messages sent to the consumer

Closed

MB-9070 [system test] [windows] rebalance failed due to bad replica

Closed

MB-9639 [system test] rebalance failed with error badmatch,{error,etimedout

Closed

Gerrit Reviews


For Gerrit Dashboard: MB-9596
#	Subject	Branch	Project	Status	CR	V
30639,1	MB-9596: improved rebalance diagnostics	for-rackaware	ns_server	MERGED	+2	+1
30641,1	Merge remote-tracking branch 'origin/for-rackaware'	master	ns_server	MERGED	+2	+1

People

Assignee: Thuan Nguyen

Reporter: Thuan Nguyen

Votes: 0

Watchers: 4

Dates

Created: 19/Nov/13 11:32 AM

Updated: 03/Dec/13 4:16 PM

Resolved: 03/Dec/13 4:16 PM

[system test] rebalance failed with error {{{badmatch,[{<0.6316.374>,noproc}]}, [{misc,sync_shutdown_many_i_am_trapping_exits, 1},