Details

Type: Bug
Resolution: Duplicate
Priority: Major
Fix Version/s: 2.1.0
Affects Version/s: 2.1.0
Component/s: couchbase-bucket, ns_server
Security Level: Public
Labels: None
Environment: build 2.0.2-800-rel
Operating System: CentOS 64-bit

Description

Cluster IP: 172.23.105.23
1. Create an 8-node cluster; each node has 12G RAM and an HDD.
2. Create 2 buckets, default and saslbucket, with memory quotas of 6G and 4G.
3. Run the KV use case for 2 days:
load 35M items into each bucket to bring the resident ratio to 70%~80%, then access the data at 15k ops/sec with 5% creates, 5% deletes, 5% expires, 5% updates, and 80% gets for several hours.
4. Then, with the same workload, run some rebalance operations. While rebalancing in 2 nodes, after several vbucket movements (17 vbuckets for saslbucket), the rebalance hangs.
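
For anyone trying to reproduce the access pattern from step 3 outside the system test harness, the operation mix can be approximated with a weighted random choice. This is only a sketch: `OP_MIX`, `pick_op`, and `sample_mix` are hypothetical names, not part of the actual test tooling, and the code only chooses operation names without talking to a cluster.

```python
import random
from collections import Counter

# Operation mix from step 3, in percent. Names are illustrative only.
OP_MIX = {"create": 5, "delete": 5, "expire": 5, "update": 5, "get": 80}


def pick_op(rng):
    """Return one operation name, chosen according to the OP_MIX weights."""
    ops = list(OP_MIX)
    weights = [OP_MIX[op] for op in ops]
    return rng.choices(ops, weights=weights, k=1)[0]


def sample_mix(n, seed=0):
    """Draw n operations and count how often each one was chosen."""
    rng = random.Random(seed)
    return Counter(pick_op(rng) for _ in range(n))
```

Calling `sample_mix(15000)` gives a rough picture of one second of traffic at the stated 15k ops/sec.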

The logs show the bucket state changing repeatedly between not ready and active:

root@cola-s10305:/opt/couchbase/var/lib/couchbase/logs# tail -f error.1
<<"replication_building_1013_'ns_1@172.23.105.33'">>} took too long: 1196997 us
[ns_server:error,2013-05-16T15:13:17.111,ns_1@172.23.105.32:<0.11769.0>:ns_memcached:verify_report_long_call:294]call {stats,<<>>} took too long: 556521 us
[ns_server:error,2013-05-16T15:19:19.834,ns_1@172.23.105.32:<0.11769.0>:ns_memcached:verify_report_long_call:294]call {get_tap_docs_estimate,1013, <<"replication_building_1013_'ns_1@172.23.105.33'">>} took too long: 725657 us
[ns_server:error,2013-05-16T15:22:18.618,ns_1@172.23.105.32:ns_memcached-saslbucket<0.11752.0>:ns_memcached:handle_info:671]handle_info(ensure_bucket,..) took too long: 661061 us
[ns_server:error,2013-05-16T15:23:35.127,ns_1@172.23.105.32:<0.11704.0>:ns_memcached:verify_report_long_call:294]call {stats,<<>>} took too long: 568954 us
[ns_server:error,2013-05-16T15:25:59.291,ns_1@172.23.105.32:<0.11705.0>:ns_memcached:verify_report_long_call:294]call topkeys took too long: 1217646 us
[ns_server:error,2013-05-16T15:30:17.553,ns_1@172.23.105.32:ns_doctor<0.9998.0>:ns_doctor:update_status:234]The following buckets became not ready on node 'ns_1@172.23.105.23': ["saslbucket"], those of them are active ["saslbucket"]
[ns_server:error,2013-05-16T15:30:37.540,ns_1@172.23.105.32:ns_doctor<0.9998.0>:ns_doctor:update_status:234]The following buckets became not ready on node 'ns_1@172.23.105.23': ["saslbucket"], those of them are active ["saslbucket"]
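
To get an overview of the slow ns_memcached calls in error.1 without reading the file by hand, the "took too long" entries can be pulled out with a regular expression. This is a rough sketch that assumes the line layout seen in the excerpt above; `SLOW_CALL` and `slow_calls` are hypothetical names, and the pattern has not been checked against other ns_server log formats.

```python
import re

# Matches "[ns_server:error,<timestamp>,<node>:...]<call> took too long: <N> us".
# The call text may wrap onto later lines, so whitespace is matched loosely,
# but the match is not allowed to run into the next "[ns_server:" entry.
SLOW_CALL = re.compile(
    r"\[ns_server:error,(?P<ts>[^,]+),(?P<node>[^:]+):[^\]]*\]"
    r"\s*(?P<what>(?:(?!\[ns_server:).)*?)"
    r"\s+took too long:\s*(?P<us>\d+) us",
    re.DOTALL,
)


def slow_calls(log_text, threshold_us=0):
    """Return (timestamp, call, microseconds) for each slow-call entry."""
    found = []
    for m in SLOW_CALL.finditer(log_text):
        us = int(m.group("us"))
        if us >= threshold_us:
            # Collapse wrapped lines inside the call description.
            what = " ".join(m.group("what").split())
            found.append((m.group("ts"), what, us))
    return found
```

For example, `slow_calls(open("error.1").read(), threshold_us=1_000_000)` would list only the calls that took longer than one second.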

diag links:
https://s3.amazonaws.com/bugdb/jira/MB-8303/8nodes_202-800_rebalance_hang_20130516-143321.tgz

People

Assignee: Aliaksey Artamonau (Inactive)

Reporter: Chisheng Hong (Inactive)

Votes: 0

Watchers: 3

Dates

Created: 16/May/13 4:43 PM

Updated: 18/Jun/13 9:30 PM

Resolved: 18/Jun/13 9:30 PM

Gerrit Reviews

There are no open Gerrit changes

[system test] Rebalance in hangs with some initial vbucket movement due to bucket state bouncing between not ready and active
