Details
Bug
Resolution: Duplicate
Major
2.1.0
Security Level: Public
None
build 2.0.2-800-rel
CentOS 64-bit
Description
Cluster IP is 172.23.105.23
1. Create an 8-node cluster; each node has 12G RAM and an HDD.
2. Create 2 buckets, default and saslbucket, with memory quotas of 6G and 4G.
3. Run the KV use case for 2 days:
Load 35M items into each bucket to reach a resident ratio of 70%~80%, then access the data at 15k ops/sec with 5% creates, 5% deletes, 5% expires, 5% updates, and 80% gets for several hours.
4. Then, with the same workload, run some rebalance operations. While rebalancing in 2 nodes, after several vbucket movements (17 vbuckets for saslbucket), the rebalance hangs.
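For reference, the access mix in step 3 can be turned into per-operation rates. This is only a rough sketch: it assumes the "80 gets" figure means 80%, so that the mix sums to 100%, and the names `TOTAL_OPS_PER_SEC`, `MIX_PERCENT`, and `ops_per_second` are made up for illustration and are not part of the test harness.

```python
# Operation mix from step 3. The 80% share for gets is an assumption
# (the report says "80 gets"); it makes the mix sum to 100%.
TOTAL_OPS_PER_SEC = 15_000
MIX_PERCENT = {"create": 5, "delete": 5, "expire": 5, "update": 5, "get": 80}


def ops_per_second(total, mix_percent):
    """Return the per-operation rate implied by a percentage mix."""
    if sum(mix_percent.values()) != 100:
        raise ValueError("mix must sum to 100%")
    return {op: total * pct // 100 for op, pct in mix_percent.items()}


rates = ops_per_second(TOTAL_OPS_PER_SEC, MIX_PERCENT)
for op, rate in rates.items():
    print(f"{op:>6}: {rate} ops/sec")
```

Under that assumption each of the four mutation types runs at 750 ops/sec and gets at 12,000 ops/sec.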
Checking the log shows many bucket state changes between not ready and active:
root@cola-s10305:/opt/couchbase/var/lib/couchbase/logs# tail -f error.1
<<"replication_building_1013_'ns_1@172.23.105.33'">>} took too long: 1196997 us
[ns_server:error,2013-05-16T15:13:17.111,ns_1@172.23.105.32:<0.11769.0>:ns_memcached:verify_report_long_call:294]call
[ns_server:error,2013-05-16T15:19:19.834,ns_1@172.23.105.32:<0.11769.0>:ns_memcached:verify_report_long_call:294]call {get_tap_docs_estimate,1013, <<"replication_building_1013_'ns_1@172.23.105.33'">>} took too long: 725657 us
[ns_server:error,2013-05-16T15:22:18.618,ns_1@172.23.105.32:ns_memcached-saslbucket<0.11752.0>:ns_memcached:handle_info:671]handle_info(ensure_bucket,..) took too long: 661061 us
[ns_server:error,2013-05-16T15:23:35.127,ns_1@172.23.105.32:<0.11704.0>:ns_memcached:verify_report_long_call:294]call {stats,<<>>} took too long: 568954 us
[ns_server:error,2013-05-16T15:25:59.291,ns_1@172.23.105.32:<0.11705.0>:ns_memcached:verify_report_long_call:294]call topkeys took too long: 1217646 us
[ns_server:error,2013-05-16T15:30:17.553,ns_1@172.23.105.32:ns_doctor<0.9998.0>:ns_doctor:update_status:234]The following buckets became not ready on node 'ns_1@172.23.105.23': ["saslbucket"], those of them are active ["saslbucket"]
[ns_server:error,2013-05-16T15:30:37.540,ns_1@172.23.105.32:ns_doctor<0.9998.0>:ns_doctor:update_status:234]The following buckets became not ready on node 'ns_1@172.23.105.23': ["saslbucket"], those of them are active ["saslbucket"]
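The "took too long" durations above are reported in microseconds. A small sketch like the following could pull them out of error.1 and convert them to seconds. The regex only targets the suffix seen in this excerpt, the helper name `slow_call_seconds` is hypothetical, and the sample lines are shortened copies of the entries above.

```python
import re

# Matches the "took too long: <N> us" suffix seen in the excerpt above.
SLOW_CALL = re.compile(r"took too long: (\d+) us")


def slow_call_seconds(lines):
    """Return the reported durations, converted from microseconds to seconds."""
    durations = []
    for line in lines:
        match = SLOW_CALL.search(line)
        if match:
            durations.append(int(match.group(1)) / 1_000_000)
    return durations


# Shortened copies of lines from the excerpt; "..." stands for the omitted prefix.
sample = [
    "... verify_report_long_call:294]call topkeys took too long: 1217646 us",
    "... handle_info:671]handle_info(ensure_bucket,..) took too long: 661061 us",
    "... update_status:234]The following buckets became not ready ...",
]
print(slow_call_seconds(sample))
```

On these samples the slow calls take roughly 0.66 to 1.2 seconds each; the ns_doctor line carries no duration and is skipped.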
diag links:
https://s3.amazonaws.com/bugdb/jira/MB-8303/8nodes_202-800_rebalance_hang_20130516-143321.tgz