Details
Description
Cluster IP is 172.23.105.23
1. Create an 8-node cluster; each node has 12 GB RAM and an HDD.
2. Create 2 buckets, default and saslbucket, with memory quotas of 7 GB and 5 GB respectively.
3. Run the KV-only use case for 1 week:
After the 1-week run, load 150M items into the default bucket (23% resident) and 60M items into saslbucket, bringing that bucket to a 44% resident ratio. The workload for both buckets is 15k ops/sec with 5% creates, 5% deletes, 5% expires, 5% updates, and 80% gets.
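For reference, the workload figures above can be turned into per-operation rates and approximate resident item counts. This is only a rough sketch: it assumes the 15k ops/sec applies to each bucket, and that resident items are simply item count times resident ratio. The function names are illustrative and not part of any test tool.

```python
# Figures come from the report; applying 15k ops/sec per bucket is an assumption.
TOTAL_OPS_PER_SEC = 15_000
OP_MIX_PERCENT = {"create": 5, "delete": 5, "expire": 5, "update": 5, "get": 80}

BUCKETS = {
    "default": {"items": 150_000_000, "resident_percent": 23},
    "saslbucket": {"items": 60_000_000, "resident_percent": 44},
}


def ops_per_sec(total, mix_percent):
    """Split a total ops/sec rate according to a percentage mix."""
    return {op: total * pct // 100 for op, pct in mix_percent.items()}


def resident_items(items, resident_percent):
    """Approximate number of items held in memory (hypothetical helper)."""
    return items * resident_percent // 100


# The mix in the report should account for all operations.
assert sum(OP_MIX_PERCENT.values()) == 100

rates = ops_per_sec(TOTAL_OPS_PER_SEC, OP_MIX_PERCENT)
for op, rate in rates.items():
    print(f"{op}: {rate} ops/sec")

for name, bucket in BUCKETS.items():
    in_memory = resident_items(bucket["items"], bucket["resident_percent"])
    on_disk_only = bucket["items"] - in_memory
    print(f"{name}: ~{in_memory} resident, ~{on_disk_only} non-resident")
```

Under these assumptions, most items in both buckets are not resident in memory, which is what "heavy DGM" refers to here.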
Then, in a heavy DGM state, try to rebalance in one node. The rebalance fails, and a second rebalance attempt fails with the same error:
Rebalance exited with reason {bulk_set_vbucket_state_failed,
[{'ns_1@172.23.105.23',
{'EXIT',
{{{{unexpected_reason,killed},
[
{gen_server,call,
['tap_replication_manager-saslbucket', {change_vbucket_replication,583, undefined},
infinity]}},
{gen_server,call,
[{'janitor_agent-saslbucket', 'ns_1@172.23.105.23'},
{if_rebalance,<0.9405.321>,
{update_vbucket_state,583,replica,
undefined,undefined}},
infinity]}}}}]}
ns_orchestrator002 ns_1@172.23.105.23 16:48:35 - Tue May 28, 2013
<0.15529.323> exited with {bulk_set_vbucket_state_failed,
[{'ns_1@172.23.105.23',
{'EXIT',
{{{{unexpected_reason,killed},
[{misc,executing_on_new_process,1}
,
{tap_replication_manager, change_vbucket_filter,4},
{tap_replication_manager, '-do_set_incoming_replication_map/3-lc$^2/1-2-', 2},
{tap_replication_manager, do_set_incoming_replication_map,3},
{tap_replication_manager,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]},
{gen_server,call,
['tap_replication_manager-saslbucket',
,
infinity]}},
{gen_server,call,
[
,
{if_rebalance,<0.9405.321>,
{update_vbucket_state,583,replica,
undefined,undefined}},
infinity]}}}}]} ns_vbucket_mover000 ns_1@172.23.105.23 16:48:05 - Tue May 28, 2013
Failed to get tap stats after 5 attempts ebucketmigrator_srv000 ns_1@172.23.105.27 15:44:12 - Tue May 28, 2013
Bucket "saslbucket" rebalance does not seem to be swap rebalance ns_vbucket_mover000 ns_1@172.23.105.23 15:03:11 - Tue May 28, 2013
Started rebalancing bucket saslbucket ns_rebalancer000 ns_1@172.23.105.23 15:03:10 - Tue May 28, 2013
Starting rebalance, KeepNodes = ['ns_1@172.23.105.23','ns_1@172.23.105.25',
'ns_1@172.23.105.26','ns_1@172.23.105.27',
'ns_1@172.23.105.28','ns_1@172.23.105.29',
'ns_1@172.23.105.30','ns_1@172.23.105.31',
'ns_1@172.23.105.32','ns_1@172.23.105.33'], EjectNodes = []
ns_orchestrator004 ns_1@172.23.105.23 15:03:09 - Tue May 28, 2013
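The log entries above are listed newest first. A small sketch like the following can put a subset of them in chronological order and show how long the rebalance ran before it exited. The event summaries are shortened by hand and the helper names are illustrative; the timestamp format is inferred from the entries above.

```python
from datetime import datetime

# Timestamp format as it appears in the log entries above.
# %a and %b are locale-dependent; an English locale is assumed.
LOG_TIME_FORMAT = "%H:%M:%S - %a %b %d, %Y"

# (module, hand-shortened summary, timestamp) copied from the entries above.
EVENTS = [
    ("ns_orchestrator002", "Rebalance exited", "16:48:35 - Tue May 28, 2013"),
    ("ns_vbucket_mover000", "bulk_set_vbucket_state_failed", "16:48:05 - Tue May 28, 2013"),
    ("ebucketmigrator_srv000", "Failed to get tap stats", "15:44:12 - Tue May 28, 2013"),
    ("ns_rebalancer000", "Started rebalancing saslbucket", "15:03:10 - Tue May 28, 2013"),
    ("ns_orchestrator004", "Starting rebalance", "15:03:09 - Tue May 28, 2013"),
]


def parse_time(stamp):
    """Parse a log timestamp into a datetime."""
    return datetime.strptime(stamp, LOG_TIME_FORMAT)


def elapsed_since_start(events):
    """Sort events oldest first and attach the time elapsed since the first one."""
    ordered = sorted(events, key=lambda event: parse_time(event[2]))
    start = parse_time(ordered[0][2])
    return [(module, summary, parse_time(stamp) - start)
            for module, summary, stamp in ordered]


timeline = elapsed_since_start(EVENTS)
for module, summary, delta in timeline:
    print(f"+{delta}  {module}: {summary}")
```

By these timestamps, the tap stats failure appears about 41 minutes into the rebalance, and the rebalance exits roughly 1 hour 45 minutes after it started.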
This failure blocks the cluster from any topology change.
The diags are available at https://s3.amazonaws.com/bugdb/jira/MB-8358/172.23.105.23-807.zip