Details

Type: Bug
Resolution: Duplicate
Priority: Blocker
Fix Version/s: 3.0
Affects Version/s: 2.1.0
Component/s: ns_server
Security Level: Public
Labels:
None
Environment:
build-807-rel

Operating System:
Centos 64-bit

Description

Cluster IP is 172.23.105.23.
1. Create an 8-node cluster; each node has 12 GB RAM and an HDD.
2. Create 2 buckets, default and saslbucket, with memory quotas of 7 GB and 5 GB.
3. Run the KV-only use case for 1 week:
After the 1-week run, 150M items are loaded into the default bucket (23% resident) and 60M items into saslbucket (44% resident). The workload for both buckets is 15k ops/sec with 5% creates, 5% deletes, 5% expirations, 5% updates, and 80% gets.
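For anyone trying to approximate this workload outside the system-test harness, the operation mix can be sketched as a weighted choice. This is only an illustration of the stated percentages, not the load generator actually used in the test; the names `WORKLOAD_MIX`, `pick_operation`, `target_rates`, and `sample_mix` are hypothetical.

```python
import random
from collections import Counter

# Operation mix as stated in the description (fractions of total ops).
WORKLOAD_MIX = [
    ("create", 0.05),
    ("delete", 0.05),
    ("expire", 0.05),
    ("update", 0.05),
    ("get", 0.80),
]

def pick_operation(r):
    """Map a uniform number r in [0, 1) to an operation via cumulative weights."""
    cumulative = 0.0
    for name, weight in WORKLOAD_MIX:
        cumulative += weight
        if r < cumulative:
            return name
    # Guard against floating-point rounding at the top of the range.
    return WORKLOAD_MIX[-1][0]

def target_rates(total_ops_per_sec=15000):
    """Per-operation rate implied by the mix at the given total throughput."""
    return {name: round(weight * total_ops_per_sec) for name, weight in WORKLOAD_MIX}

def sample_mix(n, seed=0):
    """Draw n operations with a seeded generator and count each kind."""
    rng = random.Random(seed)
    return Counter(pick_operation(rng.random()) for _ in range(n))
```

At 15k ops/sec, `target_rates()` gives 750 ops/sec for each of the four mutation types and 12000 ops/sec for gets.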

Then, in a heavy DGM state, try to rebalance in one node. The rebalance fails, and a second rebalance attempt fails with the same error:

Rebalance exited with reason {bulk_set_vbucket_state_failed,
[{'ns_1@172.23.105.23',
{'EXIT',
{{{{unexpected_reason,killed},
[{misc,executing_on_new_process,1},
{tap_replication_manager,change_vbucket_filter,4},
{tap_replication_manager,'-do_set_incoming_replication_map/3-lc$^2/1-2-',2},
{tap_replication_manager,do_set_incoming_replication_map,3},
{tap_replication_manager,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]},
{gen_server,call,
['tap_replication_manager-saslbucket',
{change_vbucket_replication,583,undefined},
infinity]}},
{gen_server,call,
[{'janitor_agent-saslbucket', 'ns_1@172.23.105.23'},
{if_rebalance,<0.9405.321>,
{update_vbucket_state,583,replica,
undefined,undefined}},
infinity]}}}}]}
ns_orchestrator002 ns_1@172.23.105.23 16:48:35 - Tue May 28, 2013
<0.15529.323> exited with {bulk_set_vbucket_state_failed,
[{'ns_1@172.23.105.23',
{'EXIT',
{{{{unexpected_reason,killed},
[{misc,executing_on_new_process,1},
{tap_replication_manager,change_vbucket_filter,4},
{tap_replication_manager,'-do_set_incoming_replication_map/3-lc$^2/1-2-',2},
{tap_replication_manager,do_set_incoming_replication_map,3},
{tap_replication_manager,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]},
{gen_server,call,
['tap_replication_manager-saslbucket',
{change_vbucket_replication,583,undefined},
infinity]}},
{gen_server,call,
[{'janitor_agent-saslbucket','ns_1@172.23.105.23'},
{if_rebalance,<0.9405.321>,
{update_vbucket_state,583,replica,
undefined,undefined}},
infinity]}}}}]}
ns_vbucket_mover000 ns_1@172.23.105.23 16:48:05 - Tue May 28, 2013
Failed to get tap stats after 5 attempts
ebucketmigrator_srv000 ns_1@172.23.105.27 15:44:12 - Tue May 28, 2013
Bucket "saslbucket" rebalance does not seem to be swap rebalance
ns_vbucket_mover000 ns_1@172.23.105.23 15:03:11 - Tue May 28, 2013
Started rebalancing bucket saslbucket
ns_rebalancer000 ns_1@172.23.105.23 15:03:10 - Tue May 28, 2013
Starting rebalance, KeepNodes = ['ns_1@172.23.105.23','ns_1@172.23.105.25',
'ns_1@172.23.105.26','ns_1@172.23.105.27',
'ns_1@172.23.105.28','ns_1@172.23.105.29',
'ns_1@172.23.105.30','ns_1@172.23.105.31',
'ns_1@172.23.105.32','ns_1@172.23.105.33'], EjectNodes = []
ns_orchestrator004 ns_1@172.23.105.23 15:03:09 - Tue May 28, 2013

This blocks the cluster from any topology change.

The diags are available at https://s3.amazonaws.com/bugdb/jira/MB-8358/172.23.105.23-807.zip

People

Assignee: Aleksey Kondratenko (Inactive)

Reporter: Chisheng Hong (Inactive)

Votes: 0

Watchers: 4

Dates

Created: 28/May/13 6:25 PM

Updated: 11/Oct/13 3:46 PM

Resolved: 28/May/13 6:49 PM

[system test] Rebalance failed couple of times due to bulk_set_vbucket_state_failed in a heavy dgm cluster