Description
Reference bug: 6284
Copying over relevant comments for this bug.
Issue still persists on build 1649.
Source: 10.1.3.235, 10.1.3.236, 10.3.2.54
Destination: 10.1.3.237, 10.1.3.238, 10.3.2.55
In a 3:3 unidirectional setup with ongoing replication, I tried to rebalance out one of the nodes (10.3.2.55) on the destination cluster and saw that replication failed. The rebalance exited with the following error:
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Rebalance exited with reason {{bulk_set_vbucket_state_failed,
[{'ns_1@10.1.3.238',
{'EXIT',
{{timeout,
{gen_server,call,
['ns_memcached-bucket', {set_vbucket,216,active},
{'janitor_agent-bucket', 'ns_1@10.1.3.238'}
60000]}},
{gen_server,call,
[,
{janitor_agent,bulk_set_vbucket_state,4}
{if_rebalance,<0.11867.32>,
{update_vbucket_state,731,replica,
undefined,'ns_1@10.1.3.237'}},
infinity]}}}}]},
[,
{ns_vbucket_mover, update_replication_post_move,3},
{ns_vbucket_mover,handle_info,2},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
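For triage, the key fields of the timed-out call can be pulled out of the quoted trace. The following is a minimal Python sketch, not part of the original report: the function name `summarize_timeout` and the regular expressions are hypothetical and assume the trace layout exactly as quoted above. The final integer in a `gen_server:call` argument list is its timeout in milliseconds.

```python
import re

# Excerpt of the rebalance error quoted above (copied verbatim, trimmed).
TRACE = """\
['ns_memcached-bucket', {set_vbucket,216,active},
{'janitor_agent-bucket', 'ns_1@10.1.3.238'}
60000]}},
"""


def summarize_timeout(trace: str) -> dict:
    """Extract vBucket, requested state, node, and timeout from the trace.

    Hypothetical helper: the patterns assume the layout of the quoted
    excerpt and are not a general Erlang term parser.
    """
    # {set_vbucket,<id>,<state>} is the request that timed out.
    vb = re.search(r"\{set_vbucket,(\d+),(\w+)\}", trace)
    # The janitor_agent tuple names the node the call was addressed to.
    node = re.search(r"'janitor_agent-[^']*',\s*'([^']+)'", trace)
    # A line starting with an integer followed by "]" is the call timeout (ms).
    timeout = re.search(r"^\s*(\d+)\]", trace, re.MULTILINE)
    return {
        "vbucket": int(vb.group(1)),
        "state": vb.group(2),
        "node": node.group(1),
        "timeout_ms": int(timeout.group(1)),
    }


if __name__ == "__main__":
    print(summarize_timeout(TRACE))
```

Read this way, the excerpt describes a `set_vbucket` request for vBucket 216 (state `active`) that hit a 60000 ms call timeout, with `ns_1@10.1.3.238` as the node named in the failure.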
I collected grabdiags from all 6 nodes; they are available at the following links:
https://s3.amazonaws.com/bugdb/MB-6462/10.1.3.235-8091-diag.txt.gz
https://s3.amazonaws.com/bugdb/MB-6462/10.1.3.236-8091-diag.txt.gz
https://s3.amazonaws.com/bugdb/MB-6462/10.1.3.237-8091-diag.txt.gz
https://s3.amazonaws.com/bugdb/MB-6462/10.1.3.238-8091-diag.txt.gz
https://s3.amazonaws.com/bugdb/MB-6462/10.3.2.54-8091-diag.txt.gz
https://s3.amazonaws.com/bugdb/MB-6462/10.3.2.55-8091-diag.txt.gz