Details
Type: Bug
Resolution: Won't Fix
Priority: Major
Security Level: Public
Build# 2.0.0-1421-rel
http://qa.hq.northscale.net/job/centos-failover-tests/48/
Description
Rebalance hangs with exception "replicator_died"
The failure occurs when running the following test on a 5-node cluster (http://qa.hq.northscale.net/job/centos-failover-tests/48/):
./testrunner -i /tmp/failover.ini get-logs=True -t failovertests.FailoverTests.test_failover_normal,replica=2,load_ratio=10
In the diagnostics from the master node 10.1.3.114, the rebalance starts at [2012-07-09 17:46:03]:
[user:info] [2012-07-09 17:46:03] [ns_1@10.1.3.114:<0.1951.0>:ns_orchestrator:idle:399] Starting rebalance, KeepNodes = ['ns_1@10.1.3.114','ns_1@10.1.3.118',
'ns_1@10.1.3.116'], EjectNodes = []
At [2012-07-09 18:02:44], the master node logs the following exception and crash reports:
[ns_server:error] [2012-07-09 18:02:44] [ns_1@10.1.3.114:<0.28201.4>:ns_replicas_builder:build_replicas_main:109] Got premature exit from one of ebucketmigrators: {'EXIT',<19197.13149.3>,
{badmatch,
[error_logger:error] [2012-07-09 18:02:44] [ns_1@10.1.3.114:error_logger:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: ebucketmigrator_srv:init/1
pid: <19197.13149.3>
registered_name: []
exception error: no match of right hand side value {error,timeout}
in function mc_client_binary:cmd_binary_vocal_recv/5
in call from mc_client_binary:set_vbucket/3
in call from ebucketmigrator_srv:'init/1-lc$^0/1-0'/3
in call from ebucketmigrator_srv:init/1
ancestors: [<0.28201.4>,<0.28200.4>,<0.19639.4>,<0.19596.4>]
messages: []
links: [#Port<19197.299960>,<0.28201.4>,#Port<19197.299959>]
dictionary: []
trap_exit: false
status: running
heap_size: 610
stack_size: 24
reductions: 1757
neighbours:
=========================CRASH REPORT=========================
crasher:
initial call: erlang:apply/2
pid: <0.28201.4>
registered_name: []
exception exit: {replicator_died,
{'EXIT',<19197.13149.3>,{badmatch,{error,timeout}}}}
in function ns_replicas_builder:'build_replicas_main/6-fun-0'/2
in call from ns_replicas_builder:observe_wait_all_done_tail/5
in call from ns_replicas_builder:observe_wait_all_done/5
in call from ns_replicas_builder:'build_replicas_main/6-fun-1'/8
in call from ns_replicas_builder:try_with_maybe_ignorant_after/2
in call from ns_replicas_builder:build_replicas_main/6
ancestors: [<0.28200.4>,<0.19639.4>,<0.19596.4>]
messages: []
links: [<0.28200.4>,<0.28204.4>]
dictionary: []
trap_exit: true
status: running
heap_size: 2584
stack_size: 24
reductions: 241661
Diagnostics are attached. The Jenkins cluster has been left in the same state in case it is needed for diagnosis.
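To make the timeline easier to check against the attached diagnostics, here is a minimal Python sketch that parses the log header format seen in the entries quoted in this report and computes the time between two entries. The `HEADER` pattern and the `parse_header` helper are hypothetical names written for this illustration. They are not part of ns_server or testrunner, and the pattern only assumes the `[category:level] [timestamp] [source] message` layout visible in the lines above.

```python
import re
from datetime import datetime

# Assumed header layout, inferred only from the log lines quoted in this report:
# [category:level] [YYYY-MM-DD HH:MM:SS] [node:pid:module:function:line] message
HEADER = re.compile(
    r"^\[(?P<category>[^:\]]+):(?P<level>[^\]]+)\] "
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] "
    r"\[(?P<source>[^\]]+)\] ?(?P<message>.*)$"
)


def parse_header(line):
    """Return the header fields of a log line as a dict, or None if the
    line is not a header (for example, a crash report body line)."""
    match = HEADER.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the timestamp text into a datetime so entries can be compared.
    fields["ts"] = datetime.strptime(fields["ts"], "%Y-%m-%d %H:%M:%S")
    return fields


# The two header lines quoted in this report (first line of each entry only).
start = parse_header(
    "[user:info] [2012-07-09 17:46:03] "
    "[ns_1@10.1.3.114:<0.1951.0>:ns_orchestrator:idle:399] "
    "Starting rebalance, KeepNodes = ['ns_1@10.1.3.114','ns_1@10.1.3.118',"
)
failure = parse_header(
    "[ns_server:error] [2012-07-09 18:02:44] "
    "[ns_1@10.1.3.114:<0.28201.4>:ns_replicas_builder:build_replicas_main:109] "
    "Got premature exit from one of ebucketmigrators: {'EXIT',<19197.13149.3>,"
)

elapsed = failure["ts"] - start["ts"]
print("Seconds from rebalance start to replicator failure:",
      int(elapsed.total_seconds()))
```

For the two timestamps in this report, the gap is 1001 seconds (16 minutes 41 seconds) between the rebalance start and the first ebucketmigrator exit.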