Details
Bug
Resolution: Duplicate
Blocker
2.0-beta-2
Security Level: Public
None
2.0-1793
1024 vbuckets
Description
Set up 1:1 bidirectional XDCR on empty default buckets.
Add 2 nodes to cluster1 and rebalance.
At the same time, add 2 nodes to cluster2 and rebalance.
Rebalance on cluster1 fails with the error:
Rebalance exited with reason {mover_failed,{badmatch,
Rebalance on cluster2 fails with the error:
<0.27087.0> exited with {unexpected_exit,
{'EXIT',<0.27091.0>,
{{badmatch,
[{'EXIT',
badmatch,{error,closed, {gen_server,call, [<20302.5330.0>,had_backfill,30000]}}}]},
[{ns_single_vbucket_mover, '-wait_backfill_determination/1-fun-1-',1}]}}}
- This reproduces every time.
- Successive attempts to rebalance with the same setup always fail.
- With the same setup but XDCR removed, rebalance works as expected.
Errors from ns_logs show:
-------------------------------------------------
memcached<0.402.0>: Wed Oct 3 15:16:12.467248 PDT 3: TAP (Producer) eq_tapq:replication_building_170_'ns_1@10.3.3.136' - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
memcached<0.402.0>: Wed Oct 3 15:16:12.467266 PDT 3: TAP (Producer) eq_tapq:replication_building_170_'ns_1@10.3.3.136' - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
memcached<0.402.0>: Wed Oct 3 15:16:12.467276 PDT 3: TAP (Producer) eq_tapq:replication_building_170_'ns_1@10.3.3.136' - Sending TAP_OPAQUE with command "initial_vbucket_stream" and vbucket 170
memcached<0.402.0>: Wed Oct 3 15:16:12.467582 PDT 3: Schedule cleanup of "eq_tapq:rebalance_169"
memcached<0.402.0>: Wed Oct 3 15:16:12.468043 PDT 3: TAP (Producer) eq_tapq:rebalance_169 - Clear the tap queues by force
memcached<0.402.0>: Wed Oct 3 15:16:12.470020 PDT 3: TAP (Producer) eq_tapq:replication_building_170_'ns_1@10.3.3.136' - Backfill is completed with VBuckets 170,
memcached<0.402.0>: Wed Oct 3 15:16:12.470041 PDT 3: TAP (Producer) eq_tapq:replication_building_170_'ns_1@10.3.3.136' - Sending TAP_OPAQUE with command "close_backfill" and vbucket 170
memcached<0.402.0>: Wed Oct 3 15:16:12.495085 PDT 3: TAP (Producer) eq_tapq:replication_building_170_'ns_1@10.3.3.136' - disconnected, keep alive for 300 seconds
memcached<0.402.0>: Wed Oct 3 15:16:12.497981 PDT 3: TAP (Producer) eq_tapq:replication_building_170_'ns_1@10.3.3.136' - Connection is closed by force.
[ns_server:error,2012-10-03T15:16:12.754,ns_1@10.3.3.138:ns_doctor:ns_doctor:update_status:204]The following buckets became not ready on node 'ns_1@10.3.3.136': ["saslbucket"], those of them are active []
[error_logger:error,2012-10-03T15:16:12.860,ns_1@10.3.3.138:error_logger:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: ebucketmigrator_srv:init/1
pid: <0.31118.0>
registered_name: []
exception error: no match of right hand side value {error,closed}
in function mc_binary:quick_stats_recv/3
in call from mc_binary:mass_get_last_closed_checkpoint_loop/5
in call from mc_binary:mass_get_last_closed_checkpoint/3
in call from ebucketmigrator_srv:init/1
ancestors: [<0.31102.0>,<0.24907.0>,<0.24856.0>]
messages: []
links: Port<0.19183>,<0.31102.0>,#Port<0.19181>
dictionary: []
trap_exit: false
status: running
heap_size: 987
stack_size: 24
reductions: 108376
neighbours:
[rebalance:info,2012-10-03T15:16:12.861,ns_1@10.3.3.138:<0.31092.0>:ebucketmigrator_srv:do_confirm_sent_messages:655]Got close ack!
[ns_server:info,2012-10-03T15:16:12.862,ns_1@10.3.3.138:<0.31105.0>:ns_replicas_builder_utils:kill_a_bunch_of_tap_names:59]Killed the following tap names on 'ns_1@10.3.3.138': []
[rebalance:error,2012-10-03T15:16:12.863,ns_1@10.3.3.138:<0.24907.0>:ns_vbucket_mover:handle_info:252]<0.31102.0> exited with {mover_failed,{badmatch,
[ns_server:info,2012-10-03T15:16:12.865,ns_1@10.3.3.138:'janitor_agent-saslbucket':janitor_agent:handle_info:671]Undoing temporary vbucket states caused by rebalance
[error_logger:error,2012-10-03T15:16:12.865,ns_1@10.3.3.138:error_logger:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: ns_single_vbucket_mover:mover/6
pid: <0.31102.0>
registered_name: []
exception exit: {mover_failed,{badmatch,{error,closed}
}}
in function ns_single_vbucket_mover:wait_for_mover/5
in call from ns_single_vbucket_mover:mover_inner/6
in call from misc:try_with_maybe_ignorant_after/2
in call from ns_single_vbucket_mover:mover/6
ancestors: [<0.24907.0>,<0.24856.0>]
messages: []
links: [<0.24907.0>]
dictionary: [
]
trap_exit: true
status: running
heap_size: 75025
stack_size: 24
reductions: 12139
neighbours:
[ns_server:debug,2012-10-03T15:16:12.865,ns_1@10.3.3.138:<0.24913.0>:ns_pubsub:do_subscribe_link:134]Parent process of subscription
{ns_node_disco_events,<0.24907.0>} exited with reason {mover_failed,
{badmatch,
}}
[user:info,2012-10-03T15:16:12.867,ns_1@10.3.3.138:<0.24628.0>:ns_orchestrator:handle_info:311]Rebalance exited with reason {mover_failed,{badmatch,
}}
Adding logs.
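For triage, the error-level entries in an excerpt like the one above can be pulled out with a short script. This is only a minimal sketch. It assumes each ns_server entry starts with a header of the form [component:level,timestamp,node:process:module:function:line], as seen in the lines pasted here, and that any line without such a header belongs to the preceding entry. The names HEADER, parse_entries and errors_only are hypothetical helpers for illustration, not part of ns_server. memcached lines have no such header, so filter them out before parsing.

```python
import re

# Header layout assumed from the excerpt in this ticket:
# [component:level,timestamp,node:process:module:function:line]message
HEADER = re.compile(
    r"^\[(?P<component>\w+):(?P<level>\w+),"
    r"(?P<timestamp>[^,]+),"
    r"(?P<source>[^\]]+)\]"
    r"(?P<message>.*)$"
)


def parse_entries(lines):
    """Group raw log lines into entries.

    A line matching HEADER starts a new entry; any other line is
    appended to the body of the most recent entry (for example the
    lines of a crash report). Lines seen before the first header
    are dropped.
    """
    entries = []
    for line in lines:
        match = HEADER.match(line)
        if match:
            entry = match.groupdict()
            entry["body"] = []
            entries.append(entry)
        elif entries:
            entries[-1]["body"].append(line)
    return entries


def errors_only(entries):
    """Keep only the entries logged at the 'error' level."""
    return [e for e in entries if e["level"] == "error"]


if __name__ == "__main__":
    # Two lines copied from the excerpt in this ticket.
    sample = [
        "[ns_server:error,2012-10-03T15:16:12.754,ns_1@10.3.3.138:ns_doctor:"
        "ns_doctor:update_status:204]The following buckets became not ready "
        "on node 'ns_1@10.3.3.136': [\"saslbucket\"], those of them are active []",
        "[rebalance:info,2012-10-03T15:16:12.861,ns_1@10.3.3.138:<0.31092.0>:"
        "ebucketmigrator_srv:do_confirm_sent_messages:655]Got close ack!",
    ]
    for entry in errors_only(parse_entries(sample)):
        print(entry["timestamp"], entry["source"], entry["message"])
```

Grouping continuation lines under their header keeps each crash report attached to the error_logger entry that introduced it, so the full stack can be read per entry.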