  1. Couchbase Server
  2. MB-6814

Rebalance exited with reason {mover_failed,{badmatch,{error,closed}}} during rebalances on empty buckets on both clusters with XDCR.

Details

    • Bug
    • Resolution: Duplicate
    • Blocker
    • 2.0-beta-2
    • 2.0-beta-2
    • ns_server, XDCR
    • Security Level: Public
    • None
    • 2.0-1793
      1024 vbuckets

    Description

      Set up a 1:1 bidirectional XDCR on the empty default buckets.
      Add 2 nodes to cluster1 and rebalance.
      At the same time, add 2 nodes to cluster2 and rebalance.

      Rebalance on cluster1 fails with the error:
      Rebalance exited with reason {mover_failed,{badmatch,{error,closed}}}

      Rebalance on cluster2 fails with the error:
      <0.27087.0> exited with {unexpected_exit,
      {'EXIT',<0.27091.0>,
      {{badmatch,
      [{'EXIT',
      {{badmatch,{error,closed}},{gen_server,call,[<20302.5330.0>,had_backfill,30000]}}}]},
      [{ns_single_vbucket_mover, '-wait_backfill_determination/1-fun-1-',1}]}}}


      - Can always reproduce this.
      - Successive attempts to rebalance with the same setup always fail.
      - With the same setup but XDCR removed, rebalance works as expected.

      Errors from ns_logs show
      -------------------------------------------------
      memcached<0.402.0>: Wed Oct  3 15:16:12.467248 PDT 3: TAP (Producer) eq_tapq:replication_building_170_'ns_1@10.3.3.136' - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
      memcached<0.402.0>: Wed Oct  3 15:16:12.467266 PDT 3: TAP (Producer) eq_tapq:replication_building_170_'ns_1@10.3.3.136' - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
      memcached<0.402.0>: Wed Oct  3 15:16:12.467276 PDT 3: TAP (Producer) eq_tapq:replication_building_170_'ns_1@10.3.3.136' - Sending TAP_OPAQUE with command "initial_vbucket_stream" and vbucket 170
      memcached<0.402.0>: Wed Oct  3 15:16:12.467582 PDT 3: Schedule cleanup of "eq_tapq:rebalance_169"
      memcached<0.402.0>: Wed Oct  3 15:16:12.468043 PDT 3: TAP (Producer) eq_tapq:rebalance_169 - Clear the tap queues by force
      memcached<0.402.0>: Wed Oct  3 15:16:12.470020 PDT 3: TAP (Producer) eq_tapq:replication_building_170_'ns_1@10.3.3.136' - Backfill is completed with VBuckets 170,
      memcached<0.402.0>: Wed Oct  3 15:16:12.470041 PDT 3: TAP (Producer) eq_tapq:replication_building_170_'ns_1@10.3.3.136' - Sending TAP_OPAQUE with command "close_backfill" and vbucket 170
      memcached<0.402.0>: Wed Oct  3 15:16:12.495085 PDT 3: TAP (Producer) eq_tapq:replication_building_170_'ns_1@10.3.3.136' - disconnected, keep alive for 300 seconds
      memcached<0.402.0>: Wed Oct  3 15:16:12.497981 PDT 3: TAP (Producer) eq_tapq:replication_building_170_'ns_1@10.3.3.136' - Connection is closed by force.
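
When triaging a longer log, it can help to pull out only the TAP producers whose connections were closed by force, as in the last memcached line above. The following is a minimal sketch, assuming the line layout shown in this report; `TAP_LINE` and `forced_closes` are hypothetical helper names, not part of any Couchbase tool.

```python
import re

# Captures the producer name and the message of a memcached TAP log line.
# Assumes the "TAP (Producer) <name> - <message>" layout seen in this report.
TAP_LINE = re.compile(r"TAP \(Producer\) (?P<name>\S+) - (?P<msg>.*)$")


def forced_closes(lines):
    """Return the TAP producer names whose connection was closed by force."""
    closed = []
    for line in lines:
        match = TAP_LINE.search(line)
        if match and "closed by force" in match.group("msg"):
            closed.append(match.group("name"))
    return closed
```

Passing the memcached lines from this report to `forced_closes` should single out the `replication_building_170` stream, which is the one that was torn down just before the crash report.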

      [ns_server:error,2012-10-03T15:16:12.754,ns_1@10.3.3.138:ns_doctor:ns_doctor:update_status:204]The following buckets became not ready on node 'ns_1@10.3.3.136': ["saslbucket"], those of them are active []
      [error_logger:error,2012-10-03T15:16:12.860,ns_1@10.3.3.138:error_logger:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
       crasher:
         initial call: ebucketmigrator_srv:init/1
         pid: <0.31118.0>
         registered_name: []
         exception error: no match of right hand side value {error,closed}

           in function  mc_binary:quick_stats_recv/3
           in call from mc_binary:mass_get_last_closed_checkpoint_loop/5
           in call from mc_binary:mass_get_last_closed_checkpoint/3
           in call from ebucketmigrator_srv:init/1
         ancestors: [<0.31102.0>,<0.24907.0>,<0.24856.0>]
         messages: []
         links: [#Port<0.19183>,<0.31102.0>,#Port<0.19181>]
         dictionary: []
         trap_exit: false
         status: running
         heap_size: 987
         stack_size: 24
         reductions: 108376
       neighbours:

      [rebalance:info,2012-10-03T15:16:12.861,ns_1@10.3.3.138:<0.31092.0>:ebucketmigrator_srv:do_confirm_sent_messages:655]Got close ack!

      [ns_server:info,2012-10-03T15:16:12.862,ns_1@10.3.3.138:<0.31105.0>:ns_replicas_builder_utils:kill_a_bunch_of_tap_names:59]Killed the following tap names on 'ns_1@10.3.3.138': []
      [rebalance:error,2012-10-03T15:16:12.863,ns_1@10.3.3.138:<0.24907.0>:ns_vbucket_mover:handle_info:252]<0.31102.0> exited with {mover_failed,{badmatch,{error,closed}}}
      [ns_server:info,2012-10-03T15:16:12.865,ns_1@10.3.3.138:'janitor_agent-saslbucket':janitor_agent:handle_info:671]Undoing temporary vbucket states caused by rebalance
      [error_logger:error,2012-10-03T15:16:12.865,ns_1@10.3.3.138:error_logger:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
       crasher:
         initial call: ns_single_vbucket_mover:mover/6
         pid: <0.31102.0>
         registered_name: []
         exception exit: {mover_failed,{badmatch,{error,closed}}}
           in function  ns_single_vbucket_mover:wait_for_mover/5
           in call from ns_single_vbucket_mover:mover_inner/6
           in call from misc:try_with_maybe_ignorant_after/2
           in call from ns_single_vbucket_mover:mover/6
         ancestors: [<0.24907.0>,<0.24856.0>]
         messages: []
         links: [<0.24907.0>]
         dictionary: [{cleanup_list,[<0.31105.0>]}]
         trap_exit: true
         status: running
         heap_size: 75025
         stack_size: 24
         reductions: 12139
       neighbours:

      [ns_server:debug,2012-10-03T15:16:12.865,ns_1@10.3.3.138:<0.24913.0>:ns_pubsub:do_subscribe_link:134]Parent process of subscription {ns_node_disco_events,<0.24907.0>} exited with reason {mover_failed,{badmatch,{error,closed}}}
      [user:info,2012-10-03T15:16:12.867,ns_1@10.3.3.138:<0.24628.0>:ns_orchestrator:handle_info:311]Rebalance exited with reason {mover_failed,{badmatch,{error,closed}}}
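
To find the failing modules in a full ns_server log, the bracketed header of each entry can be split into its fields. The following is a minimal sketch, assuming the `[logger:level,timestamp,node:process:module:function:line]message` layout visible in the entries above; `HEADER` and `error_entries` are hypothetical names, and the pattern does not try to cover every header variant ns_server may emit.

```python
import re

# Splits an ns_server log header into its fields.
# Layout assumed from this report:
#   [logger:level,timestamp,node:process:module:function:line]message
HEADER = re.compile(
    r"^\[(?P<logger>\w+):(?P<level>\w+),"
    r"(?P<ts>[^,]+),"
    r"(?P<node>[^:]+):(?P<proc>[^:]+):"
    r"(?P<module>\w+):(?P<func>\w+):(?P<line>\d+)\]"
    r"(?P<msg>.*)$"
)


def error_entries(lines):
    """Return parsed fields for every entry logged at the 'error' level."""
    entries = []
    for raw in lines:
        match = HEADER.match(raw.strip())  # tolerate indented log dumps
        if match and match.group("level") == "error":
            entries.append(match.groupdict())
    return entries
```

Run against the entries in this report, the sketch would surface the `ns_doctor`, `ale_error_logger_handler`, and `ns_vbucket_mover` lines while skipping the `info` and `debug` entries and the multi-line crash report bodies.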

      Adding logs.

          People

            alkondratenko Aleksey Kondratenko (Inactive)
            ketaki Ketaki Gangal (Inactive)