Couchbase Server / MB-6477

Rebalance from 3 to 2 nodes fails on Windows with 3 buckets

Details

    • Bug
    • Resolution: Fixed
    • Major
    • 2.0-beta
    • None
    • None
    • Security Level: Public
    • None
    • Windows Server 2008 R2, 3 nodes, build 1648

    Description

      memcached exited (status 38) and the rebalance failed while rebalancing from 3 nodes to 2.
      The cluster had 3 buckets: 1 default bucket and 2 SASL buckets.

      2012-08-29 11:18:39.028 ns_memcached:4:info:message(ns_1@10.1.3.151) - Control connection to memcached on 'ns_1@10.1.3.151' disconnected: {badmatch,{error,closed}}
      2012-08-29 11:18:39.028 ns_port_server:0:info:message(ns_1@10.1.3.151) - Port server memcached on node 'ns_1@10.1.3.151' exited with status 38. Restarting. Messages:
      Wed Aug 29 18:17:29.561304 3: TAP (Producer) eq_tapq:replication_building_714_'ns_1@10.1.3.146' - disconnected, keep alive for 300 seconds
      Wed Aug 29 18:17:29.561304 3: TAP (Producer) eq_tapq:replication_building_714_'ns_1@10.1.3.147' - Connection is closed by force.
      Wed Aug 29 18:17:29.561304 3: TAP (Producer) eq_tapq:replication_building_714_'ns_1@10.1.3.146' - Connection is closed by force.
      Wed Aug 29 18:17:30.998804 3: Schedule cleanup of "eq_tapq:anon_1442"
      Wed Aug 29 18:17:30.998804 3: Schedule cleanup of "eq_tapq:anon_1443"
      Wed Aug 29 18:17:30.998804 3: TAP (Producer) eq_tapq:replication_building_714_'ns_1@10.1.3.147' - Clear the tap queues by force
      Wed Aug 29 18:17:30.998804 3: TAP (Producer) eq_tapq:replication_building_714_'ns_1@10.1.3.146' - Clear the tap queues by force
      Wed Aug 29 18:17:33.608179 3: TAP (Producer) eq_tapq:replication_ns_1@10.1.3.146 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0
      Wed Aug 29 18:17:37.936304 3: TAP (Producer) eq_tapq:replication_building_715_'ns_1@10.1.3.147' - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
      Wed Aug 29 18:17:37.936304 3: TAP (Producer) eq_tapq:replication_building_715_'ns_1@10.1.3.147' - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
      Wed Aug 29 18:17:38.076929 3: TAP (Producer) eq_tapq:replication_building_715_'ns_1@10.1.3.146' - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
      Wed Aug 29 18:17:38.076929 3: TAP (Producer) eq_tapq:replication_building_715_'ns_1@10.1.3.146' - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
      Wed Aug 29 18:17:38.514429 3: TAP (Producer) eq_tapq:replication_building_715_'ns_1@10.1.3.147' - disconnected, keep alive for 300 seconds
      Wed Aug 29 18:17:38.795679 3: TAP (Producer) eq_tapq:rebalance_715 - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
      Wed Aug 29 18:17:38.795679 3: TAP (Producer) eq_tapq:rebalance_715 - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
      Wed Aug 29 18:17:38.795679 3: TAP (Producer) eq_tapq:rebalance_715 - Sending TAP_VBUCKET_SET with vbucket 715 and state "pending"
      Wed Aug 29 18:17:38.873804 3: TAP (Producer) eq_tapq:rebalance_715 - VBucket <715> is going dead to complete vbucket takeover.
      Wed Aug 29 18:17:38.873804 3: TAP (Producer) eq_tapq:rebalance_715 - Sending TAP_VBUCKET_SET with vbucket 715 and state "active"
      Wed Aug 29 18:17:38.873804 3: TAP takeover is completed. Disconnecting tap stream <eq_tapq:rebalance_715>
      Wed Aug 29 18:17:38.873804 3: TAP (Producer) eq_tapq:rebalance_715 - disconnected
      Wed Aug 29 18:17:40.748804 3: Schedule cleanup of "eq_tapq:rebalance_715"
      Wed Aug 29 18:17:40.748804 3: TAP (Producer) eq_tapq:rebalance_715 - Clear the tap queues by force
      Wed Aug 29 18:17:47.514429 3: TAP (Producer) eq_tapq:replication_building_715_'ns_1@10.1.3.146' - disconnected, keep alive for 300 seconds
      Wed Aug 29 18:17:47.514429 3: TAP (Producer) eq_tapq:replication_building_715_'ns_1@10.1.3.147' - Connection is closed by force.
      Wed Aug 29 18:17:47.514429 3: TAP (Producer) eq_tapq:replication_building_715_'ns_1@10.1.3.146' - Connection is closed by force.
      Wed Aug 29 18:17:47.998804 3: Schedule cleanup of "eq_tapq:anon_1444"
      Wed Aug 29 18:17:47.998804 3: Schedule cleanup of "eq_tapq:anon_1445"
      Wed Aug 29 18:17:47.998804 3: TAP (Producer) eq_tapq:replication_building_715_'ns_1@10.1.3.147' - Clear the tap queues by force
      Wed Aug 29 18:17:47.998804 3: TAP (Producer) eq_tapq:replication_building_715_'ns_1@10.1.3.146' - Clear the tap queues by force
      Wed Aug 29 18:17:51.998804 3: TAP (Producer) eq_tapq:replication_ns_1@10.1.3.146 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0
      Wed Aug 29 18:17:52.264429 3: TAP (Producer) eq_tapq:replication_building_716_'ns_1@10.1.3.147' - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
      Wed Aug 29 18:17:52.264429 3: TAP (Producer) eq_tapq:replication_building_716_'ns_1@10.1.3.147' - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
      Wed Aug 29 18:17:52.295679 3: TAP (Producer) eq_tapq:replication_building_716_'ns_1@10.1.3.146' - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
      Wed Aug 29 18:17:52.295679 3: TAP (Producer) eq_tapq:replication_building_716_'ns_1@10.1.3.146' - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
      Wed Aug 29 18:17:52.748804 3: TAP (Producer) eq_tapq:replication_building_716_'ns_1@10.1.3.147' - disconnected, keep alive for 300 seconds
      Wed Aug 29 18:17:52.905054 3: TAP (Producer) eq_tapq:rebalance_716 - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
      Wed Aug 29 18:17:52.905054 3: TAP (Producer) eq_tapq:rebalance_716 - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
      Wed Aug 29 18:17:52.905054 3: TAP (Producer) eq_tapq:rebalance_716 - Sending TAP_VBUCKET_SET with vbucket 716 and state "pending"
      Wed Aug 29 18:17:52.905054 3: TAP (Producer) eq_tapq:rebalance_716 - VBucket <716> is going dead to complete vbucket takeover.
      Wed Aug 29 18:17:52.905054 3: TAP (Producer) eq_tapq:rebalance_716 - Sending TAP_VBUCKET_SET with vbucket 716 and state "active"
      Wed Aug 29 18:17:52.920679 3: TAP takeover is completed. Disconnecting tap stream <eq_tapq:rebalance_716>
      Wed Aug 29 18:17:52.920679 3: TAP (Producer) eq_tapq:rebalance_716 - disconnected
      Wed Aug 29 18:17:54.842554 3: Schedule cleanup of "eq_tapq:rebalance_716"
      Wed Aug 29 18:17:54.842554 3: TAP (Producer) eq_tapq:rebalance_716 - Clear the tap queues by force
      Wed Aug 29 18:17:58.248804 3: TAP (Producer) eq_tapq:replication_building_716_'ns_1@10.1.3.146' - disconnected, keep alive for 300 seconds
      Wed Aug 29 18:17:58.248804 3: TAP (Producer) eq_tapq:replication_building_716_'ns_1@10.1.3.147' - Connection is closed by force.
      Wed Aug 29 18:17:58.248804 3: TAP (Producer) eq_tapq:replication_building_716_'ns_1@10.1.3.146' - Connection is closed by force.
      Wed Aug 29 18:17:58.373804 3: Schedule cleanup of "eq_tapq:anon_1446"
      Wed Aug 29 18:17:58.373804 3: Schedule cleanup of "eq_tapq:anon_1447"
      Wed Aug 29 18:17:58.373804 3: TAP (Producer) eq_tapq:replication_building_716_'ns_1@10.1.3.147' - Clear the tap queues by force
      Wed Aug 29 18:17:58.373804 3: TAP (Producer) eq_tapq:replication_building_716_'ns_1@10.1.3.146' - Clear the tap queues by force
      Wed Aug 29 18:17:58.920679 3: TAP (Producer) eq_tapq:replication_ns_1@10.1.3.146 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0
      Wed Aug 29 18:17:59.514429 3: TAP (Producer) eq_tapq:replication_building_717_'ns_1@10.1.3.147' - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
      Wed Aug 29 18:17:59.514429 3: TAP (Producer) eq_tapq:replication_building_717_'ns_1@10.1.3.147' - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
      Wed Aug 29 18:17:59.514429 3: TAP (Producer) eq_tapq:replication_building_717_'ns_1@10.1.3.146' - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
      Wed Aug 29 18:17:59.514429 3: TAP (Producer) eq_tapq:replication_building_717_'ns_1@10.1.3.146' - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
      Wed Aug 29 18:17:59.858179 3: TAP (Producer) eq_tapq:replication_building_717_'ns_1@10.1.3.147' - disconnected, keep alive for 300 seconds
      Wed Aug 29 18:17:59.998804 3: TAP (Producer) eq_tapq:rebalance_717 - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
      Wed Aug 29 18:17:59.998804 3: TAP (Producer) eq_tapq:rebalance_717 - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
      Wed Aug 29 18:17:59.998804 3: TAP (Producer) eq_tapq:rebalance_717 - Sending TAP_VBUCKET_SET with vbucket 717 and state "pending"
      Wed Aug 29 18:17:59.998804 3: TAP (Producer) eq_tapq:rebalance_717 - VBucket <717> is going dead to complete vbucket takeover.
      Wed Aug 29 18:17:59.998804 3: TAP (Producer) eq_tapq:rebalance_717 - Sending TAP_VBUCKET_SET with vbucket 717 and state "active"
      Wed Aug 29 18:17:59.998804 3: TAP takeover is completed. Disconnecting tap stream <eq_tapq:rebalance_717>
      Wed Aug 29 18:17:59.998804 3: TAP (Producer) eq_tapq:rebalance_717 - disconnected
      Wed Aug 29 18:18:01.998804 3: Schedule cleanup of "eq_tapq:rebalance_717"
      Wed Aug 29 18:18:01.998804 3: TAP (Producer) eq_tapq:rebalance_717 - Clear the tap queues by force
      Wed Aug 29 18:18:06.998804 3: TAP (Producer) eq_tapq:replication_building_717_'ns_1@10.1.3.146' - disconnected, keep alive for 300 seconds
      Wed Aug 29 18:18:06.998804 3: TAP (Producer) eq_tapq:replication_building_717_'ns_1@10.1.3.147' - Connection is closed by force.
      Wed Aug 29 18:18:06.998804 3: TAP (Producer) eq_tapq:replication_building_717_'ns_1@10.1.3.146' - Connection is closed by force.
      Wed Aug 29 18:18:07.998804 3: Schedule cleanup of "eq_tapq:anon_1448"
      Wed Aug 29 18:18:07.998804 3: Schedule cleanup of "eq_tapq:anon_1449"
      Wed Aug 29 18:18:07.998804 3: TAP (Producer) eq_tapq:replication_building_717_'ns_1@10.1.3.147' - Clear the tap queues by force
      Wed Aug 29 18:18:07.998804 3: TAP (Producer) eq_tapq:replication_building_717_'ns_1@10.1.3.146' - Clear the tap queues by force
      Wed Aug 29 18:18:17.139429 3: TAP (Producer) eq_tapq:replication_ns_1@10.1.3.146 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0
      Wed Aug 29 18:18:17.514429 3: TAP (Producer) eq_tapq:replication_building_718_'ns_1@10.1.3.147' - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
      Wed Aug 29 18:18:17.514429 3: TAP (Producer) eq_tapq:replication_building_718_'ns_1@10.1.3.147' - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
      Wed Aug 29 18:18:17.639429 3: TAP (Producer) eq_tapq:replication_building_718_'ns_1@10.1.3.146' - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
      Wed Aug 29 18:18:17.639429 3: TAP (Producer) eq_tapq:replication_building_718_'ns_1@10.1.3.146' - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
      Wed Aug 29 18:18:18.280054 3: TAP (Producer) eq_tapq:replication_building_718_'ns_1@10.1.3.147' - disconnected, keep alive for 300 seconds
      Wed Aug 29 18:18:18.483179 3: TAP (Producer) eq_tapq:rebalance_718 - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
      Wed Aug 29 18:18:18.483179 3: TAP (Producer) eq_tapq:rebalance_718 - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
      Wed Aug 29 18:18:18.483179 3: TAP (Producer) eq_tapq:rebalance_718 - Sending TAP_VBUCKET_SET with vbucket 718 and state "pending"
      Wed Aug 29 18:18:18.483179 3: TAP (Producer) eq_tapq:rebalance_718 - VBucket <718> is going dead to complete vbucket takeover.
      Wed Aug 29 18:18:18.483179 3: TAP (Producer) eq_tapq:rebalance_718 - Sending TAP_VBUCKET_SET with vbucket 718 and state "active"
      Wed Aug 29 18:18:18.498804 3: TAP takeover is completed. Disconnecting tap stream <eq_tapq:rebalance_718>
      Wed Aug 29 18:18:18.498804 3: TAP (Producer) eq_tapq:rebalance_718 - disconnected
      Wed Aug 29 18:18:19.998804 3: Schedule cleanup of "eq_tapq:rebalance_718"
      Wed Aug 29 18:18:19.998804 3: TAP (Producer) eq_tapq:rebalance_718 - Clear the tap queues by force
      Wed Aug 29 18:18:25.123804 3: TAP (Producer) eq_tapq:replication_building_718_'ns_1@10.1.3.146' - disconnected, keep alive for 300 seconds
      Wed Aug 29 18:18:25.123804 3: TAP (Producer) eq_tapq:replication_building_718_'ns_1@10.1.3.147' - Connection is closed by force.
      Wed Aug 29 18:18:25.123804 3: TAP (Producer) eq_tapq:replication_building_718_'ns_1@10.1.3.146' - Connection is closed by force.
      Wed Aug 29 18:18:25.998804 3: Schedule cleanup of "eq_tapq:anon_1450"
      Wed Aug 29 18:18:25.998804 3: Schedule cleanup of "eq_tapq:anon_1451"
      Wed Aug 29 18:18:25.998804 3: TAP (Producer) eq_tapq:replication_building_718_'ns_1@10.1.3.147' - Clear the tap queues by force
      Wed Aug 29 18:18:25.998804 3: TAP (Producer) eq_tapq:replication_building_718_'ns_1@10.1.3.146' - Clear the tap queues by force
      Wed Aug 29 18:18:34.858179 3: TAP (Producer) eq_tapq:replication_ns_1@10.1.3.146 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0
      Wed Aug 29 18:18:38.670679 3: TAP (Producer) eq_tapq:replication_building_719_'ns_1@10.1.3.146' - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
      Wed Aug 29 18:18:38.670679 3: TAP (Producer) eq_tapq:replication_building_719_'ns_1@10.1.3.146' - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
      Wed Aug 29 18:18:38.701929 3: TAP (Producer) eq_tapq:replication_building_719_'ns_1@10.1.3.147' - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
      Wed Aug 29 18:18:38.701929 3: TAP (Producer) eq_tapq:replication_building_719_'ns_1@10.1.3.147' - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
      2012-08-29 11:18:39.655 ns_orchestrator:2:info:message(ns_1@10.1.3.146) - Rebalance exited with reason {{bulk_set_vbucket_state_failed,
        [{'ns_1@10.1.3.151',
          {'EXIT',
            {killed,
              {gen_server,call,
                [{'janitor_agent-saslbucket3','ns_1@10.1.3.151'},
                 {if_rebalance,<0.10658.15>,
                   {update_vbucket_state,44,replica,undefined,undefined}},
                 infinity]}}}}]},
        [{janitor_agent,bulk_set_vbucket_state,4},
         {ns_vbucket_mover,update_replication_post_move,3},
         {ns_vbucket_mover,handle_info,2},
         {gen_server,handle_msg,5},
         {proc_lib,init_p_do_apply,3}]}
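
      For triage, it can help to see which vbucket takeovers finished before memcached exited. The following is a minimal sketch, not part of the original report: the function name `summarize` and both regular expressions are assumptions, written only against the message formats visible in the log above.

```python
import re

# Message memcached logs when a vbucket takeover finishes (format taken from the log above).
TAKEOVER_DONE = re.compile(
    r"TAP takeover is completed\. Disconnecting tap stream <eq_tapq:rebalance_(\d+)>"
)
# ns_port_server line reporting that a port program exited (format taken from the log above).
PORT_EXIT = re.compile(
    r"Port server (\S+) on node '([^']+)' exited with status (\d+)"
)


def summarize(log_text):
    """Return (completed_vbuckets, exits) found in log_text.

    completed_vbuckets: vbucket ids whose takeover completed, in log order.
    exits: (program, node, status) tuples for every reported port exit.
    """
    completed = [int(m.group(1)) for m in TAKEOVER_DONE.finditer(log_text)]
    exits = [
        (m.group(1), m.group(2), int(m.group(3)))
        for m in PORT_EXIT.finditer(log_text)
    ]
    return completed, exits


# A few lines copied from the log above, used as sample input.
sample = """\
2012-08-29 11:18:39.028 ns_port_server:0:info:message(ns_1@10.1.3.151) - Port server memcached on node 'ns_1@10.1.3.151' exited with status 38. Restarting. Messages:
Wed Aug 29 18:17:38.873804 3: TAP takeover is completed. Disconnecting tap stream <eq_tapq:rebalance_715>
Wed Aug 29 18:17:52.920679 3: TAP takeover is completed. Disconnecting tap stream <eq_tapq:rebalance_716>
"""

if __name__ == "__main__":
    completed, exits = summarize(sample)
    print("completed takeovers:", completed)
    print("port exits:", exits)
```

      In the full log above, the takeover messages cover vbuckets 715 through 718, and replication building for vbucket 719 had just started when memcached exited.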

          People

            jin Jin Lim (Inactive)
            iryna iryna