  1. Couchbase Server
  2. MB-10420

Linux: Control connection to memcached on 'ns_1@IP' disconnected during rebalance (which then failed)

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • 3.0
    • 3.0
    • couchbase-bucket
    • Security Level: Public
    • None
    • 3.0.0-426-rel
    • Triaged
    • Centos 64-bit
    • Yes

    Description

      I see the similar bugs MB-10313 and MB-10412, but those relate to Windows with failover on 2.5.1.
      In my case it is only a rebalance with data ops against 3.0.0:

      http://qa.hq.northscale.net/job/centos_x64--54_00--all_cases_rebalance-P2/4/consoleFull

      [2014-03-10 04:15:29,231] - [rest_client:1828] INFO - Latest logs from UI:
      [2014-03-10 04:15:29,232] - [rest_client:1829] ERROR - {u'node': u'ns_1@10.1.4.1', u'code': 2, u'text': u"Rebalance exited with reason {{unexpected_reason,\n {{badmatch,{error,closed,\n [{mc_client_binary,cmd_vocal_recv,5},\n {mc_client_binary,change_vbucket_filter,3},\n {ebucketmigrator_srv,handle_call,3},\n {gen_server,handle_msg,5},\n {proc_lib,init_p_do_apply,3}]}},\n [{misc,executing_on_new_process,1},\n {ns_vbm_sup,perform_vbucket_filter_change,6},\n {tap_replication_manager,\n do_change_vbucket_filter,4},\n {tap_replication_manager,handle_call,3},\n {gen_server,handle_msg,5},\n {proc_lib,init_p_do_apply,3}]},\n {gen_server,call,\n ['replication_manager-default',\n {change_vbucket_replication,324,undefined},\n infinity]}},\n {gen_server,call,\n [{'janitor_agent-default','ns_1@10.1.4.2'},\n {get_tap_docs_estimate_many_taps,669,\n [<<>>,<<>>,<<>>]},\n infinity]}}\n", u'shortText': u'message', u'serverTime': u'2014-03-10T04:16:35.760Z', u'module': u'ns_orchestrator', u'tstamp': 1394450195760, u'type': u'info'}
      [2014-03-10 04:15:29,233] - [rest_client:1829] ERROR - {u'node': u'ns_1@10.1.4.1', u'code': 0, u'text': u'Port server memcached on node \'babysitter_of_ns_1@127.0.0.1\' exited with status 139. Restarting. Messages: Mon Mar 10 04:16:29.320586 PDT 3: (default) TAP (Producer) eq_tapq:replication_building_270_\'ns_1@10.1.4.5\' - disconnected, keep alive for 300 seconds\nMon Mar 10 04:16:29.325875 PDT 3: (default) TAP (Producer) eq_tapq:replication_building_270_\'ns_1@10.1.4.3\' - disconnected, keep alive for 300 seconds\nMon Mar 10 04:16:29.334098 PDT 3: (default) TAP (Producer) eq_tapq:replication_building_270_\'ns_1@10.1.4.5\' - Connection is closed by force\nMon Mar 10 04:16:29.334361 PDT 3: (default) TAP (Producer) eq_tapq:replication_building_270_\'ns_1@10.1.4.3\' - Connection is closed by force\nMon Mar 10 04:16:29.348925 PDT 3: (default) TAP (Producer) eq_tapq:replication_ns_1@10.1.4.3 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0', u'shortText': u'message', u'serverTime': u'2014-03-10T04:16:35.758Z', u'module': u'ns_log', u'tstamp': 1394450195758, u'type': u'info'}
      [2014-03-10 04:15:29,234] - [rest_client:1829] ERROR - {u'node': u'ns_1@10.1.4.1', u'code': 0, u'text': u"Control connection to memcached on 'ns_1@10.1.4.1' disconnected: badmatch,\n {error,\n closed,\n [{mc_client_binary,\n cmd_vocal_recv,\n 5},\n {mc_client_binary,\n select_bucket,\n 2},\n {ns_memcached,\n ensure_bucket,\n 2},\n {ns_memcached,\n handle_info,\n 2},\n {gen_server,\n handle_msg,\n 5},\n {ns_memcached,\n init,1},\n {gen_server,\n init_it,6},\n {proc_lib,\n init_p_do_apply,\n 3}]}", u'shortText': u'message', u'serverTime': u'2014-03-10T04:16:34.449Z', u'module': u'ns_memcached', u'tstamp': 1394450194449, u'type': u'info'}
      [2014-03-10 04:15:29,235] - [rest_client:1829] ERROR - {u'node': u'ns_1@10.1.4.1', u'code': 0, u'text': u'Bucket "default" rebalance does not seem to be swap rebalance', u'shortText': u'message', u'serverTime': u'2014-03-10T04:15:52.397Z', u'module': u'ns_vbucket_mover', u'tstamp': 1394450152397, u'type': u'info'}
      [2014-03-10 04:15:29,236] - [rest_client:1829] ERROR - {u'node': u'ns_1@10.1.4.5', u'code': 0, u'text': u'Bucket "default" loaded on node \'ns_1@10.1.4.5\' in 0 seconds.', u'shortText': u'message', u'serverTime': u'2014-03-10T04:15:49.932Z', u'module': u'ns_memcached', u'tstamp': 1394450149932, u'type': u'info'}
      [2014-03-10 04:15:29,236] - [rest_client:1829] ERROR - {u'node': u'ns_1@10.1.4.4', u'code': 0, u'text': u'Bucket "default" loaded on node \'ns_1@10.1.4.4\' in 0 seconds.', u'shortText': u'message', u'serverTime': u'2014-03-10T04:15:49.383Z', u'module': u'ns_memcached', u'tstamp': 1394450149383, u'type': u'info'}
      [2014-03-10 04:15:29,237] - [rest_client:1829] ERROR - {u'node': u'ns_1@10.1.4.1', u'code': 0, u'text': u'Started rebalancing bucket default', u'shortText': u'message', u'serverTime': u'2014-03-10T04:15:48.706Z', u'module': u'ns_rebalancer', u'tstamp': 1394450148706, u'type': u'info'}
      [2014-03-10 04:15:29,239] - [rest_client:1829] ERROR - {u'node': u'ns_1@10.1.4.1', u'code': 4, u'text': u"Starting rebalance, KeepNodes = ['ns_1@10.1.4.1','ns_1@10.1.4.2',\n 'ns_1@10.1.4.3','ns_1@10.1.4.4',\n 'ns_1@10.1.4.5'], EjectNodes = []\n", u'shortText': u'message', u'serverTime': u'2014-03-10T04:15:48.486Z', u'module': u'ns_orchestrator', u'tstamp': 1394450148486, u'type': u'info'}
      [2014-03-10 04:15:29,240] - [rest_client:1829] ERROR - {u'node': u'ns_1@10.1.4.5', u'code': 3, u'text': u'Node ns_1@10.1.4.5 joined cluster', u'shortText': u'message', u'serverTime': u'2014-03-10T04:15:48.424Z', u'module': u'ns_cluster', u'tstamp': 1394450148424, u'type': u'info'}
      [2014-03-10 04:15:29,240] - [rest_client:1829] ERROR - {u'node': u'ns_1@10.1.4.5', u'code': 1, u'text': u"Couchbase Server has started on web port 8091 on node 'ns_1@10.1.4.5'.", u'shortText': u'web start ok', u'serverTime': u'2014-03-10T04:15:48.362Z', u'module': u'menelaus_sup', u'tstamp': 1394450148362, u'type': u'info'}

      ERROR
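
      The UI log entries above are listed newest first. As a reading aid, here is a minimal Python sketch that sorts a few of them by their `tstamp` field (milliseconds) and prints each as an offset from the start of the rebalance. The tuples are copied by hand from the entries above. The short descriptions and the `timeline` helper are illustrative names, not part of ns_server or the test framework.

      ```python
      # (tstamp in ms, module, short description), copied from the log entries above.
      EVENTS = [
          (1394450195760, "ns_orchestrator", "Rebalance exited with reason unexpected_reason"),
          (1394450195758, "ns_log", "memcached exited with status 139"),
          (1394450194449, "ns_memcached", "Control connection to memcached disconnected"),
          (1394450152397, "ns_vbucket_mover", "rebalance does not seem to be swap rebalance"),
          (1394450148706, "ns_rebalancer", "Started rebalancing bucket default"),
          (1394450148486, "ns_orchestrator", "Starting rebalance"),
      ]


      def timeline(events):
          """Return the events oldest first, as offsets from the earliest one."""
          ordered = sorted(events)          # tuples sort by tstamp first
          t0 = ordered[0][0]
          return [
              f"+{(ts - t0) / 1000:.3f}s {module}: {text}"
              for ts, module, text in ordered
          ]


      if __name__ == "__main__":
          for row in timeline(EVENTS):
              print(row)
      ```

      Read this way, the control connection drop and the memcached exit both land roughly 46 to 47 seconds after the rebalance started, immediately before the rebalance itself exits.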

      So the stack trace is the same as in MB-10412:
      Control connection to memcached on 'ns_1@10.1.4.1' disconnected: {{badmatch,
      {error,
      closed}},
      [{mc_client_binary, cmd_vocal_recv, 5},
       {mc_client_binary, select_bucket, 2},
       {ns_memcached, ensure_bucket, 2},
       {ns_memcached, handle_info, 2},
       {gen_server, handle_msg, 5},
       {ns_memcached, init,1},
       {gen_server, init_it,6},
       {proc_lib, init_p_do_apply, 3}]}
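
      A note on the "exited with status 139" message in the log above: under the common Unix convention an exit status above 128 means the process was killed by signal (status - 128), so 139 would correspond to signal 11 (SIGSEGV). That would point at a memcached crash rather than a network-level disconnect. The sketch below decodes such a status. It assumes the babysitter reports the status using this 128 + N convention, and `describe_exit_status` is a hypothetical helper, not an existing tool.

      ```python
      import signal


      def describe_exit_status(status: int) -> str:
          """Interpret a shell-style exit status (128 + N means killed by signal N)."""
          if status > 128:
              signum = status - 128
              try:
                  name = signal.Signals(signum).name
              except ValueError:
                  # Signal number not known on this platform.
                  name = "unknown signal"
              return f"terminated by signal {signum} ({name})"
          return f"exited normally with code {status}"


      if __name__ == "__main__":
          print(describe_exit_status(139))
      ```

      If the assumption holds, the next step would be to look for a core dump or a backtrace from memcached on 10.1.4.1.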

          People

            andreibaranouski Andrei Baranouski
            andreibaranouski Andrei Baranouski