Couchbase Server / MB-11153

memcached crashed during rebalance

Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • 3.0
    • 3.0
    • couchbase-bucket
    • Security Level: Public
    • None
    • Build 3.0.0 686

    Description

      Jenkins, test #2
      http://qa.hq.northscale.net/job/ubuntu_x64--01_02--rebalanceXDCR-P0/8/consoleFull

      [Test]
      ./testrunner -i ubuntu_x6401_02-rebalanceXDCR-P0.ini get-cbcollect-info=True,get-logs=False,stop-on-failure=False -t xdcr.rebalanceXDCR.Rebalance.async_rebalance_in,items=100000,rdirection=unidirection,ctopology=chain,doc-ops=update-delete,rebalance=source-destination,num_rebalance=1,GROUP=P1
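
The options that follow the test name in the command above are plain comma-separated key=value pairs. A minimal sketch of splitting such a string for inspection; `parse_params` is a hypothetical helper written for this note, not testrunner's own parser:

```python
def parse_params(spec):
    """Split a 'k1=v1,k2=v2' string into a dict of strings."""
    params = {}
    for pair in spec.split(","):
        # partition keeps everything after the first '=' as the value
        key, _, value = pair.partition("=")
        params[key.strip()] = value.strip()
    return params


# Test-case options copied from the -t argument above (after the test name).
test_params = parse_params(
    "items=100000,rdirection=unidirection,ctopology=chain,"
    "doc-ops=update-delete,rebalance=source-destination,"
    "num_rebalance=1,GROUP=P1"
)
print(test_params["items"], test_params["rebalance"])
```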

      [Test Steps]

      Intra cluster replication: TAP
      XDCR: UPR

      1. Set up CAPI XDCR between the source and destination clusters (3 nodes each).
      2. Load 1M items on the source.
      3. Add 1 node to the source and 1 node to the destination, then rebalance. The rebalance failed on the source.

      [Source Nodes] - 10.3.3.144, 10.3.3.146, 10.3.3.147
      [Destination Nodes] - 10.3.3.142, 10.3.3.143, 10.3.3.145
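
The `rebalance params` lines in the log below are URL-encoded form fields. As a rough sketch, the node-list part of the source-cluster value can be reproduced with the standard library. The node order and the `ns_1@` prefix are taken from the log; `build_rebalance_params` is a hypothetical helper, not the test framework's code, and the credential fields are left out on purpose:

```python
from urllib.parse import parse_qs, urlencode


def build_rebalance_params(known_ips, ejected_ips=()):
    """Encode node lists as form fields like those in the rebalance log line."""
    return urlencode({
        "ejectedNodes": ",".join("ns_1@" + ip for ip in ejected_ips),
        "knownNodes": ",".join("ns_1@" + ip for ip in known_ips),
    })


# Source cluster plus the node being rebalanced in, in the order the log shows.
body = build_rebalance_params(
    ["10.3.3.146", "10.3.3.144", "10.3.3.147", "10.3.3.148"]
)
print(body)

# Decoding goes the other way; keep_blank_values preserves the empty ejectedNodes.
decoded = parse_qs(body, keep_blank_values=True)
print(decoded["knownNodes"][0].split(","))
```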

      Rebalancing node 10.3.3.148 into the source cluster (through node 10.3.3.146) failed. UI logs on 10.3.3.146:

      [2014-05-16 11:45:59,449] - [xdcrbasetests:652] INFO - Starting rebalance-in nodes:['10.3.3.148'] at source cluster 10.3.3.146
      [2014-05-16 11:45:59,468] - [xdcrbasetests:652] INFO - Starting rebalance-in nodes:['10.3.3.149'] at destination cluster 10.3.3.143
      [2014-05-16 11:46:00,108] - [task:283] INFO - adding node 10.3.3.148:8091 to cluster
      [2014-05-16 11:46:00,108] - [rest_client:930] INFO - adding remote node @10.3.3.148:8091 to this cluster @10.3.3.146:8091
      [2014-05-16 11:46:12,399] - [rest_client:1084] INFO - rebalance params : password=password&ejectedNodes=&user=Administrator&knownNodes=ns_1%4010.3.3.146%2Cns_1%4010.3.3.144%2Cns_1%4010.3.3.147%2Cns_1%4010.3.3.148
      [2014-05-16 11:46:12,412] - [rest_client:1088] INFO - rebalance operation started
      [2014-05-16 11:46:12,413] - [task:283] INFO - adding node 10.3.3.149:8091 to cluster
      [2014-05-16 11:46:12,413] - [rest_client:930] INFO - adding remote node @10.3.3.149:8091 to this cluster @10.3.3.143:8091
      [2014-05-16 11:46:33,808] - [rest_client:1084] INFO - rebalance params : password=password&ejectedNodes=&user=Administrator&knownNodes=ns_1%4010.3.3.145%2Cns_1%4010.3.3.143%2Cns_1%4010.3.3.149%2Cns_1%4010.3.3.142
      [2014-05-16 11:46:33,852] - [rest_client:1088] INFO - rebalance operation started
      [2014-05-16 11:46:33,876] - [rest_client:1189] INFO - rebalance percentage : 6.44421724794 %
      [2014-05-16 11:46:33,886] - [rest_client:1189] INFO - rebalance percentage : 0 %
      [2014-05-16 11:46:43,939] - [rest_client:1189] INFO - rebalance percentage : 10.154801585 %
      [2014-05-16 11:46:43,957] - [rest_client:1189] INFO - rebalance percentage : 1.9516139992 %
      [2014-05-16 11:46:54,021] - [rest_client:1189] INFO - rebalance percentage : 12.8899603981 %
      [2014-05-16 11:46:54,027] - [rest_client:1189] INFO - rebalance percentage : 6.63686090925 %
      [2014-05-16 11:47:04,187] - [rest_client:1189] INFO - rebalance percentage : 15.4301866965 %
      [2014-05-16 11:47:04,192] - [rest_client:1189] INFO - rebalance percentage : 11.3228707704 %
      [2014-05-16 11:47:14,275] - [rest_client:1189] INFO - rebalance percentage : 19.1415339847 %
      [2014-05-16 11:47:14,284] - [rest_client:1189] INFO - rebalance percentage : 15.4233201367 %
      [2014-05-16 11:47:24,300] - [rest_client:1189] INFO - rebalance percentage : 23.4376788167 %
      [2014-05-16 11:47:24,648] - [rest_client:1189] INFO - rebalance percentage : 19.3288369883 %
      [2014-05-16 11:47:34,671] - [rest_client:1189] INFO - rebalance percentage : 27.7338236486 %
      [2014-05-16 11:47:34,684] - [rest_client:1189] INFO - rebalance percentage : 24.0163727517 %
      [2014-05-16 11:47:44,768] - [rest_client:1189] INFO - rebalance percentage : 32.2249009952 %
      [2014-05-16 11:47:44,779] - [rest_client:1189] INFO - rebalance percentage : 29.0930105931 %
      [2014-05-16 11:47:54,789] - [rest_client:1189] INFO - rebalance percentage : 36.7159783417 %
      [2014-05-16 11:47:54,800] - [rest_client:1189] INFO - rebalance percentage : 32.9977644937 %
      [2014-05-16 11:48:04,818] - [rest_client:1173] ERROR - {u'status': u'none', u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try rebalance again.'} - rebalance failed
      [2014-05-16 11:48:04,885] - [rest_client:1943] INFO - Latest logs from UI on 10.3.3.146:
      [2014-05-16 11:48:04,886] - [rest_client:1944] ERROR - {u'node': u'ns_1@10.3.3.146', u'code': 0, u'text': u'Bucket "default" loaded on node \'ns_1@10.3.3.146\' in 1 seconds.', u'shortText': u'message', u'serverTime': u'2014-05-16T11:48:01.812Z', u'module': u'ns_memcached', u'tstamp': 1400266081812, u'type': u'info'}
      [2014-05-16 11:48:04,886] - [rest_client:1944] ERROR - {u'node': u'ns_1@10.3.3.146', u'code': 2, u'text': u"Rebalance exited with reason {unexpected_exit,\n {'EXIT',<0.25174.5>,\n {bulk_set_vbucket_state_failed,\n [{'ns_1@10.3.3.146',\n {'EXIT',\n badmatch,{error,closed,\n {gen_server,call,\n [{'janitor_agent-default',\n 'ns_1@10.3.3.146'},\n {if_rebalance,<0.7078.5>,\n {update_vbucket_state,163,replica,\n passive,undefined}},\n infinity]}}}}]}}}\n", u'shortText': u'message', u'serverTime': u'2014-05-16T11:47:55.968Z', u'module': u'ns_orchestrator', u'tstamp': 1400266075968, u'type': u'info'}
      [2014-05-16 11:48:04,886] - [rest_client:1944] ERROR - {u'node': u'ns_1@10.3.3.146', u'code': 0, u'text': u"<0.25169.5> exited with {unexpected_exit,\n {'EXIT',<0.25174.5>,\n {bulk_set_vbucket_state_failed,\n [{'ns_1@10.3.3.146',\n {'EXIT',\n badmatch,{error,closed,\n {gen_server,call,\n [{'janitor_agent-default','ns_1@10.3.3.146'},\n {if_rebalance,<0.7078.5>,\n {update_vbucket_state,163,replica,passive,\n undefined}},\n infinity]}}}}]}}}", u'shortText': u'message', u'serverTime': u'2014-05-16T11:47:55.919Z', u'module': u'ns_vbucket_mover', u'tstamp': 1400266075919, u'type': u'critical'}
      [2014-05-16 11:48:04,887] - [rest_client:1944] ERROR - {u'node': u'ns_1@10.3.3.146', u'code': 0, u'text': u"Control connection to memcached on 'ns_1@10.3.3.146' disconnected: {badmatch,\n {error,\n closed}}", u'shortText': u'message', u'serverTime': u'2014-05-16T11:47:55.763Z', u'module': u'ns_memcached', u'tstamp': 1400266075763, u'type': u'info'}
      [2014-05-16 11:48:04,887] - [rest_client:1944] ERROR - {u'node': u'ns_1@10.3.3.146', u'code': 0, u'text': u'Port server memcached on node \'babysitter_of_ns_1@127.0.0.1\' exited with status 139. Restarting. Messages: Fri May 16 11:47:54.565297 PDT 3: (default) Clean up "eq_tapq:anon_268"\nFri May 16 11:47:54.565322 PDT 3: (default) Clean up "eq_tapq:anon_269"\nFri May 16 11:47:54.565329 PDT 3: (default) Clean up "eq_tapq:anon_270"\nFri May 16 11:47:54.565334 PDT 3: (default) Clean up "eq_tapq:anon_271"\nFri May 16 11:47:54.741452 PDT 3: (default) UPR (Notifier) eq_uprq:ns_server:xdcr:ns_1@10.3.3.146:default - (vb 166) stream created with start seqno 94 and end seqno 0', u'shortText': u'message', u'serverTime': u'2014-05-16T11:47:55.750Z', u'module': u'ns_log', u'tstamp': 1400266075750, u'type': u'info'}
      [2014-05-16 11:48:04,888] - [rest_client:1944] ERROR - {u'node': u'ns_1@10.3.3.146', u'code': 0, u'text': u'Bucket "default" rebalance does not seem to be swap rebalance', u'shortText': u'message', u'serverTime': u'2014-05-16T11:46:14.418Z', u'module': u'ns_vbucket_mover', u'tstamp': 1400265974418, u'type': u'info'}
      [2014-05-16 11:48:04,888] - [rest_client:1944] ERROR - {u'node': u'ns_1@10.3.3.148', u'code': 0, u'text': u'Bucket "default" loaded on node \'ns_1@10.3.3.148\' in 0 seconds.', u'shortText': u'message', u'serverTime': u'2014-05-16T11:46:13.777Z', u'module': u'ns_memcached', u'tstamp': 1400265973777, u'type': u'info'}
      [2014-05-16 11:48:04,889] - [rest_client:1944] ERROR - {u'node': u'ns_1@10.3.3.146', u'code': 0, u'text': u'Started rebalancing bucket default', u'shortText': u'message', u'serverTime': u'2014-05-16T11:46:12.580Z', u'module': u'ns_rebalancer', u'tstamp': 1400265972580, u'type': u'info'}
      [2014-05-16 11:48:04,889] - [rest_client:1944] ERROR - {u'node': u'ns_1@10.3.3.146', u'code': 4, u'text': u"Starting rebalance, KeepNodes = ['ns_1@10.3.3.146','ns_1@10.3.3.144',\n 'ns_1@10.3.3.147','ns_1@10.3.3.148'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes\n", u'shortText': u'message', u'serverTime': u'2014-05-16T11:46:12.345Z', u'module': u'ns_orchestrator', u'tstamp': 1400265972345, u'type': u'info'}
      [2014-05-16 11:48:04,890] - [rest_client:1944] ERROR - {u'node': u'ns_1@10.3.3.148', u'code': 3, u'text': u'Node ns_1@10.3.3.148 joined cluster', u'shortText': u'message', u'serverTime': u'2014-05-16T11:46:12.297Z', u'module': u'ns_cluster', u'tstamp': 1400265972297, u'type': u'info'}

      ERROR
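
Two fields in the UI log entries above can be decoded mechanically: the memcached exit status (139) and the millisecond `tstamp` values. The sketch below assumes the babysitter reports a signal-terminated process with the common 128 + signal-number convention, which the log itself does not confirm, and it assumes Linux signal numbering. `signal_from_exit_status` and `tstamp_to_utc` are hypothetical helpers written for this note.

```python
import ast
import signal
from datetime import datetime, timedelta, timezone


def signal_from_exit_status(status):
    """Map a 128+N exit status to a signal name, or None if status <= 128."""
    if status <= 128:
        return None
    # Assumption: status = 128 + signal number (shell-style convention).
    return signal.Signals(status - 128).name


def tstamp_to_utc(tstamp_ms):
    """Convert a millisecond epoch timestamp to an aware UTC datetime."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(milliseconds=tstamp_ms)


# One entry copied from the log above; ast.literal_eval accepts the u'' prefixes.
entry = ast.literal_eval(
    "{u'node': u'ns_1@10.3.3.146', u'code': 0, "
    "u'text': u'Started rebalancing bucket default', "
    "u'shortText': u'message', u'serverTime': u'2014-05-16T11:46:12.580Z', "
    "u'module': u'ns_rebalancer', u'tstamp': 1400265972580, u'type': u'info'}"
)

print(signal_from_exit_status(139))
print(tstamp_to_utc(entry["tstamp"]).isoformat(), entry["serverTime"])
```

Under those assumptions, 139 - 128 = 11 maps to SIGSEGV on Linux, which would be consistent with the crash named in the title. The `tstamp` for this entry converts to 18:46:12.580 UTC, while `serverTime` reads 11:46:12.580; the seven-hour gap suggests `serverTime` carries the node's local (PDT) time despite the trailing `Z`.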

          People

            sangharsh Sangharsh Agarwal
            sangharsh Sangharsh Agarwal