Couchbase Server / MB-11959

[3.0.0-1150] memcached crashed during swap rebalance with XDCR

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • 3.0
    • 3.0
    • couchbase-bucket
    • Security Level: Public
    • None
    • 3.0.0-1150

    Description

      http://qa.hq.northscale.net/job/centos_x64--107_01--rebalanceXDCR-P1/45/consoleFull

      [Test Error]
      [2014-08-14 00:50:10,495] - [xdcrbasetests:692] INFO - Starting swap-rebalance [remove_node:10.5.2.230] -> [add_node:10.3.5.68] at source cluster 10.5.2.228
      [2014-08-14 00:50:10,495] - [xdcrbasetests:372] INFO - sleep for 5 secs. ...
      [2014-08-14 00:50:11,480] - [task:286] INFO - This is swap rebalance and we will monitor vbuckets shuffling
      [2014-08-14 00:50:11,621] - [task:300] INFO - adding node 10.3.5.68:8091 to cluster
      [2014-08-14 00:50:11,622] - [rest_client:933] INFO - adding remote node @10.3.5.68:8091 to this cluster @10.5.2.228:8091
      [2014-08-14 00:50:22,088] - [rest_client:1095] INFO - rebalance params : password=password&ejectedNodes=ns_1%4010.5.2.230&user=Administrator&knownNodes=ns_1%4010.5.2.230%2Cns_1%4010.5.2.229%2Cns_1%4010.5.2.228%2Cns_1%4010.3.5.68
      [2014-08-14 00:50:22,099] - [rest_client:1099] INFO - rebalance operation started
      [2014-08-14 00:50:22,209] - [rest_client:1216] INFO - rebalance percentage : 0 %
      [2014-08-14 00:50:22,218] - [rest_client:484] INFO - index query url: http://10.5.2.228:8092/default/_design/dev_ddoc1/_view/default0?full_set=true&stale=false
      [2014-08-14 00:50:31,011] - [rest_client:484] INFO - index query url: http://10.5.2.231:8092/default/_design/dev_ddoc1/_view/default0?full_set=true&stale=false
      [2014-08-14 00:50:34,742] - [rest_client:1200] ERROR - {u'status': u'none', u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try rebalance again.'} - rebalance failed
      [2014-08-14 00:50:34,773] - [rest_client:2010] INFO - Latest logs from UI on 10.5.2.228:
      [2014-08-14 00:50:34,774] - [rest_client:2011] ERROR - {u'node': u'ns_1@10.5.2.228', u'code': 2, u'text': u'Rebalance exited with reason {unexpected_exit,\n {\'EXIT\',<0.18085.112>,\n {wait_seqno_persisted_failed,"default",1008,\n 53,\n [{\'ns_1@10.3.5.68\',\n {\'EXIT\',\n badmatch,{error,closed,\n {gen_server,call,\n [{\'janitor_agent-default\',\n \'ns_1@10.3.5.68\'},\n {if_rebalance,<0.17703.112>,\n {wait_seqno_persisted,1008,53}},\n infinity]}}}}]}}}\n', u'shortText': u'message', u'serverTime': u'2014-08-14T00:51:15.143Z', u'module': u'ns_orchestrator', u'tstamp': 1408002675143, u'type': u'info'}
      [2014-08-14 00:50:34,775] - [rest_client:2011] ERROR - {u'node': u'ns_1@10.3.5.68', u'code': 0, u'text': u"Port server memcached on node 'babysitter_of_ns_1@127.0.0.1' exited with status 134. Restarting. Messages: Thu Aug 14 00:51:14.951340 PDT 3: (default) Notified the completion of checkpoint persistence for vbucket 671, id 19, cookie 0x6624f00\nThu Aug 14 00:51:14.953926 PDT 3: (default) Notified the completion of checkpoint persistence for vbucket 670, id 18, cookie 0x6624600\nThu Aug 14 00:51:14.961772 PDT 3: (default) Notified the completion of checkpoint persistence for vbucket 669, id 45, cookie 0x65d7c00\nThu Aug 14 00:51:14.965450 PDT 3: (default) Notified the completion of checkpoint persistence for vbucket 668, id 53, cookie 0x65d7600\nasssertion failed [maxDBSeqno == info.last_sequence] at /buildbot/build_slave/centos-5-x64-300-builder/build/build/ep-engine/src/couch-kvstore/couch-kvstore.cc:1893", u'shortText': u'message', u'serverTime': u'2014-08-14T00:51:15.123Z', u'module': u'ns_log', u'tstamp': 1408002675123, u'type': u'info'}
      [2014-08-14 00:50:34,775] - [rest_client:2011] ERROR - {u'node': u'ns_1@10.5.2.228', u'code': 0, u'text': u'<0.17883.112> exited with {unexpected_exit,\n {\'EXIT\',<0.18085.112>,\n {wait_seqno_persisted_failed,"default",1008,53,\n [{\'ns_1@10.3.5.68\',\n {\'EXIT\',\n badmatch,{error,closed,\n {gen_server,call,\n [{\'janitor_agent-default\',\'ns_1@10.3.5.68\'},\n {if_rebalance,<0.17703.112>,\n {wait_seqno_persisted,1008,53}},\n infinity]}}}}]}}}', u'shortText': u'message', u'serverTime': u'2014-08-14T00:51:15.122Z', u'module': u'ns_vbucket_mover', u'tstamp': 1408002675122, u'type': u'critical'}
      [2014-08-14 00:50:34,776] - [rest_client:2011] ERROR - {u'node': u'ns_1@10.3.5.68', u'code': 0, u'text': u"Control connection to memcached on 'ns_1@10.3.5.68' disconnected: {badmatch,\n {error,\n closed}}", u'shortText': u'message', u'serverTime': u'2014-08-14T00:51:15.122Z', u'module': u'ns_memcached', u'tstamp': 1408002675122, u'type': u'info'}
      [2014-08-14 00:50:34,776] - [rest_client:2011] ERROR - {u'node': u'ns_1@10.5.2.228', u'code': 0, u'text': u'Bucket "default" rebalance appears to be swap rebalance', u'shortText': u'message', u'serverTime': u'2014-08-14T00:51:13.582Z', u'module': u'ns_vbucket_mover', u'tstamp': 1408002673582, u'type': u'info'}
      [2014-08-14 00:50:34,777] - [rest_client:2011] ERROR - {u'node': u'ns_1@10.3.5.68', u'code': 0, u'text': u'Bucket "default" loaded on node \'ns_1@10.3.5.68\' in 0 seconds.', u'shortText': u'message', u'serverTime': u'2014-08-14T00:51:12.267Z', u'module': u'ns_memcached', u'tstamp': 1408002672267, u'type': u'info'}
      [2014-08-14 00:50:34,777] - [rest_client:2011] ERROR - {u'node': u'ns_1@10.5.2.228', u'code': 0, u'text': u'Started rebalancing bucket default', u'shortText': u'message', u'serverTime': u'2014-08-14T00:51:12.145Z', u'module': u'ns_rebalancer', u'tstamp': 1408002672145, u'type': u'info'}
      [2014-08-14 00:50:34,777] - [rest_client:2011] ERROR - {u'node': u'ns_1@10.5.2.228', u'code': 4, u'text': u"Starting rebalance, KeepNodes = ['ns_1@10.5.2.229','ns_1@10.5.2.228',\n 'ns_1@10.3.5.68'], EjectNodes = ['ns_1@10.5.2.230'], Failed over and being ejected nodes = []; no delta recovery nodes\n", u'shortText': u'message', u'serverTime': u'2014-08-14T00:51:12.083Z', u'module': u'ns_orchestrator', u'tstamp': 1408002672083, u'type': u'info'}
      [2014-08-14 00:50:34,778] - [rest_client:2011] ERROR - {u'node': u'ns_1@10.3.5.68', u'code': 3, u'text': u'Node ns_1@10.3.5.68 joined cluster', u'shortText': u'message', u'serverTime': u'2014-08-14T00:51:12.043Z', u'module': u'ns_cluster', u'tstamp': 1408002672043, u'type': u'info'}
      [2014-08-14 00:50:34,778] - [rest_client:2011] ERROR - {u'node': u'ns_1@10.3.5.68', u'code': 1, u'text': u'Couchbase Server has started on web port 8091 on node \'ns_1@10.3.5.68\'. Version: "3.0.0-1150-rel-enterprise".', u'shortText': u'web start ok', u'serverTime': u'2014-08-14T00:51:12.016Z', u'module': u'menelaus_sup', u'tstamp': 1408002672016, u'type': u'info'}

      [('/usr/lib/python2.7/threading.py', 524, '__bootstrap', 'self.__bootstrap_inner()'), ('/usr/lib/python2.7/threading.py', 551, '__bootstrap_inner', 'self.run()'), ('lib/tasks/taskmanager.py', 31, 'run', 'task.step(self)'), ('lib/tasks/task.py', 58, 'step', 'self.check(task_manager)'), ('lib/tasks/task.py', 370, 'check', 'self.set_exceeption(ex)'), ('lib/tasks/future.py', 264, 'set_exception', 'print traceback.extract_stack()')]
      Thu Aug 14 00:50:34 2014
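
The "rebalance params" line in the log above is URL-encoded, which makes it hard to see which nodes were kept and which were ejected. As a reading aid only, here is a minimal sketch that decodes that string with Python's standard library. The helper name `decode_rebalance_params` is hypothetical and not part of the test framework; the query string is copied from the log.

```python
from urllib.parse import parse_qs

# Query string copied from the "rebalance params" log line in the description.
params = (
    "password=password&ejectedNodes=ns_1%4010.5.2.230&user=Administrator"
    "&knownNodes=ns_1%4010.5.2.230%2Cns_1%4010.5.2.229"
    "%2Cns_1%4010.5.2.228%2Cns_1%4010.3.5.68"
)


def decode_rebalance_params(query):
    """Hypothetical helper: split the encoded node lists into readable names."""
    fields = parse_qs(query)  # percent-decodes %40 -> '@' and %2C -> ','
    known = fields["knownNodes"][0].split(",")
    ejected = fields["ejectedNodes"][0].split(",")
    # Nodes that stay in the cluster are the known nodes minus the ejected ones.
    kept = [node for node in known if node not in ejected]
    return {"known": known, "ejected": ejected, "kept": kept}


result = decode_rebalance_params(params)
for role, nodes in result.items():
    print(role, nodes)
```

The decoded "kept" set (10.5.2.229, 10.5.2.228, 10.3.5.68) agrees with the KeepNodes list in the ns_orchestrator "Starting rebalance" message, and the single ejected node is 10.5.2.230, the node being swapped out.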

          People

            sangharsh Sangharsh Agarwal
            sangharsh Sangharsh Agarwal