Node join request timed out (causing upr rebalance tests to fail)

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 3.0
    • 3.0
    • ns_server
    • Security Level: Public
    • None
    • Build 3.0.0-767
      Inter cluster replication: UPR
      XDCR: UPR

    Description

      http://qa.hq.northscale.net/job/centos_x64--31_01--uniXDCR-P1/12/consoleFull

      The test case fails during the rebalance in setup itself:

      [2014-06-02 22:51:49,013] - [task:283] INFO - adding node 10.3.121.65:8091 to cluster
      [2014-06-02 22:51:49,014] - [rest_client:930] INFO - adding remote node @10.3.121.65:8091 to this cluster @10.3.4.177:8091
      [2014-06-02 22:52:19,062] - [rest_client:744] ERROR - http://10.3.4.177:8091/controller/addNode error 400 reason: unknown ["Join completion call failed. Timeout connecting to \"10.3.121.65\" on port \"8091\". This could be due to an incorrect host/port combination or a firewall in place between the servers."]
      [2014-06-02 22:52:19,083] - [rest_client:1959] INFO - Latest logs from UI on 10.3.4.177:
      [2014-06-02 22:52:19,084] - [rest_client:1960] ERROR - {u'node': u'ns_1@10.3.4.177', u'code': 5, u'text': u'Failed to add node 10.3.121.65:8091 to cluster. Join completion call failed. Timeout connecting to "10.3.121.65" on port "8091". This could be due to an incorrect host/port combination or a firewall in place between the servers.', u'shortText': u'message', u'serverTime': u'2014-06-02T22:53:26.855Z', u'module': u'ns_cluster', u'tstamp': 1401774806855, u'type': u'info'}
      [2014-06-02 22:52:19,085] - [rest_client:1960] ERROR - {u'node': u'ns_1@10.3.4.177', u'code': 0, u'text': u'Add transaction of \'ns_1@10.3.121.65\' failed because of {error,complete_join,\n <<"Join completion call failed. Timeout connecting to \\"10.3.121.65\\" on port \\"8091\\". This could be due to an incorrect host/port combination or a firewall in place between the servers.">>,\n {error,rest_error,\n <<"Timeout connecting to \\"10.3.121.65\\" on port \\"8091\\". This could be due to an incorrect host/port combination or a firewall in place between the servers.">>,\n {error,timeout}}}', u'shortText': u'message', u'serverTime': u'2014-06-02T22:53:26.852Z', u'module': u'ns_cluster', u'tstamp': 1401774806852, u'type': u'critical'}
      [2014-06-02 22:52:19,086] - [rest_client:1960] ERROR - {u'node': u'ns_1@10.3.4.177', u'code': 4, u'text': u"Node 'ns_1@10.3.4.177' saw that node 'ns_1@10.3.121.65' came up. Tags: []", u'shortText': u'node up', u'serverTime': u'2014-06-02T22:52:56.945Z', u'module': u'ns_node_disco', u'tstamp': 1401774776945, u'type': u'info'}
      [2014-06-02 22:52:19,086] - [rest_client:1960] ERROR - {u'node': u'ns_1@10.3.4.177', u'code': 0, u'text': u"Started node add transaction by adding node 'ns_1@10.3.121.65' to nodes_wanted (group: undefined)", u'shortText': u'message', u'serverTime': u'2014-06-02T22:52:56.843Z', u'module': u'ns_cluster', u'tstamp': 1401774776843, u'type': u'info'}
      [2014-06-02 22:52:19,087] - [rest_client:1960] ERROR - {u'node': u'ns_1@10.3.4.177', u'code': 0, u'text': u'Change of address to "10.3.4.177" is requested.', u'shortText': u'message', u'serverTime': u'2014-06-02T22:52:56.817Z', u'module': u'ns_cluster', u'tstamp': 1401774776817, u'type': u'info'}
      [2014-06-02 22:52:19,088] - [rest_client:1960] ERROR - {u'node': u'ns_1@10.3.4.177', u'code': 5, u'text': u"Node 'ns_1@10.3.4.177' saw that node 'ns_1@10.3.121.65' went down. Details: [{nodedown_reason,\n connection_closed}]", u'shortText': u'node down', u'serverTime': u'2014-06-02T22:51:15.540Z', u'module': u'ns_node_disco', u'tstamp': 1401774675540, u'type': u'warning'}
      [2014-06-02 22:52:19,089] - [rest_client:1960] ERROR - {u'node': u'ns_1@10.3.121.65', u'code': 5, u'text': u"Node 'ns_1@10.3.121.65' saw that node 'ns_1@10.3.121.62' went down. Details: [{nodedown_reason,\n connection_closed}]", u'shortText': u'node down', u'serverTime': u'2014-06-02T22:51:13.288Z', u'module': u'ns_node_disco', u'tstamp': 1401774673288, u'type': u'warning'}
      [2014-06-02 22:52:19,089] - [rest_client:1960] ERROR - {u'node': u'ns_1@10.3.121.65', u'code': 5, u'text': u"Node 'ns_1@10.3.121.65' saw that node 'ns_1@10.3.2.204' went down. Details: [{nodedown_reason,\n connection_closed}]", u'shortText': u'node down', u'serverTime': u'2014-06-02T22:51:13.285Z', u'module': u'ns_node_disco', u'tstamp': 1401774673285, u'type': u'warning'}
      [2014-06-02 22:52:19,090] - [rest_client:1960] ERROR - {u'node': u'ns_1@10.3.4.177', u'code': 5, u'text': u"Node 'ns_1@10.3.4.177' saw that node 'ns_1@10.3.121.62' went down. Details: [{nodedown_reason,\n connection_closed}]", u'shortText': u'node down', u'serverTime': u'2014-06-02T22:51:13.015Z', u'module': u'ns_node_disco', u'tstamp': 1401774673015, u'type': u'warning'}
      [2014-06-02 22:52:19,090] - [rest_client:1960] ERROR - {u'node': u'ns_1@10.3.4.177', u'code': 5, u'text': u"Node 'ns_1@10.3.4.177' saw that node 'ns_1@10.3.2.204' went down. Details: [{nodedown_reason,\n connection_closed}]", u'shortText': u'node down', u'serverTime': u'2014-06-02T22:51:13.000Z', u'module': u'ns_node_disco', u'tstamp': 1401774673000, u'type': u'warning'}
      [2014-06-02 22:52:19,107] - [rest_client:1959] INFO - Latest logs from UI on 10.3.121.65:
      [2014-06-02 22:52:19,107] - [rest_client:1960] ERROR - {u'node': u'ns_1@10.3.4.177', u'code': 5, u'text': u'Failed to add node 10.3.121.65:8091 to cluster. Join completion call failed. Timeout connecting to "10.3.121.65" on port "8091". This could be due to an incorrect host/port combination or a firewall in place between the servers.', u'shortText': u'message', u'serverTime': u'2014-06-02T22:53:26.855Z', u'module': u'ns_cluster', u'tstamp': 1401774806855, u'type': u'info'}
      [2014-06-02 22:52:19,108] - [rest_client:1960] ERROR - {u'node': u'ns_1@10.3.4.177', u'code': 0, u'text': u'Add transaction of \'ns_1@10.3.121.65\' failed because of {error,complete_join,\n <<"Join completion call failed. Timeout connecting to \\"10.3.121.65\\" on port \\"8091\\". This could be due to an incorrect host/port combination or a firewall in place between the servers.">>,\n {error,rest_error,\n <<"Timeout connecting to \\"10.3.121.65\\" on port \\"8091\\". This could be due to an incorrect host/port combination or a firewall in place between the servers.">>,\n {error,timeout}}}', u'shortText': u'message', u'serverTime': u'2014-06-02T22:53:26.852Z', u'module': u'ns_cluster', u'tstamp': 1401774806852, u'type': u'critical'}
      [2014-06-02 22:52:19,109] - [rest_client:1960] ERROR - {u'node': u'ns_1@10.3.4.177', u'code': 4, u'text': u"Node 'ns_1@10.3.4.177' saw that node 'ns_1@10.3.121.65' came up. Tags: []", u'shortText': u'node up', u'serverTime': u'2014-06-02T22:52:56.945Z', u'module': u'ns_node_disco', u'tstamp': 1401774776945, u'type': u'info'}
      [2014-06-02 22:52:19,110] - [rest_client:1960] ERROR - {u'node': u'ns_1@10.3.4.177', u'code': 0, u'text': u"Started node add transaction by adding node 'ns_1@10.3.121.65' to nodes_wanted (group: undefined)", u'shortText': u'message', u'serverTime': u'2014-06-02T22:52:56.843Z', u'module': u'ns_cluster', u'tstamp': 1401774776843, u'type': u'info'}
      [2014-06-02 22:52:19,111] - [rest_client:1960] ERROR - {u'node': u'ns_1@10.3.4.177', u'code': 0, u'text': u'Change of address to "10.3.4.177" is requested.', u'shortText': u'message', u'serverTime': u'2014-06-02T22:52:56.817Z', u'module': u'ns_cluster', u'tstamp': 1401774776817, u'type': u'info'}
      [2014-06-02 22:52:19,112] - [rest_client:1960] ERROR - {u'node': u'ns_1@10.3.4.177', u'code': 5, u'text': u"Node 'ns_1@10.3.4.177' saw that node 'ns_1@10.3.121.65' went down. Details: [{nodedown_reason,\n connection_closed}]", u'shortText': u'node down', u'serverTime': u'2014-06-02T22:51:15.540Z', u'module': u'ns_node_disco', u'tstamp': 1401774675540, u'type': u'warning'}
      [2014-06-02 22:52:19,113] - [rest_client:1960] ERROR - {u'node': u'ns_1@10.3.121.65', u'code': 5, u'text': u"Node 'ns_1@10.3.121.65' saw that node 'ns_1@10.3.121.62' went down. Details: [{nodedown_reason,\n connection_closed}]", u'shortText': u'node down', u'serverTime': u'2014-06-02T22:51:13.288Z', u'module': u'ns_node_disco', u'tstamp': 1401774673288, u'type': u'warning'}
      [2014-06-02 22:52:19,114] - [rest_client:1960] ERROR - {u'node': u'ns_1@10.3.121.65', u'code': 5, u'text': u"Node 'ns_1@10.3.121.65' saw that node 'ns_1@10.3.2.204' went down. Details: [{nodedown_reason,\n connection_closed}]", u'shortText': u'node down', u'serverTime': u'2014-06-02T22:51:13.285Z', u'module': u'ns_node_disco', u'tstamp': 1401774673285, u'type': u'warning'}
      [2014-06-02 22:52:19,114] - [rest_client:1960] ERROR - {u'node': u'ns_1@10.3.4.177', u'code': 5, u'text': u"Node 'ns_1@10.3.4.177' saw that node 'ns_1@10.3.121.62' went down. Details: [{nodedown_reason,\n connection_closed}]", u'shortText': u'node down', u'serverTime': u'2014-06-02T22:51:13.015Z', u'module': u'ns_node_disco', u'tstamp': 1401774673015, u'type': u'warning'}
      [2014-06-02 22:52:19,114] - [rest_client:1960] ERROR - {u'node': u'ns_1@10.3.4.177', u'code': 5, u'text': u"Node 'ns_1@10.3.4.177' saw that node 'ns_1@10.3.2.204' went down. Details: [{nodedown_reason,\n connection_closed}]", u'shortText': u'node down', u'serverTime': u'2014-06-02T22:51:13.000Z', u'module': u'ns_node_disco', u'tstamp': 1401774673000, u'type': u'warning'}
      [2014-06-02 22:52:19,114] - [rest_client:968] ERROR - add_node error : ["Join completion call failed. Timeout connecting to \"10.3.121.65\" on port \"8091\". This could be due to an incorrect host/port combination or a firewall in place between the servers."]
      [('/usr/lib/python2.7/threading.py', 524, '__bootstrap', 'self.__bootstrap_inner()'), ('/usr/lib/python2.7/threading.py', 551, '__bootstrap_inner', 'self.run()'), ('./testrunner', 262, 'run', '**self._Threadkwargs)'), ('/usr/lib/python2.7/unittest/runner.py', 151, 'run', 'test(result)'), ('/usr/lib/python2.7/unittest/case.py', 391, '__call__', 'return self.run(*args, **kwds)'), ('/usr/lib/python2.7/unittest/case.py', 318, 'run', 'self.setUp()'), ('pytests/xdcr/uniXDCR.py', 16, 'setUp', 'super(unidirectional, self).setUp()'), ('pytests/xdcr/xdcrbasetests.py', 100, 'setUp', 'self._init_clusters(self._disabled_consistent_view)'), ('pytests/xdcr/xdcrbasetests.py', 432, '_init_clusters', 'self._setup_cluster(self._clusters_dic[key], disabled_consistent_view)'), ('pytests/xdcr/xdcrbasetests.py', 485, '_setup_cluster', 'self.cluster.async_rebalance(nodes, nodes[1:], []).result()'), ('lib/tasks/future.py', 160, 'result', 'return self.get_result()'), ('lib/tasks/future.py', 111, '_get_result', 'print traceback.extract_stack()')]
      [2014-06-02 22:52:19,117] - [xdcrbasetests:139] ERROR -
      [2014-06-02 22:52:19,118] - [xdcrbasetests:140] ERROR - Error while setting up clusters: (<class 'membase.api.exception.AddNodeException'>, AddNodeException(), <traceback object at 0x7f5ee8119758>)
      [2014-06-02 22:52:19,118] - [xdcrbasetests:175] INFO - ============== XDCRbasetests cleanup is started for test #4 load_with_async_ops ==============
      [('/usr/lib/python2.7/threading.py', 524, '__bootstrap', 'self.__bootstrap_inner()'), ('/usr/lib/python2.7/threading.py', 551, '__bootstrap_inner', 'self.run()'), ('lib/tasks/taskmanager.py', 31, 'run', 'task.step(self)'), ('lib/tasks/task.py', 55, 'step', 'self.execute(task_manager)'), ('lib/tasks/task.py', 278, 'execute', 'self.set_exception(e)'), ('lib/tasks/future.py', 264, 'set_exception', 'print traceback.extract_stack()')]
      Mon Jun 2 22:52:19 2014

      No core files were generated on either node.
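
The UI log entries above are printed newest first, and their serverTime values do not line up with the test runner's clock, so the order of events is easier to follow when the entries are sorted by their `tstamp` field. The following is a minimal sketch of that, not part of the test framework: the helper names (`parse_entries`, `chronological`, `elapsed_ms`) are hypothetical, and the `text` values are abbreviated by hand from the log above. Only the `tstamp` and `type` values are copied as-is.

```python
import ast

# Abbreviated copies of five entries from the "Latest logs from UI on 10.3.4.177"
# output above. Only the fields used here are kept, and the long 'text' values
# are shortened by hand; 'tstamp' and 'type' are taken from the log unchanged.
RAW_ENTRIES = [
    "{u'text': u'Failed to add node 10.3.121.65:8091 to cluster.', u'tstamp': 1401774806855, u'type': u'info'}",
    "{u'text': u'Add transaction failed (complete_join timeout)', u'tstamp': 1401774806852, u'type': u'critical'}",
    "{u'text': u'Node 10.3.121.65 came up', u'tstamp': 1401774776945, u'type': u'info'}",
    "{u'text': u'Started node add transaction', u'tstamp': 1401774776843, u'type': u'info'}",
    "{u'text': u'Change of address requested', u'tstamp': 1401774776817, u'type': u'info'}",
]


def parse_entries(raw_entries):
    """Turn Python-2-style dict reprs into dicts (u'' literals parse on Python 3.3+)."""
    return [ast.literal_eval(raw) for raw in raw_entries]


def chronological(entries):
    """Return the entries sorted oldest first by their millisecond 'tstamp'."""
    return sorted(entries, key=lambda entry: entry["tstamp"])


def elapsed_ms(entries, start_prefix, end_type):
    """Milliseconds from the first entry whose text starts with start_prefix
    to the first entry of type end_type that is not older than it."""
    ordered = chronological(entries)
    start = next(e for e in ordered if e["text"].startswith(start_prefix))
    end = next(e for e in ordered
               if e["type"] == end_type and e["tstamp"] >= start["tstamp"])
    return end["tstamp"] - start["tstamp"]


entries = parse_entries(RAW_ENTRIES)
first = chronological(entries)[0]["tstamp"]
for entry in chronological(entries):
    # Offsets are relative to the oldest entry, so clock skew between the
    # test runner and the server does not affect the result.
    print("+%6d ms  %-8s %s" % (entry["tstamp"] - first, entry["type"], entry["text"]))

print("add start -> failure: %d ms" % elapsed_ms(entries, "Started node add", "critical"))
```

With the timestamps from this log, the gap between "Started node add transaction" and the critical failure is 30009 ms. That matches the roughly 30 seconds between the addNode request (22:51:49) and the error 400 (22:52:19) in the test runner output.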

          People

            sangharsh Sangharsh Agarwal
            sangharsh Sangharsh Agarwal