  1. Couchbase Server
  2. MB-62056

Not able to add an upgraded 7.6.2 node to a 7.6.0 cluster

Details

    • Bug
    • Resolution: Not a Bug
    • Critical
    • 7.6.2
    • 7.6.2
    • ns_server
    • None
    • Untriaged
    • 0
    • Unknown

    Description

      Build - 7.6.2-3656

      Steps to repro 

      • Create a cluster in a kv-kv-index-index-n1ql-n1ql configuration with 7.6.0-2183 installed on all nodes
      • Create buckets and indexes
      • Start the upgrade: rebalance out a node, do a clean install of 7.6.2 on it in provisioned mode, then add the upgraded node back into the cluster
      • The following failure is observed:

        membase.api.exception.AddNodeException: Error adding node: 172.23.108.207 to the cluster:172.23.106.15 - b'["Failed to connect to https://172.23.108.207:18091. Could not connect to \\"172.23.108.207\\" on port 18091.  This could be due to an incorrect host/port combination or a firewall in place between the servers."]' 
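
The error text points at a host/port or firewall problem between the servers. One way to check that from the cluster node is a plain TCP probe. The sketch below is a minimal one: `can_connect` is a hypothetical helper name, not part of the test framework, and it only confirms that the port accepts connections, not that a TLS handshake would succeed.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `can_connect("172.23.108.207", 18091)` and `can_connect("172.23.108.207", 8091)` from 172.23.106.15 would show whether the TLS port alone is unreachable or the whole node is.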

      Key points 

      • The failure was never observed when upgrading from 7.2.x or 7.1.x to 7.6.2 using exactly the same steps
      • The failure was not limited to adding back an upgraded node of one particular service
      • Adding the 7.6.2 node back into the 7.6.0 cluster succeeded when done manually via the UI
      • REST call used to add the node back:

        POST http://172.23.106.15:8091/controller/addNode
        body: hostname=https%3A%2F%2F172.23.108.207%3A18091&user=Administrator&password=password&services=n1ql
        headers: {'Content-Type': 'application/x-www-form-urlencoded', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Accept': '*/*'}
        error: 400 reason: unknown b'["Failed to connect to https://172.23.108.207:18091. Could not connect to \\"172.23.108.207\\" on port 18091.  This could be due to an incorrect host/port combination or a firewall in place between the servers."]'
        auth: Administrator:password
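
To replay this request outside the test framework, the URL, form body, and headers can be rebuilt with the standard library. This is a sketch only: `build_add_node_request` and its `secure` flag are hypothetical names introduced here. The `secure=False` variant, which targets `http://<node>:8091`, is an assumption added so the TLS and non-TLS forms of `hostname` can be compared; it was not part of the reported run.

```python
import base64
from urllib.parse import urlencode


def build_add_node_request(cluster_host, node_host, user, password,
                           services, secure=True):
    """Build (url, body, headers) for POST /controller/addNode."""
    # secure=True mirrors the reported call (https on port 18091);
    # secure=False is an assumed plain-HTTP variant for comparison.
    scheme, port = ("https", 18091) if secure else ("http", 8091)
    body = urlencode({
        "hostname": f"{scheme}://{node_host}:{port}",
        "user": user,
        "password": password,
        "services": services,
    })
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Authorization": f"Basic {token}",
        "Accept": "*/*",
    }
    url = f"http://{cluster_host}:8091/controller/addNode"
    return url, body, headers
```

With the values from the report, the returned body and `Authorization` header match the ones logged above, so the tuple can be passed to any HTTP client to retry the call by hand.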

      • UI logs dump

      [2024-05-24 09:54:51,976] - [on_prem_rest_client:4347] INFO - Latest logs from UI on 172.23.106.15:
      [2024-05-24 09:54:51,976] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.106.15', 'type': 'info', 'code': 5, 'module': 'ns_cluster', 'tstamp': 1716569691933, 'shortText': 'message', 'text': 'Failed to add node 172.23.108.207:18091 to cluster. Failed to connect to https://172.23.108.207:18091. Could not connect to "172.23.108.207" on port 18091.  This could be due to an incorrect host/port combination or a firewall in place between the servers.', 'serverTime': '2024-05-24T09:54:51.933Z'}
      [2024-05-24 09:54:51,976] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.106.234', 'type': 'warning', 'code': 5, 'module': 'ns_node_disco', 'tstamp': 1716569248431, 'shortText': 'node down', 'text': "Node 'ns_1@172.23.106.234' saw that node 'ns_1@172.23.108.207' went down. Details: [{nodedown_reason,\n                                                                                     connection_closed}]", 'serverTime': '2024-05-24T16:47:28.431Z'}
      [2024-05-24 09:54:51,977] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.106.208', 'type': 'warning', 'code': 5, 'module': 'ns_node_disco', 'tstamp': 1716569248346, 'shortText': 'node down', 'text': "Node 'ns_1@172.23.106.208' saw that node 'ns_1@172.23.108.207' went down. Details: [{nodedown_reason,\n                                                                                     connection_closed}]", 'serverTime': '2024-05-24T09:47:28.346Z'}
      [2024-05-24 09:54:51,977] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.106.223', 'type': 'warning', 'code': 5, 'module': 'ns_node_disco', 'tstamp': 1716569248335, 'shortText': 'node down', 'text': "Node 'ns_1@172.23.106.223' saw that node 'ns_1@172.23.108.207' went down. Details: [{nodedown_reason,\n                                                                                     connection_closed}]", 'serverTime': '2024-05-24T09:47:28.335Z'}
      [2024-05-24 09:54:51,977] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.106.15', 'type': 'warning', 'code': 5, 'module': 'ns_node_disco', 'tstamp': 1716569248301, 'shortText': 'node down', 'text': "Node 'ns_1@172.23.106.15' saw that node 'ns_1@172.23.108.207' went down. Details: [{nodedown_reason,\n                                                                                    connection_closed}]", 'serverTime': '2024-05-24T09:47:28.301Z'}
      [2024-05-24 09:54:51,977] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.121.199', 'type': 'warning', 'code': 5, 'module': 'ns_node_disco', 'tstamp': 1716569248147, 'shortText': 'node down', 'text': "Node 'ns_1@172.23.121.199' saw that node 'ns_1@172.23.108.207' went down. Details: [{nodedown_reason,\n                                                                                     connection_closed}]", 'serverTime': '2024-05-24T09:47:28.147Z'}
      [2024-05-24 09:54:51,977] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.108.207', 'type': 'info', 'code': 1, 'module': 'ns_cluster', 'tstamp': 1716569247232, 'shortText': 'message', 'text': "Node 'ns_1@172.23.108.207' is leaving cluster.", 'serverTime': '2024-05-24T09:47:27.232Z'}
      [2024-05-24 09:54:51,977] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.108.207', 'type': 'info', 'code': 4, 'module': 'ns_cluster', 'tstamp': 1716569246996, 'shortText': 'message', 'text': 'Node ns_1@172.23.108.207 asked to leave the cluster', 'serverTime': '2024-05-24T09:47:26.996Z'}
      [2024-05-24 09:54:51,977] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.108.207', 'type': 'info', 'code': 0, 'module': 'ns_orchestrator', 'tstamp': 1716569246928, 'shortText': 'message', 'text': 'Rebalance completed successfully.\nRebalance Operation Id = 6e765180e1760048fb7e85e22813919c', 'serverTime': '2024-05-24T09:47:26.928Z'}
      [2024-05-24 09:54:51,977] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.121.199', 'type': 'info', 'code': 0, 'module': 'auto_failover', 'tstamp': 1716569246573, 'shortText': 'message', 'text': 'Enabled auto-failover with timeout 120 and max count 1', 'serverTime': '2024-05-24T09:47:26.573Z'}
      [2024-05-24 09:54:51,984] - [on_prem_rest_client:4347] INFO - Latest logs from UI on 172.23.108.207:
      [2024-05-24 09:54:51,984] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.106.15', 'type': 'info', 'code': 5, 'module': 'ns_cluster', 'tstamp': 1716569691933, 'shortText': 'message', 'text': 'Failed to add node 172.23.108.207:18091 to cluster. Failed to connect to https://172.23.108.207:18091. Could not connect to "172.23.108.207" on port 18091.  This could be due to an incorrect host/port combination or a firewall in place between the servers.', 'serverTime': '2024-05-24T09:54:51.933Z'}
      [2024-05-24 09:54:51,984] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.106.234', 'type': 'warning', 'code': 5, 'module': 'ns_node_disco', 'tstamp': 1716569248431, 'shortText': 'node down', 'text': "Node 'ns_1@172.23.106.234' saw that node 'ns_1@172.23.108.207' went down. Details: [{nodedown_reason,\n                                                                                     connection_closed}]", 'serverTime': '2024-05-24T16:47:28.431Z'}
      [2024-05-24 09:54:51,984] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.106.208', 'type': 'warning', 'code': 5, 'module': 'ns_node_disco', 'tstamp': 1716569248346, 'shortText': 'node down', 'text': "Node 'ns_1@172.23.106.208' saw that node 'ns_1@172.23.108.207' went down. Details: [{nodedown_reason,\n                                                                                     connection_closed}]", 'serverTime': '2024-05-24T09:47:28.346Z'}
      [2024-05-24 09:54:51,984] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.106.223', 'type': 'warning', 'code': 5, 'module': 'ns_node_disco', 'tstamp': 1716569248335, 'shortText': 'node down', 'text': "Node 'ns_1@172.23.106.223' saw that node 'ns_1@172.23.108.207' went down. Details: [{nodedown_reason,\n                                                                                     connection_closed}]", 'serverTime': '2024-05-24T09:47:28.335Z'}
      [2024-05-24 09:54:51,984] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.106.15', 'type': 'warning', 'code': 5, 'module': 'ns_node_disco', 'tstamp': 1716569248301, 'shortText': 'node down', 'text': "Node 'ns_1@172.23.106.15' saw that node 'ns_1@172.23.108.207' went down. Details: [{nodedown_reason,\n                                                                                    connection_closed}]", 'serverTime': '2024-05-24T09:47:28.301Z'}
      [2024-05-24 09:54:51,985] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.121.199', 'type': 'warning', 'code': 5, 'module': 'ns_node_disco', 'tstamp': 1716569248147, 'shortText': 'node down', 'text': "Node 'ns_1@172.23.121.199' saw that node 'ns_1@172.23.108.207' went down. Details: [{nodedown_reason,\n                                                                                     connection_closed}]", 'serverTime': '2024-05-24T09:47:28.147Z'}
      [2024-05-24 09:54:51,985] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.108.207', 'type': 'info', 'code': 1, 'module': 'ns_cluster', 'tstamp': 1716569247232, 'shortText': 'message', 'text': "Node 'ns_1@172.23.108.207' is leaving cluster.", 'serverTime': '2024-05-24T09:47:27.232Z'}
      [2024-05-24 09:54:51,985] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.108.207', 'type': 'info', 'code': 4, 'module': 'ns_cluster', 'tstamp': 1716569246996, 'shortText': 'message', 'text': 'Node ns_1@172.23.108.207 asked to leave the cluster', 'serverTime': '2024-05-24T09:47:26.996Z'}
      [2024-05-24 09:54:51,985] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.108.207', 'type': 'info', 'code': 0, 'module': 'ns_orchestrator', 'tstamp': 1716569246928, 'shortText': 'message', 'text': 'Rebalance completed successfully.\nRebalance Operation Id = 6e765180e1760048fb7e85e22813919c', 'serverTime': '2024-05-24T09:47:26.928Z'}
      [2024-05-24 09:54:51,985] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.121.199', 'type': 'info', 'code': 0, 'module': 'auto_failover', 'tstamp': 1716569246573, 'shortText': 'message', 'text': 'Enabled auto-failover with timeout 120 and max count 1', 'serverTime': '2024-05-24T09:47:26.573Z'}
      [2024-05-24 09:54:51,985] - [on_prem_rest_client:1742] ERROR - add_node error : b'["Failed to connect to https://172.23.108.207:18091. Could not connect to \\"172.23.108.207\\" on port 18091.  This could be due to an incorrect host/port combination or a firewall in place between the servers."]'
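
The dump above lists entries newest first, and the `serverTime` strings do not all use the same offset (compare the 16:47 and 09:47 entries). A chronological view is easier to read when built from `tstamp`. The sketch below assumes each entry is a Python dict literal following the first `{` on the line and that `tstamp` is epoch milliseconds. `parse_ui_log_line` and `timeline` are hypothetical helper names, and `SAMPLE` holds two lines copied from the dump.

```python
import ast
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_ui_log_line(line):
    """Return the dict embedded in a UI log line, or None if there is none."""
    start = line.find("{")
    if start == -1:
        return None
    # The framework prints entries as Python reprs, so literal_eval can read them.
    return ast.literal_eval(line[start:])


def timeline(lines):
    """Return (utc_time, node, module, text) tuples sorted oldest first."""
    entries = [e for e in map(parse_ui_log_line, lines) if e is not None]
    entries.sort(key=lambda e: e["tstamp"])
    return [
        ((EPOCH + timedelta(milliseconds=e["tstamp"])).isoformat(),
         e["node"], e["module"], e["text"])
        for e in entries
    ]


SAMPLE = r"""
[2024-05-24 09:54:51,977] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.108.207', 'type': 'info', 'code': 1, 'module': 'ns_cluster', 'tstamp': 1716569247232, 'shortText': 'message', 'text': "Node 'ns_1@172.23.108.207' is leaving cluster.", 'serverTime': '2024-05-24T09:47:27.232Z'}
[2024-05-24 09:54:51,977] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.108.207', 'type': 'info', 'code': 4, 'module': 'ns_cluster', 'tstamp': 1716569246996, 'shortText': 'message', 'text': 'Node ns_1@172.23.108.207 asked to leave the cluster', 'serverTime': '2024-05-24T09:47:26.996Z'}
"""
```

`timeline(SAMPLE.splitlines())` returns the "asked to leave" entry before the "is leaving cluster" entry. Lines with no dict, such as the `INFO - Latest logs from UI on ...` headers, are skipped.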

       

      Logs - 

      test_2 (11).zip

      Attachments

        1. test_2 (11).zip
          45.24 MB
        2. test2_logs.zip
          56.35 MB
          People

            yash.dodderi Yash Dodderi