Description
Build - 7.6.2-3656
Steps to repro
- Create a cluster with a kv-kv-index-index-n1ql-n1ql configuration, with 7.6.0-2183 installed on all the nodes
- Create buckets and indexes
- Start the upgrade process: rebalance out a node, do a clean install of 7.6.2 on it in provisioned mode, and add the upgraded node back into the cluster
- The following failure is observed
membase.api.exception.AddNodeException: Error adding node: 172.23.108.207 to the cluster:172.23.106.15 - b'["Failed to connect to https://172.23.108.207:18091. Could not connect to \\"172.23.108.207\\" on port 18091. This could be due to an incorrect host/port combination or a firewall in place between the servers."]'
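Since the error points at a host/port or firewall problem, a quick way to narrow it down is to check from a cluster node whether port 18091 on the upgraded node accepts TCP connections at the moment the add is attempted. The following is a minimal sketch of such a check; the helper name `can_connect` and the 3 second timeout are assumptions for illustration and are not part of the test framework.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


# Example, using the address and port from the failure message above:
#   can_connect("172.23.108.207", 18091)
```

This only confirms that the TCP port is open; it does not perform a TLS handshake, so a successful result does not rule out a certificate or TLS-level problem.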
Key points
- The behaviour was not observed when the upgrade was done from 7.2.x and 7.1.x to 7.6.2 using exactly the same steps as above
- The failure is not limited to upgraded nodes of one particular service.
- Adding the 7.6.2 node back into the 7.6.0 cluster was successful when done manually via the UI
- REST call used to add the node back -
POST http://172.23.106.15:8091/controller/addNode
body: hostname=https%3A%2F%2F172.23.108.207%3A18091&user=Administrator&password=password&services=n1ql
headers: {'Content-Type': 'application/x-www-form-urlencoded', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Accept': '*/*'}
error: 400 reason: unknown b'["Failed to connect to https://172.23.108.207:18091. Could not connect to \\"172.23.108.207\\" on port 18091. This could be due to an incorrect host/port combination or a firewall in place between the servers."]'
auth: Administrator:password
- UI logs dump
[2024-05-24 09:54:51,976] - [on_prem_rest_client:4347] INFO - Latest logs from UI on 172.23.106.15: |
[2024-05-24 09:54:51,976] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.106.15', 'type': 'info', 'code': 5, 'module': 'ns_cluster', 'tstamp': 1716569691933, 'shortText': 'message', 'text': 'Failed to add node 172.23.108.207:18091 to cluster. Failed to connect to https://172.23.108.207:18091. Could not connect to "172.23.108.207" on port 18091. This could be due to an incorrect host/port combination or a firewall in place between the servers.', 'serverTime': '2024-05-24T09:54:51.933Z'} |
[2024-05-24 09:54:51,976] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.106.234', 'type': 'warning', 'code': 5, 'module': 'ns_node_disco', 'tstamp': 1716569248431, 'shortText': 'node down', 'text': "Node 'ns_1@172.23.106.234' saw that node 'ns_1@172.23.108.207' went down. Details: [{nodedown_reason,\n connection_closed}]", 'serverTime': '2024-05-24T16:47:28.431Z'} |
[2024-05-24 09:54:51,977] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.106.208', 'type': 'warning', 'code': 5, 'module': 'ns_node_disco', 'tstamp': 1716569248346, 'shortText': 'node down', 'text': "Node 'ns_1@172.23.106.208' saw that node 'ns_1@172.23.108.207' went down. Details: [{nodedown_reason,\n connection_closed}]", 'serverTime': '2024-05-24T09:47:28.346Z'} |
[2024-05-24 09:54:51,977] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.106.223', 'type': 'warning', 'code': 5, 'module': 'ns_node_disco', 'tstamp': 1716569248335, 'shortText': 'node down', 'text': "Node 'ns_1@172.23.106.223' saw that node 'ns_1@172.23.108.207' went down. Details: [{nodedown_reason,\n connection_closed}]", 'serverTime': '2024-05-24T09:47:28.335Z'} |
[2024-05-24 09:54:51,977] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.106.15', 'type': 'warning', 'code': 5, 'module': 'ns_node_disco', 'tstamp': 1716569248301, 'shortText': 'node down', 'text': "Node 'ns_1@172.23.106.15' saw that node 'ns_1@172.23.108.207' went down. Details: [{nodedown_reason,\n connection_closed}]", 'serverTime': '2024-05-24T09:47:28.301Z'} |
[2024-05-24 09:54:51,977] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.121.199', 'type': 'warning', 'code': 5, 'module': 'ns_node_disco', 'tstamp': 1716569248147, 'shortText': 'node down', 'text': "Node 'ns_1@172.23.121.199' saw that node 'ns_1@172.23.108.207' went down. Details: [{nodedown_reason,\n connection_closed}]", 'serverTime': '2024-05-24T09:47:28.147Z'} |
[2024-05-24 09:54:51,977] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.108.207', 'type': 'info', 'code': 1, 'module': 'ns_cluster', 'tstamp': 1716569247232, 'shortText': 'message', 'text': "Node 'ns_1@172.23.108.207' is leaving cluster.", 'serverTime': '2024-05-24T09:47:27.232Z'} |
[2024-05-24 09:54:51,977] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.108.207', 'type': 'info', 'code': 4, 'module': 'ns_cluster', 'tstamp': 1716569246996, 'shortText': 'message', 'text': 'Node ns_1@172.23.108.207 asked to leave the cluster', 'serverTime': '2024-05-24T09:47:26.996Z'} |
[2024-05-24 09:54:51,977] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.108.207', 'type': 'info', 'code': 0, 'module': 'ns_orchestrator', 'tstamp': 1716569246928, 'shortText': 'message', 'text': 'Rebalance completed successfully.\nRebalance Operation Id = 6e765180e1760048fb7e85e22813919c', 'serverTime': '2024-05-24T09:47:26.928Z'} |
[2024-05-24 09:54:51,977] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.121.199', 'type': 'info', 'code': 0, 'module': 'auto_failover', 'tstamp': 1716569246573, 'shortText': 'message', 'text': 'Enabled auto-failover with timeout 120 and max count 1', 'serverTime': '2024-05-24T09:47:26.573Z'} |
[2024-05-24 09:54:51,984] - [on_prem_rest_client:4347] INFO - Latest logs from UI on 172.23.108.207: |
[2024-05-24 09:54:51,984] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.106.15', 'type': 'info', 'code': 5, 'module': 'ns_cluster', 'tstamp': 1716569691933, 'shortText': 'message', 'text': 'Failed to add node 172.23.108.207:18091 to cluster. Failed to connect to https://172.23.108.207:18091. Could not connect to "172.23.108.207" on port 18091. This could be due to an incorrect host/port combination or a firewall in place between the servers.', 'serverTime': '2024-05-24T09:54:51.933Z'} |
[2024-05-24 09:54:51,984] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.106.234', 'type': 'warning', 'code': 5, 'module': 'ns_node_disco', 'tstamp': 1716569248431, 'shortText': 'node down', 'text': "Node 'ns_1@172.23.106.234' saw that node 'ns_1@172.23.108.207' went down. Details: [{nodedown_reason,\n connection_closed}]", 'serverTime': '2024-05-24T16:47:28.431Z'} |
[2024-05-24 09:54:51,984] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.106.208', 'type': 'warning', 'code': 5, 'module': 'ns_node_disco', 'tstamp': 1716569248346, 'shortText': 'node down', 'text': "Node 'ns_1@172.23.106.208' saw that node 'ns_1@172.23.108.207' went down. Details: [{nodedown_reason,\n connection_closed}]", 'serverTime': '2024-05-24T09:47:28.346Z'} |
[2024-05-24 09:54:51,984] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.106.223', 'type': 'warning', 'code': 5, 'module': 'ns_node_disco', 'tstamp': 1716569248335, 'shortText': 'node down', 'text': "Node 'ns_1@172.23.106.223' saw that node 'ns_1@172.23.108.207' went down. Details: [{nodedown_reason,\n connection_closed}]", 'serverTime': '2024-05-24T09:47:28.335Z'} |
[2024-05-24 09:54:51,984] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.106.15', 'type': 'warning', 'code': 5, 'module': 'ns_node_disco', 'tstamp': 1716569248301, 'shortText': 'node down', 'text': "Node 'ns_1@172.23.106.15' saw that node 'ns_1@172.23.108.207' went down. Details: [{nodedown_reason,\n connection_closed}]", 'serverTime': '2024-05-24T09:47:28.301Z'} |
[2024-05-24 09:54:51,985] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.121.199', 'type': 'warning', 'code': 5, 'module': 'ns_node_disco', 'tstamp': 1716569248147, 'shortText': 'node down', 'text': "Node 'ns_1@172.23.121.199' saw that node 'ns_1@172.23.108.207' went down. Details: [{nodedown_reason,\n connection_closed}]", 'serverTime': '2024-05-24T09:47:28.147Z'} |
[2024-05-24 09:54:51,985] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.108.207', 'type': 'info', 'code': 1, 'module': 'ns_cluster', 'tstamp': 1716569247232, 'shortText': 'message', 'text': "Node 'ns_1@172.23.108.207' is leaving cluster.", 'serverTime': '2024-05-24T09:47:27.232Z'} |
[2024-05-24 09:54:51,985] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.108.207', 'type': 'info', 'code': 4, 'module': 'ns_cluster', 'tstamp': 1716569246996, 'shortText': 'message', 'text': 'Node ns_1@172.23.108.207 asked to leave the cluster', 'serverTime': '2024-05-24T09:47:26.996Z'} |
[2024-05-24 09:54:51,985] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.108.207', 'type': 'info', 'code': 0, 'module': 'ns_orchestrator', 'tstamp': 1716569246928, 'shortText': 'message', 'text': 'Rebalance completed successfully.\nRebalance Operation Id = 6e765180e1760048fb7e85e22813919c', 'serverTime': '2024-05-24T09:47:26.928Z'} |
[2024-05-24 09:54:51,985] - [on_prem_rest_client:4348] ERROR - {'node': 'ns_1@172.23.121.199', 'type': 'info', 'code': 0, 'module': 'auto_failover', 'tstamp': 1716569246573, 'shortText': 'message', 'text': 'Enabled auto-failover with timeout 120 and max count 1', 'serverTime': '2024-05-24T09:47:26.573Z'} |
[2024-05-24 09:54:51,985] - [on_prem_rest_client:1742] ERROR - add_node error : b'["Failed to connect to https://172.23.108.207:18091. Could not connect to \\"172.23.108.207\\" on port 18091. This could be due to an incorrect host/port combination or a firewall in place between the servers."]' |
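To make it easier to compare the failing automated call with what the UI sends, the addNode request quoted under Key points can be rebuilt outside the test framework. The sketch below only constructs the request object and does not send it. The function name `build_add_node_request` is hypothetical, and it assumes the same credentials are used for the cluster and for the node being added, as in the request shown above.

```python
import base64
import urllib.parse
import urllib.request


def build_add_node_request(cluster_host, new_node_url, user, password, services):
    """Build (but do not send) a POST /controller/addNode request."""
    # Form-encode the body; ':' and '/' in the hostname become %3A and %2F.
    body = urllib.parse.urlencode({
        "hostname": new_node_url,
        "user": user,
        "password": password,
        "services": services,
    }).encode("ascii")
    # HTTP Basic auth token: base64("user:password").
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    return urllib.request.Request(
        f"http://{cluster_host}:8091/controller/addNode",
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Authorization": f"Basic {token}",
            "Accept": "*/*",
        },
        method="POST",
    )


# Values taken from the failing call in this report.
req = build_add_node_request(
    "172.23.106.15", "https://172.23.108.207:18091",
    "Administrator", "password", "n1ql",
)
```

Passing `req` to `urllib.request.urlopen` would issue the call; a 400 response surfaces as `urllib.error.HTTPError`, whose body carries the same error text as in the log above.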
Logs -