Details
- Bug
- Resolution: Duplicate
- Critical
- Cheshire-Cat
- Centos 7 64 bit; Couchbase EE 7.0.0 build 4291
- Untriaged
- Centos 64-bit
- 1
- Unknown
Description
Summary:
Adding a node back after it had been rebalanced out failed with the error "Prepare join failed. Node is already part of cluster." The likely cause is that the node was not cleaned up after it was rebalanced out.
Steps to Reproduce and timeline
The volume test includes the following sequence of steps on a single node:
induce_firewall -> autofailover -> remove_firewall -> rebalance_out -> add_back_the_node_again
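The sequence above depends on two views of membership staying consistent: the orchestrator's list of cluster members and the ejected node's own belief about which cluster it is in. The toy model below is a hypothetical illustration, not ns_server's actual logic. It collapses the firewall and autofailover steps into a single `failover()` call, and uses the made-up names `ToyCluster`, `PrepareJoinFailed` and `run_sequence`. It shows that add-back succeeds when the rebalanced-out node resets itself, and reproduces the reported error when that cleanup is skipped.

```python
from typing import Dict, Iterable, Optional, Set


class PrepareJoinFailed(Exception):
    """Raised when the joining node still believes it belongs to a cluster."""


class ToyCluster:
    """Hypothetical model of two membership views; not ns_server's actual logic."""

    def __init__(self, nodes: Iterable[str], cluster_id: str = "cluster-A"):
        self.cluster_id = cluster_id
        self.members: Set[str] = set(nodes)  # orchestrator's view of the cluster
        self.failed_over: Set[str] = set()
        # Each node's own view: the cluster it thinks it is in, or None if standalone.
        self.node_view: Dict[str, Optional[str]] = {n: cluster_id for n in self.members}

    def failover(self, node: str) -> None:
        # A failed-over node stays in the orchestrator's membership until rebalanced out.
        self.failed_over.add(node)

    def rebalance_out(self, node: str, cleanup: bool = True) -> None:
        self.members.discard(node)
        self.failed_over.discard(node)
        if cleanup:
            # Expected behaviour: the ejected node resets itself to standalone.
            self.node_view[node] = None

    def add_node(self, node: str) -> None:
        # Mimics the "prepare join" check: refuse a node that still has a cluster.
        if self.node_view.get(node) is not None:
            raise PrepareJoinFailed("Prepare join failed. Node is already part of cluster.")
        self.members.add(node)
        self.node_view[node] = self.cluster_id


def run_sequence(cleanup: bool) -> str:
    nodes = ["172.23.105.175", "172.23.106.233", "172.23.106.236",
             "172.23.106.238", "172.23.106.250", "172.23.106.251"]
    cluster = ToyCluster(nodes)
    target = "172.23.106.233"
    # induce_firewall -> autofailover -> remove_firewall collapsed into failover().
    cluster.failover(target)
    cluster.rebalance_out(target, cleanup=cleanup)
    try:
        cluster.add_node(target)
        return "joined"
    except PrepareJoinFailed as exc:
        return str(exc)


print(run_sequence(cleanup=True))   # joined
print(run_sequence(cleanup=False))  # Prepare join failed. Node is already part of cluster.
```

In this model the orchestrator's view is always correct after `rebalance_out()`; only the ejected node's own view goes stale, which is the split the Observations below describe.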
1. Induce firewall on .233
2021-01-24 19:20:34,806 | test | INFO | MainThread | [Collections_autofailover:rebalance_after_autofailover:103] Inducing failure firewall on nodes: [ip:172.23.106.233 port:8091 ssh_username:root] |
2. .233 is automatically failed over
2021-01-24 19:21:40,184 | test | ERROR | pool-1-thread-30 | [rest_client:print_UI_logs:2595] {u'code': 0, u'module': u'failover', u'type': u'info', u'node': u'ns_1@172.23.105.175', u'tstamp': 1611544900094L, u'shortText': u'message', u'serverTime': u'2021-01-24T19:21:40.094Z', u'text': u"Starting failing over ['ns_1@172.23.106.233']"} |
3. Remove firewall and rebalance-out
2021-01-24 19:27:28,275 | test | INFO | pool-1-thread-17 | [table_view:display:72] Rebalance Overview
------------------------------------
Nodes | Services | Status |
------------------------------------
172.23.105.175 | kv | Cluster node |
172.23.106.250 | kv | Cluster node |
172.23.106.236 | kv | Cluster node |
172.23.106.251 | kv | Cluster node |
172.23.106.238 | kv | Cluster node |
------------------------------------
2021-01-24 19:27:43,392 | test | INFO | pool-1-thread-17 | [task:check:322] Rebalance - status: none, progress: 100
2021-01-24 19:28:13,433 | test | ERROR | pool-1-thread-17 | [task:check:374] Node 172.23.106.233:8091 was not cleaned after removing from cluster
4. Add back the node
2021-01-24 19:28:13,529 | test | INFO | pool-1-thread-1 | [rest_client:print_UI_logs:2593] Latest logs from UI on 172.23.105.175: |
2021-01-24 19:28:13,529 | test | ERROR | pool-1-thread-1 | [rest_client:print_UI_logs:2595] {u'code': 5, u'module': u'ns_cluster', u'type': u'info', u'node': u'ns_1@172.23.105.175', u'tstamp': 1611545293485L, u'shortText': u'message', u'serverTime': u'2021-01-24T19:28:13.485Z', u'text': u'Failed to add node 172.23.106.233:8091 to cluster. Prepare join failed. Node is already part of cluster.'} |
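To line up the UI log entries with the test log and ns_server.debug.log, it helps to decode the `tstamp` field. Assuming `tstamp` is Unix epoch milliseconds, 1611545293485 is 2021-01-25T03:28:13.485 UTC, which is 19:28:13.485 at the -08:00 offset shown in ns_server.debug.log. That matches the clock reading in `serverTime` (even though `serverTime` carries a `Z` suffix) and the 19:28:13 test-log timestamps. The helper name `ui_tstamp_to_iso` and its -8 hour default are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ui_tstamp_to_iso(tstamp_ms: int, utc_offset_hours: int = -8) -> str:
    """Convert a UI-log 'tstamp' (assumed Unix epoch milliseconds) to ISO-8601.

    Integer timedelta arithmetic avoids float rounding of the milliseconds.
    The default -8 h offset is an assumption taken from ns_server.debug.log.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    moment = EPOCH + timedelta(milliseconds=tstamp_ms)
    return moment.astimezone(tz).isoformat(timespec="milliseconds")


print(ui_tstamp_to_iso(1611544900094))     # 2021-01-24T19:21:40.094-08:00 (failover started)
print(ui_tstamp_to_iso(1611545293485))     # 2021-01-24T19:28:13.485-08:00 (add-node failure)
print(ui_tstamp_to_iso(1611545293485, 0))  # 2021-01-25T03:28:13.485+00:00 (same instant, UTC)
```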
Observations
The UI on .175 shows that .233 is not part of the cluster.
However, the UI on .233 shows that .233 is still part of the cluster.
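The same mismatch could be checked from a script instead of the UI by asking each node for its own view. The sketch below assumes that `GET /pools` on a standalone (cleaned-up) node returns an empty `"pools"` list, while a node that still believes it is clustered lists the `"default"` pool. Treat that as an assumption to verify against the REST documentation for this build. `fetch_pools` and `believes_it_is_clustered` are hypothetical helpers, not part of the test framework.

```python
import base64
import json
import urllib.request
from typing import Any, Dict


def fetch_pools(host: str, user: str, password: str, port: int = 8091) -> Dict[str, Any]:
    """GET /pools from one node with HTTP basic auth (hypothetical helper)."""
    request = urllib.request.Request(f"http://{host}:{port}/pools")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode())


def believes_it_is_clustered(pools_response: Dict[str, Any]) -> bool:
    """Assumption: a standalone node reports an empty "pools" list."""
    return bool(pools_response.get("pools"))


# Offline examples of the two response shapes assumed above:
print(believes_it_is_clustered({"pools": []}))                     # False
print(believes_it_is_clustered({"pools": [{"name": "default"}]}))  # True
```

In a live run, `believes_it_is_clustered(fetch_pools("172.23.106.233", user, password))` returning `True` after the rebalance-out would correspond to the "was not cleaned after removing from cluster" check failing.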
From ns_server.debug.log on .175:
[ns_server:warn,2021-01-24T19:28:13.720-08:00,ns_1@172.23.105.175:mb_master<0.3211.0>:mb_master:master:493]Master got candidate heartbeat from node 'ns_1@172.23.106.233' which is not in peers ['ns_1@172.23.105.175', |
'ns_1@172.23.106.236', |
'ns_1@172.23.106.238', |
'ns_1@172.23.106.250', |
'ns_1@172.23.106.251'] |
[ns_server:debug,2021-01-24T19:28:13.721-08:00,ns_1@172.23.105.175:ns_server_monitor<0.719.0>:health_monitor:handle_cast:82]Ignoring heartbeat from an unknown node 'ns_1@172.23.106.233' |
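These entries show that .175's mb_master received a candidate heartbeat from .233, which is not in its peer list, and that health_monitor ignored a heartbeat from .233 as an unknown node. In other words, .233 was still behaving like a cluster member after its removal. To scan a longer debug log for the same pattern, the warning can be split into the sending node and the peer list. The parser below is a hypothetical helper. It assumes node names appear quoted as `'ns_1@…'` and that the trailing ` |` table debris has been stripped, as in the sample string.

```python
import re
from typing import List, Tuple

# Quoted Erlang node names such as 'ns_1@172.23.106.233'.
NODE_RE = re.compile(r"'(ns_1@[^']+)'")
MARKER = "which is not in peers"


def parse_unknown_heartbeat(entry: str) -> Tuple[str, List[str]]:
    """Split an mb_master 'candidate heartbeat' warning into (sender, peers)."""
    head, found, tail = entry.partition(MARKER)
    if not found:
        raise ValueError("not a 'candidate heartbeat ... not in peers' warning")
    senders = NODE_RE.findall(head)  # the unquoted ns_1@ in the log prefix is skipped
    if not senders:
        raise ValueError("no quoted sender node found")
    return senders[-1], NODE_RE.findall(tail)


entry = (
    "[ns_server:warn,2021-01-24T19:28:13.720-08:00,ns_1@172.23.105.175:"
    "mb_master<0.3211.0>:mb_master:master:493]Master got candidate heartbeat "
    "from node 'ns_1@172.23.106.233' which is not in peers "
    "['ns_1@172.23.105.175', 'ns_1@172.23.106.236', 'ns_1@172.23.106.238', "
    "'ns_1@172.23.106.250', 'ns_1@172.23.106.251']"
)
sender, peers = parse_unknown_heartbeat(entry)
print(sender, sender in peers, len(peers))  # ns_1@172.23.106.233 False 5
```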
Issue Links
- duplicates: MB-43899 "Unexpected server error, request logged." upon adding a node to the cluster (Closed)