Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Fixed
-
Cheshire-Cat
-
Centos 7 64 bit; CB EE 7.0.0-4721
-
Untriaged
-
Centos 64-bit
-
1
-
No
Description
Steps to Reproduce:
1. Create a 5 node cluster: .137, .138, .139, .140, .142
2. Stop-server on .140 and when the node becomes unresponsive, fail it over, but don't rebalance it out yet.
3. Now stop server on .138, .139 nodes.
Now it appears that we can't get the unresponsive nodes from steps 2 and 3 out of the cluster.
We can't quorum failover .138 and .139 as we have another failed node: .140. So attempts to quorum failover will fail as
Unexpected server error: {error,
|
{aborted,
|
#{failed_peers =>
|
['ns_1@172.23.120.140',
|
* Connection #0 to host 172.23.120.137 left intact
|
'ns_1@172.23.120.138']}}}
|
There should be a way to potentially avoid this situation of cluster getting permanently stuck with this problem.
Attachments
Issue Links
Activity
Field | Original Value | New Value |
---|---|---|
Attachment | Screenshot 2021-03-20 at 7.23.38 AM.png [ 131953 ] |
Assignee | Dave Finlay [ dfinlay ] | Abhijeeth Nuthan [ abhijeeth.nuthan ] |
Resolution | Fixed [ 1 ] | |
Status | Open [ 1 ] | Resolved [ 5 ] |
Description |
*Steps to Reproduce:*
1. Create a 5 node cluster: .137, .138, .139, .140, .142 2. Stop-server on .140 and when the node becomes unresponsive, fail it over, but don't rebalance it out yet. 3. Now stop server on .138, .139 nodes. Now it appears that we can't get the unresponsive nodes from steps 2 and 3 out of the cluster. We can't quorum failover .138 and .139 as we have another failed node: .140. So attempts to quorum failover will fail as {noformat} Unexpected server error: {error, {aborted, #{failed_peers => ['ns_1@172.23.120.140', * Connection #0 to host 172.23.120.137 left intact 'ns_1@172.23.120.138']}}}{noformat} There should be a way to potentially avoid this situation of cluster getting permanently stuck with this problem. |
*Steps to Reproduce:*
1. Create a 5 node cluster: .137, .138, .139, .140, .142 2. Stop-server on .140 and when the node becomes unresponsive, fail it over, but don't rebalance it out yet. 3. Now stop server on .138, .139 nodes. Now it appears that we can't get the unresponsive nodes from steps 2 and 3 out of the cluster. We can't quorum failover .138 and .139 as we have another failed node: .140. So attempts to quorum failover will fail as {noformat}Unexpected server error: {error, {aborted, #{failed_peers => ['ns_1@172.23.120.140', * Connection #0 to host 172.23.120.137 left intact 'ns_1@172.23.120.138']}}}{noformat} There should be a way to potentially avoid this situation of cluster getting permanently stuck with this problem. |
Assignee | Abhijeeth Nuthan [ abhijeeth.nuthan ] | Sumedh Basarkod [ sumedh.basarkod ] |
Status | Resolved [ 5 ] | Closed [ 6 ] |
Labels | affects-cc-testing ns_server | affects-cc-testing functional-test ns_server |
Fix Version/s | 7.0.0 [ 17233 ] |
Fix Version/s | Cheshire-Cat [ 15915 ] |