Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: 7.0.0
Affects Version/s: Cheshire-Cat
Component/s: ns_server
Labels:
Environment: CentOS 7 64-bit; CB EE 7.0.0-4721
Triage: Untriaged
Operating System: CentOS 64-bit
Story Points: 1
Is this a Regression?: No

Description

Steps to Reproduce:
1. Create a 5-node cluster: .137, .138, .139, .140, .142.
2. Stop the server on .140. Once the node becomes unresponsive, fail it over, but don't rebalance it out yet.
3. Stop the server on the .138 and .139 nodes.

At this point the unresponsive nodes from steps 2 and 3 cannot be removed from the cluster.
A quorum failover of .138 and .139 is rejected because another failed node, .140, is still in the cluster. The attempt fails with:

Unexpected server error: {error,
                             {aborted,
                                 #{failed_peers =>
                                       ['ns_1@172.23.120.140',
                                        'ns_1@172.23.120.138']}}}
* Connection #0 to host 172.23.120.137 left intact

There should be a way to avoid the cluster getting permanently stuck in this state.
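
To make the node counts in the steps above concrete, here is a minimal sketch of the majority arithmetic. It assumes a strict-majority quorum over all five nodes, which this report does not state explicitly, and it assumes .140 still counts as a member after failover because it has not been rebalanced out (consistent with .140 appearing in failed_peers). The helper name has_majority is hypothetical and is not part of ns_server or Chronicle.

```python
def has_majority(total_nodes: int, reachable_nodes: int) -> bool:
    """Return True if the reachable nodes form a strict majority of the cluster."""
    return reachable_nodes > total_nodes // 2


# Node names as abbreviated in the steps to reproduce.
cluster = [".137", ".138", ".139", ".140", ".142"]

# After step 2 only .140 is down; after step 3 .138 and .139 are down as well.
stopped_after_step_2 = {".140"}
stopped_after_step_3 = {".140", ".138", ".139"}

reachable_after_step_2 = [n for n in cluster if n not in stopped_after_step_2]
reachable_after_step_3 = [n for n in cluster if n not in stopped_after_step_3]

majority_after_step_2 = has_majority(len(cluster), len(reachable_after_step_2))
majority_after_step_3 = has_majority(len(cluster), len(reachable_after_step_3))

print("reachable after step 3:", reachable_after_step_3)
print("majority after step 2:", majority_after_step_2)
print("majority after step 3:", majority_after_step_3)
```

Under these assumptions, four of five nodes still form a majority after step 2, but only two of five remain after step 3, so any operation that needs a majority can no longer complete in the normal way.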

Attachments

Screenshot 2021-03-20 at 7.23.38 AM.png
338 kB
19/Mar/21 6:54 PM

Issue Links

blocks

MB-45433 UI should allow failing over inactive nodes when allowUnsafe is true

Closed

relates to

MB-45462 CLI should allow failing over inactive nodes when allowUnsafe is true

Closed

People

Assignee: Sumedh Basarkod (Inactive)
Reporter: Sumedh Basarkod (Inactive)
Votes: 0
Watchers: 9

Dates

Created: 19/Mar/21 6:51 PM
Updated: 17/Jun/21 2:49 PM
Resolved: 01/Apr/21 4:56 PM

Gerrit Reviews

There are no open Gerrit changes.

There are 2 closed Gerrit changes:

MB-45110: Allow unsafe failover of inactive nodes: Gerrit Review:

CBQE-6416: Cover MB-45110: Gerrit Review:

[Chronicle] Cluster can get potentially stuck such that we may not be able to remove failed nodes out of the cluster