Details

Type: Bug
Resolution: Fixed
Priority: Major
Fix Version/s: 7.6.0
Affects Version/s: Cheshire-Cat
Component/s: ns_server
Labels:
- request-dev-verify
- technical-debt

Triage: Triaged
Is this a Regression?: Yes

Description

As described in the comments here:
http://review.couchbase.org/c/ns_server/+/135827

I think this is still racy.

If the supervisor managing root_sup is busy with something when dist_manager crashes during a rename, or if some of the ns_server_cluster_sup processes take a long time to terminate, that may give ns_node_disco enough time to process the DOWN message and self-eject.

I don't quite know what to do about both rename-related changes. I can see how they narrow the window for some races. But neither solves the problem in its entirety, so it's hard to say whether we end up in a better place overall.

To clarify a little bit. It's easier for me to convince myself that the previous change (ns_config checking for rename in init) is strictly improving the state of affairs. It's harder to come to the same conclusion about this change.

A quick (but not so clean) way to make the situation better would be for processes like ns_node_disco to check the termination reason of the renaming transaction. If it's 'normal', then assume everything went fine. Otherwise, terminate the process and let the logic in the init() function deal with it. One problem with this, though, is that ns_node_disco might start monitoring the renaming process too late to get any reason other than 'noproc'.

A potential solution might be to monitor dist_manager from ns_node_disco and terminate ns_node_disco immediately if dist_manager crashes.
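To make the two proposals above easier to compare, here is a rough sketch of the decision logic only. It is written in Python rather than Erlang, and it is not the ns_server implementation. The names `Action`, `decide_on_down`, `"rename_txn"` and the reason strings are hypothetical, chosen for illustration. It also assumes that a late monitor returning 'noproc' is treated the same as a failed rename, which is exactly the ambiguity raised above.

```python
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"    # keep running; the rename is assumed to have succeeded
    TERMINATE = "terminate"  # exit and let the init() logic sort out the node state
    IGNORE = "ignore"        # DOWN message from a process we do not care about


def decide_on_down(monitored: str, reason: str) -> Action:
    """Decide how a process like ns_node_disco reacts to a DOWN message.

    `monitored` is a hypothetical label for the monitored process and
    `reason` is the exit reason carried by the DOWN message.
    """
    if monitored == "rename_txn":
        # First proposal: only a 'normal' exit means the rename went fine.
        # 'noproc' means the monitor was set up after the process was gone,
        # so the real outcome is unknown; this sketch treats it as a failure.
        return Action.CONTINUE if reason == "normal" else Action.TERMINATE
    if monitored == "dist_manager":
        # Second proposal: terminate immediately if dist_manager crashes.
        # Assumption: a 'normal' exit is not counted as a crash.
        return Action.CONTINUE if reason == "normal" else Action.TERMINATE
    return Action.IGNORE
```

In this model a successful rename lets the process carry on, while a crash of either the renaming transaction or dist_manager makes it exit before it can act on the stale DOWN message and self-eject. The cost of the 'noproc' assumption is an unnecessary restart when the rename had in fact succeeded.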

Attachments

Issue Links

depends on
- MB-46868 Upgrade Erlang Version to 24 (Enablement for TLS v1.3 and avoiding node rename) (Closed)

relates to
- MB-45289 [System Test] engageCluster2 POST returns status 500 - Avoid node rename (Closed)
- MB-47605 error,wait_for_node_failed during addition of node (Closed)

Activity

People

Assignee: Navdeep Boparai

Reporter: Artem Stemkovski

Votes: 0

Watchers: 3

Dates

Created: 17/Dec/20 11:09 AM

Updated: 30/Jan/24 2:53 PM

Resolved: 01/Mar/22 5:44 PM

Gerrit Reviews

There are no open Gerrit changes

address issue with dist_manager crashing during rename
