Details

Type: Task
Resolution: Unresolved
Priority: Major
Fix Version/s: techdebt-backlog
Affects Version/s: 2.2.0
Component/s: ns_server
Security Level: Public
Labels:
- fast_failover

Description

In the field, we have seen many cases where a node that is 'slow' because of the OS gets automatically failed over. During that 'slow' period, the memcached process keeps handling gets/sets from clients without any issues.

Often the issue comes down to the Erlang nodes being unable to communicate with each other, for some reason that does not affect memcached. This has variously been blamed on swap, THP (Transparent Huge Pages), Erlang's internal balancing among threads, etc.

Should we look at moving the auto-failover logic out of Erlang to help prevent some of these 'false' failovers?
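To make the distinction concrete, here is a minimal sketch of a failover decision that treats the Erlang-level heartbeat and the memcached data path as separate signals, and only fails a node over when both have been silent past the timeout. This is an illustration under stated assumptions, not how ns_server works: the names `NodeHealth`, `should_auto_failover`, `heartbeat_age`, `data_probe_age`, and the 30-second timeout are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeHealth:
    # Hypothetical: seconds since the last successful cluster-manager (Erlang) heartbeat.
    heartbeat_age: float
    # Hypothetical: seconds since the last successful memcached (data-path) probe.
    data_probe_age: float


def should_auto_failover(health: NodeHealth, timeout: float) -> bool:
    """Fail over only when BOTH the control plane and the data path exceed the timeout."""
    control_plane_down = health.heartbeat_age > timeout
    data_plane_down = health.data_probe_age > timeout
    return control_plane_down and data_plane_down


if __name__ == "__main__":
    # Erlang heartbeats are late, but memcached is still answering: no failover.
    slow_but_serving = NodeHealth(heartbeat_age=45.0, data_probe_age=1.0)
    # Both channels silent: the node is genuinely unreachable.
    unreachable = NodeHealth(heartbeat_age=45.0, data_probe_age=45.0)
    print(should_auto_failover(slow_but_serving, timeout=30.0))  # False
    print(should_auto_failover(unreachable, timeout=30.0))       # True
```

The trade-off of requiring both signals is that a node whose memcached is healthy but whose cluster manager is truly wedged would not be failed over automatically; whether that is acceptable is part of the question above.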


Issue Links

relates to

MB-9321 Implement new cluster orchestration (was: Get us off erlang's global facility and re-elect failed master quickly and safely)

Resolved


People

Assignee: Aliaksey Artamonau (Inactive)

Reporter: James Mauss (Inactive)

Votes: 0

Watchers: 21

Dates

Created: 26/Nov/13 10:20 AM

Updated: 20/Feb/17 2:21 PM

Discussion - Should we move auto-failover out of erlang?