  1. Couchbase Server
  2. MB-11614

Discussion - Should we move auto-failover out of erlang?

Details

    Description

      In the field, we often see a node get auto-failed over when it is 'slow' because of the OS. During this 'slow' period, the memcached process keeps handling gets/sets from clients without any issues.

      Often the issue comes down to the Erlang nodes being unable to communicate with each other for some reason that does not affect memcached. It is sometimes blamed on swap, THP, Erlang's internal balancing among threads, etc.

      Should we look at moving the auto-failover logic out of Erlang to help prevent some of these 'false' failovers?
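To make the question concrete, here is a minimal sketch of one possible direction: a failover decision made outside the Erlang layer that also checks whether the data service port still accepts connections. Everything in it is an assumption for discussion, not existing Couchbase code: the names `NodeHealth`, `probe_data_port`, and `should_auto_failover` are hypothetical, and the 30 second heartbeat timeout is only a placeholder.

```python
import socket
from dataclasses import dataclass


@dataclass
class NodeHealth:
    # Seconds since the cluster manager last heard from the node (hypothetical input).
    heartbeat_missed_s: float
    # Result of a direct probe of the node's data service port (hypothetical input).
    data_port_reachable: bool


def probe_data_port(host: str, port: int, timeout_s: float = 1.0) -> bool:
    """Return True if a plain TCP connection to the data service port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def should_auto_failover(health: NodeHealth, heartbeat_timeout_s: float = 30.0) -> bool:
    """Fail over only when the heartbeat is overdue AND the data port is unreachable.

    A node whose cluster-manager heartbeat is late but whose data port still
    accepts connections is treated as 'slow', not dead.
    """
    if health.heartbeat_missed_s < heartbeat_timeout_s:
        return False
    return not health.data_port_reachable
```

With this rule, a node whose heartbeat is 60 seconds late but whose data port is still reachable would not be failed over, which is the 'false' failover case described above. A TCP connect is only a weak liveness signal, so a real check would probably need to issue an actual request against the data service.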

            People

              Aliaksey Artamonau (Inactive)
              james.mauss James Mauss (Inactive)