  1. Couchbase Server
  2. MB-7182

[RN 2.0.1] ns_server experiences random timeouts, supposedly due to a lack of async I/O threads, causing rebalance to fail and other potential badness

Details

    • Release Note

    Description

      SUBJ.

      In many diags we're seeing occasional timeouts here and there. Sometimes, and perhaps most of the time, they don't affect correct operation of the product. After all, Erlang is famous for its fault resilience.

      But sometimes they cause rebalance to fail. For example, see MB-7166, where mb_master, which supervised ns_orchestrator, which in turn supervised rebalance, died due to a timeout. According to Erlang's normal error-handling behavior, that caused its restart. But part of the restart was shutting down its child processes, obviously including the rebalancer.
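
      The cascade described above can be illustrated with a small toy model. This is only a sketch in Python, not ns_server or Erlang code: the Proc class, the permanent flag, and crash_and_restart are hypothetical names made up for illustration, and the assumption that the rebalancer is not restarted automatically is a simplification of the behavior described in this ticket.

```python
# Toy model of a supervision tree; NOT ns_server code.
# Proc, permanent, and crash_and_restart are hypothetical names.

class Proc:
    def __init__(self, name, children=(), permanent=True):
        self.name = name
        self.children = list(children)
        # Assumption: permanent children come back on restart,
        # non-permanent ones (an in-flight rebalance) do not.
        self.permanent = permanent
        self.alive = True

    def stop(self, log):
        # Stopping a process stops its whole subtree first.
        for child in reversed(self.children):
            child.stop(log)
        if self.alive:
            self.alive = False
            log.append("stopped " + self.name)

    def start(self, log):
        self.alive = True
        log.append("started " + self.name)
        for child in self.children:
            if child.permanent:
                child.start(log)


def crash_and_restart(proc, reason, log):
    # A crash takes the subtree down, then the process is restarted.
    log.append(proc.name + " crashed: " + reason)
    proc.stop(log)
    proc.start(log)


def demo():
    rebalancer = Proc("rebalancer", permanent=False)
    orchestrator = Proc("ns_orchestrator", [rebalancer])
    master = Proc("mb_master", [orchestrator])
    log = []
    crash_and_restart(master, "timeout", log)
    return log, master.alive, orchestrator.alive, rebalancer.alive


if __name__ == "__main__":
    for line in demo()[0]:
        print(line)
```

      In this model a timeout in the top process is enough to end the rebalance, even though the rebalancer itself did nothing wrong.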

      In my personal experience this is quite easy to hit on physical hardware and spinning disks. But apparently we're now getting it on Xen and SSDs, as well as potentially (MB-7152) on physical hardware and SSDs.

          People

            kzeller kzeller
            alkondratenko Aleksey Kondratenko (Inactive)