Details

    • Improvement
    • Resolution: Unresolved
    • Major
    • 0.3
    • None
    • cluster-monitor
    • None

    Description

      Currently the Status loop (which runs the health checks) runs every five minutes, so an issue can go unnoticed for up to five minutes. This could lead to inconsistent data in the dashboards and poor UX.

      Ideas for how we could improve this:

      • Just run all the checkers more frequently. I'd rather not, since that could quickly overload clusters
      • Split the checkers into "frequent" and "less frequent" groups that run at different intervals
      • Re-run some checkers (those that only need "cluster summary" data and nothing else) as soon as the cluster summaries are updated (which is done by the Heart loop every minute)
        • Related to that, possibly use streaming / long-polling to update the cluster summary data near-instantly
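
A rough sketch of how the second and third ideas could fit together, assuming a small in-memory scheduler. Everything here is hypothetical and only for illustration: the `Checker` and `Scheduler` types, the `SummaryOnly` flag, and the checker names are not taken from the cluster-monitor code. Each checker carries its own interval, and checkers marked as needing only cluster summary data can additionally be re-run whenever the summaries are refreshed.

```go
package main

import (
	"fmt"
	"time"
)

// Checker describes one health check (hypothetical type, not the real one).
type Checker struct {
	Name        string
	Interval    time.Duration // how often the timer-driven loop should run it
	SummaryOnly bool          // true if it needs only cluster summary data
}

// Scheduler remembers when each checker last ran.
type Scheduler struct {
	checkers []Checker
	lastRun  map[string]time.Time
}

func NewScheduler(checkers []Checker) *Scheduler {
	return &Scheduler{checkers: checkers, lastRun: make(map[string]time.Time)}
}

// Due returns the names of checkers whose interval has elapsed at now
// (or that have never run) and records them as having run at now.
func (s *Scheduler) Due(now time.Time) []string {
	var due []string
	for _, c := range s.checkers {
		last, ran := s.lastRun[c.Name]
		if !ran || now.Sub(last) >= c.Interval {
			due = append(due, c.Name)
			s.lastRun[c.Name] = now
		}
	}
	return due
}

// OnSummaryUpdate returns the summary-only checkers, which can be re-run
// as soon as fresh cluster summaries arrive, and records them as run at now.
func (s *Scheduler) OnSummaryUpdate(now time.Time) []string {
	var due []string
	for _, c := range s.checkers {
		if c.SummaryOnly {
			due = append(due, c.Name)
			s.lastRun[c.Name] = now
		}
	}
	return due
}

// demoCheckers returns two made-up checkers: a frequent, summary-only one
// and a less frequent one that needs more than summary data.
func demoCheckers() []Checker {
	return []Checker{
		{Name: "nodeDown", Interval: time.Minute, SummaryOnly: true},
		{Name: "deepScan", Interval: 5 * time.Minute, SummaryOnly: false},
	}
}

// at converts a number of seconds into a fixed timestamp for the demo.
func at(seconds int) time.Time { return time.Unix(int64(seconds), 0) }

func main() {
	s := NewScheduler(demoCheckers())
	// Simulate a timer tick once a minute for five minutes.
	for m := 0; m <= 5; m++ {
		fmt.Println("minute", m, "timer:", s.Due(at(m*60)))
	}
	// Simulate the Heart loop refreshing the summaries between ticks.
	fmt.Println("summary update:", s.OnSummaryUpdate(at(330)))
}
```

In this sketch the existing five-minute cadence is kept for the heavier checker, so the extra load on clusters comes only from checkers that have been explicitly marked as cheap. A streaming or long-polling source would simply call `OnSummaryUpdate` more often.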


            People

              shaashwat.jain Shaashwat Jain
              marks.polakovs Marks Polakovs (Inactive)
              Votes: 0
              Watchers: 3

