• New Feature
    • Status: To Do
    • Major
    • Resolution: Unresolved
    • None
    • 0.3
    • cluster-monitor, cmos
    • None


      Filing this more so that we don't forget about it than as a specific implementation plan or spec; it will likely need refining.

      It's not hard to imagine a situation where more than one checker goes off at the same time with the same root cause. Taking down a node is the classic case: you'd get pings for the node being down, for the cluster not being fully active, and potentially for missing active/replica vBuckets. That is three alerts for one root cause.

      This could happen in more subtle cases as well. For example, the issue described in CMOS-377 could manifest both as an entry in the memcached log and as a condition detectable by querying Analytics. We'd ideally suppress the former, as the latter would be more specific.

      Alertmanager has a system like this. We should look into whether it would be useful for us, or take hints from it if it isn't.
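
      To make the idea concrete, here is a rough sketch of root-cause suppression, loosely modelled on how Alertmanager's inhibition rules work (a firing "source" alert mutes matching "target" alerts that share a label). Everything in it is an assumption for illustration: the checker names, the `Alert` fields, the `SUPPRESSES` table and the `suppress` function are hypothetical and not taken from the existing checkers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Alert:
    checker: str                 # hypothetical checker name
    cluster: str                 # cluster the alert belongs to
    node: Optional[str] = None   # set only for node-level alerts


# Hypothetical rule table: while the key checker is firing for a cluster,
# alerts from the listed checkers on that same cluster are suppressed.
SUPPRESSES = {
    "node_down": {
        "cluster_not_active",
        "missing_active_vbuckets",
        "missing_replica_vbuckets",
    },
    # The more specific detection wins over the generic log entry.
    "analytics_condition": {"memcached_log_entry"},
}


def suppress(alerts):
    """Return the alerts that remain after root-cause suppression."""
    firing = {(a.checker, a.cluster) for a in alerts}
    kept = []
    for alert in alerts:
        inhibited = any(
            alert.checker in targets and (source, alert.cluster) in firing
            for source, targets in SUPPRESSES.items()
        )
        if not inhibited:
            kept.append(alert)
    return kept


if __name__ == "__main__":
    alerts = [
        Alert("node_down", "c1", node="n1"),
        Alert("cluster_not_active", "c1"),
        Alert("missing_replica_vbuckets", "c1"),
        Alert("cluster_not_active", "c2"),  # different cluster, unrelated
    ]
    for alert in suppress(alerts):
        print(alert.checker, alert.cluster)
```

      Matching on the cluster plays the role of Alertmanager's `equal` list: a node going down in one cluster should not hide alerts from another. Whether a static table like this is enough, or whether checkers should declare their own suppression relationships, is part of what needs refining.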






            Unassigned Unassigned
            marks.polakovs Marks Polakovs (Inactive)