  1. Couchbase Server
  2. MB-11321 Supportability : Alerting
  3. MB-4785

Meaningful alert when low-level packet corruption on node

Details

    Description

      Logs showed low-level corruption in network data. The symptom is that nodes go up and down repeatedly. The UI does not make clear that this is happening on only 2 nodes, that the cause is low-level corruption, or that these nodes have a persistent problem and need to be failed over. Nothing bubbles up about why a node flaps up and down, or how to report the problem to the data center or to Amazon (in this case the nodes were on EC2).

      Need a clear alert to the user that suggests failing over the troublesome node. Ideally the alert would include concrete examples of the corrupt data to pass on to data center ops.
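
      As an illustration of the kind of check such an alert could be built on, here is a minimal sketch in Python. It is not Couchbase code: the event format, the function names `flapping_nodes` and `format_alert`, and the thresholds are all assumptions made for this example. It counts up/down state changes per node inside a sliding time window and flags only the nodes that flap repeatedly, so a user could see that the problem is confined to specific nodes.

```python
from collections import defaultdict


def flapping_nodes(events, window_s=600, min_transitions=4):
    """Return {node: max transitions seen in any window} for flapping nodes.

    `events` is an iterable of (timestamp_seconds, node, state) tuples,
    where state is "up" or "down". This format is hypothetical.
    """
    transitions = defaultdict(list)  # node -> timestamps of state changes
    last_state = {}
    for ts, node, state in sorted(events):
        if node in last_state and last_state[node] != state:
            transitions[node].append(ts)
        last_state[node] = state

    flagged = {}
    for node, times in transitions.items():
        start = 0
        best = 0
        # Slide a window of width window_s over the transition timestamps.
        for end, t in enumerate(times):
            while t - times[start] > window_s:
                start += 1
            best = max(best, end - start + 1)
        if best >= min_transitions:
            flagged[node] = best
    return flagged


def format_alert(node, count, window_s):
    """Build a user-facing message (wording is illustrative only)."""
    return (
        f"Node {node} changed state {count} times within {window_s}s. "
        "It may have a persistent network problem; consider failing it over."
    )


if __name__ == "__main__":
    sample = [
        (0, "node-a", "up"), (10, "node-a", "down"), (20, "node-a", "up"),
        (30, "node-a", "down"), (40, "node-a", "up"),
        (0, "node-b", "up"), (500, "node-b", "down"),
    ]
    for node, count in flapping_nodes(sample).items():
        print(format_alert(node, count, 600))
```

      A real alert would also need the evidence requested above (samples of the corrupt data from the logs), which this sketch does not attempt to collect.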


          People

            djp Don Pinto [X] (Inactive)
            TimSmith Tim Smith (Inactive)
