Couchbase Java Client / JCBC-1849

MCA clusters configured with IP addresses may not trigger a second alert


Details

    • Type: Bug
    • Resolution: Won't Do
    • Priority: Major
    • None
    • 2.7.20
    • Infrastructure
    • 1

    Description

      If a client is using the Node Health Failure Detector and has configured its MCA setup using IP addresses, the download of the cluster map after a cluster switch may put the detector into the alert state before the Coordinator has closed its grace period. Because the detector is then already in the alert state, it may not re-alert when nodes actually fail.

       

      Sequence of events:

      1 - Nodes fail on Cluster 1; the detector goes into the alert state (RED).

      2 - The Coordinator enters its grace period.

      3 - The Coordinator switches to Cluster 2 and resets the detector alert state to GREEN.

      4 - The cluster map is received; the client adds the nodes using their DNS names and disconnects from the IP addresses.

      5 - The detector picks up the disconnects and goes into the alert state (RED).

      6 - The Coordinator, still in its grace period, ignores the alert and leaves the detector in the RED state.

      7 - When a node does fail, the detector picks it up, but it is already in the RED state, so no change is sent to the Coordinator.
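
      The sequence above hinges on the detector notifying the Coordinator only when its state changes from GREEN to RED. The following Java sketch models that interaction under that assumption (inferred from step 7, not confirmed against the MCA source). The classes McaAlertRace, Coordinator and Detector are hypothetical and are not part of the Couchbase SDK or MCA API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the race described in this issue. These classes are
// NOT the Couchbase SDK or MCA API; they only mirror the listed state changes.
public class McaAlertRace {

    enum Health { GREEN, RED }

    /** Hypothetical coordinator that records how it handled each alert. */
    static final class Coordinator {
        boolean inGracePeriod = false;
        final List<String> actions = new ArrayList<>();

        void onAlert(String reason) {
            if (inGracePeriod) {
                // Step 6: the alert is ignored and the detector is NOT reset to GREEN.
                actions.add("ignored: " + reason);
                return;
            }
            actions.add("acted on: " + reason);
        }
    }

    /** Hypothetical edge-triggered detector: it notifies only on GREEN -> RED. */
    static final class Detector {
        Health state = Health.GREEN;
        private final Coordinator coordinator;

        Detector(Coordinator coordinator) {
            this.coordinator = coordinator;
        }

        void onDisconnect(String reason) {
            if (state == Health.RED) {
                return; // already alerting, so no new notification is sent
            }
            state = Health.RED;
            coordinator.onAlert(reason);
        }

        void reset() {
            state = Health.GREEN;
        }
    }

    /** Replays steps 1-7 and returns what the coordinator saw. */
    public static List<String> simulate() {
        Coordinator coordinator = new Coordinator();
        Detector detector = new Detector(coordinator);

        detector.onDisconnect("nodes failed on cluster 1");            // step 1
        coordinator.inGracePeriod = true;                              // step 2
        detector.reset();                                              // step 3: switch to cluster 2
        detector.onDisconnect("IP connections closed after map load"); // steps 4-6
        coordinator.inGracePeriod = false;                             // grace period ends
        detector.onDisconnect("real node failure on cluster 2");       // step 7: swallowed
        return coordinator.actions;
    }

    public static void main(String[] args) {
        simulate().forEach(System.out::println);
    }
}
```

      Running main prints only two entries: the initial failure and the ignored disconnect alert. The real failure in step 7 never reaches the Coordinator because the detector is still RED.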

       

      A sample from the SDK debug logs showing this sequence was attached to the issue.


          People

            Michael Nitschinger (daschl)
            Davis Chapman (davis.chapman, inactive)
            Votes: 0
            Watchers: 2
