Couchbase Server / MB-28049

Make config sync more robust in the face of nodes that don't leave the cluster cleanly

Details

    • Bug
    • Resolution: Unresolved
    • Critical
    • backlog
    • 5.5.0
    • ns_server
    • None
    • Untriaged
    • Unknown

    Description

      Recently we saw a case where nodes left a cluster uncleanly, new nodes were renamed to have the same names as the nodes that had left uncleanly, and the config subsequently became corrupted. We believe the corruption was caused by a config exchange with one of the nodes that had not properly left the cluster, though we can't confirm this because we don't have logs from all the nodes.

      We do have some protection against this if the departing node actually receives the leave instruction. If it doesn't, and the nodes are renamed or bounced at an unfortunate time, we can be vulnerable to the kind of config corruption seen here.

      This ticket tracks making the config exchange more robust, even when node renames or node power cycles are poorly timed while this kind of maneuver is performed.

      People

        • Dave Finlay (dfinlay)

      Votes: 0
      Watchers: 6