Details
-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
Untriaged
-
Unknown
Description
When a node is ejected from the cluster, the ns_cluster process on the departing node performs the following operations, in order:
1. Creates a “leave marker” to indicate the beginning of the leave procedure.
2. Empties the users storage.
3. Stops most of the ns_server processes.
4. Wipes out the stats archive.
5. Disconnects from the other nodes in the cluster by generating a new Erlang cookie.
6. Resets the address of the node back to 127.0.0.1.
7. Clears the ns_config copy and reinitializes it with default values.
8. Updates the nodes_wanted list to contain only the node itself.
9. Creates a “start processes marker” to indicate that the ns_server processes stopped in step 3 are being restarted.
10. Deletes the “leave marker”.
11. Restarts all the stopped processes.
12. Restarts memcached.
13. Deletes the “start processes marker” to signify the end of the leave procedure.
If ns_cluster crashes before completing step 10, then when it is restarted it attempts to perform the leave again, because the leave marker has not been cleared. During this second attempt, ns_cluster crashes at step 2 with a noproc exception, because the downstream process responsible for clearing the users storage has not been started yet.
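
The failure sequence described above can be illustrated with a small model. This is a minimal Python sketch, not ns_server code: the `Node` class, the `NoProc` exception, and all method names are hypothetical stand-ins, and it assumes the crash happens after the processes are stopped in step 3. Steps 4 to 9 and 12 to 13 are omitted.

```python
class NoProc(Exception):
    """Stand-in for an Erlang noproc exit: the called process is not running."""


class Node:
    def __init__(self):
        self.markers = set()            # models marker files that survive a crash
        self.users_storage_up = True    # models the process that owns users storage

    def empty_users_storage(self):
        # Calling a process that is not running fails, as noproc does in Erlang.
        if not self.users_storage_up:
            raise NoProc("users storage process is not running")

    def leave(self, crash_after_stop=False):
        self.markers.add("leave")       # step 1: create the leave marker
        self.empty_users_storage()      # step 2: needs the storage process
        self.users_storage_up = False   # step 3: stop most processes
        if crash_after_stop:
            raise RuntimeError("simulated ns_cluster crash")
        # steps 4-9 are omitted in this sketch
        self.markers.discard("leave")   # step 10: delete the leave marker
        self.users_storage_up = True    # step 11: restart the stopped processes

    def restart_ns_cluster(self):
        # The leave marker is still present, so the leave is retried from step 1.
        if "leave" in self.markers:
            self.leave()


def reproduce():
    node = Node()
    try:
        node.leave(crash_after_stop=True)   # crash before step 10
    except RuntimeError:
        pass
    try:
        node.restart_ns_cluster()           # retry hits step 2 with storage down
    except NoProc as exc:
        return f"retry failed: {exc}"
    return "retry succeeded"
```

In this model, `reproduce()` reports that the retry fails at step 2, because the leave marker outlives the crash while the users storage process does not.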