Description
Steps:
1. Node A is disconnected from the cluster.
2. An ns_config key K is deleted.
3. The deletion is not replicated to node A.
4. Node A reconnects to the cluster and replicates its stale value for K to the other nodes.
5. At exactly this moment, tombstone purging is initiated.
6. The orchestrator node collects all tombstones and replicates them to all nodes.
7. Node A now learns that K was deleted, but the change it replicated in step 4 is still in flight.
8. The tombstone for K is purged.
9. The in-flight change from step 4 reaches its destination and, with no tombstone left to override it, resurrects the key.
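A minimal Python sketch of the race above, under a toy model: each entry carries a single integer version and a merge keeps the higher version. This is a deliberate simplification of ns_config's vector clocks, and `Entry`, `Node`, `merge`, `purge_tombstones` and `run_race` are hypothetical names, not ns_config APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Entry:
    # Toy model (assumption): one integer version instead of a vector clock.
    value: Optional[str]  # None marks a tombstone (deleted key)
    version: int

    @property
    def is_tombstone(self) -> bool:
        return self.value is None


@dataclass
class Node:
    name: str
    store: Dict[str, Entry] = field(default_factory=dict)

    def merge(self, key: str, incoming: Entry) -> None:
        current = self.store.get(key)
        # With nothing to compare against, any incoming entry wins.
        if current is None or incoming.version > current.version:
            self.store[key] = incoming

    def purge_tombstones(self) -> None:
        self.store = {k: e for k, e in self.store.items() if not e.is_tombstone}


def run_race() -> Dict[str, Dict[str, Optional[str]]]:
    a, b = Node("A"), Node("B")
    for n in (a, b):
        n.store["K"] = Entry("v1", 1)  # both nodes start with K = v1
    # Steps 1-3: A is disconnected; K is deleted on B only.
    b.store["K"] = Entry(None, 2)
    # Step 4: A reconnects and sends its stale value; the message is in flight.
    in_flight: List[Tuple[Node, str, Entry]] = [(b, "K", a.store["K"])]
    # Steps 5-7: purging starts; tombstones are replicated to every node.
    tombstones = {k: e for n in (a, b) for k, e in n.store.items() if e.is_tombstone}
    for n in (a, b):
        for k, e in tombstones.items():
            n.merge(k, e)
    # Step 8: tombstones are purged everywhere.
    for n in (a, b):
        n.purge_tombstones()
    # Step 9: the stale in-flight change lands and resurrects K on B.
    for dest, k, e in in_flight:
        dest.merge(k, e)
    return {n.name: {k: e.value for k, e in n.store.items()} for n in (a, b)}
```

Calling `run_race()` leaves K absent on A but present as `"v1"` on B: the stale write wins only because the tombstone it should have lost against no longer exists.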
This sequence is probably highly unlikely, but it is still theoretically possible.
With the tools that ns_config provides, the only fix I can see is to add an extra step to purging: after all tombstones have been replicated to all nodes, we would also make sure that every node has drained any in-flight changes before the tombstones are purged. This requires each node to synchronize with every other node (O(n^2) synchronizations).
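The proposed fix can be sketched in the same kind of toy model (single integer versions, higher version wins). One FIFO queue per directed link stands in for in-flight replication, and `drain_all` is a hypothetical barrier that flushes every link before tombstones are purged. All names here are illustrative assumptions, not ns_config functions; with n nodes the barrier performs n(n-1) link flushes, matching the O(n^2) cost described above.

```python
from collections import deque
from dataclasses import dataclass
from itertools import permutations
from typing import Deque, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Entry:
    value: Optional[str]  # None marks a tombstone
    version: int          # toy model (assumption): higher version wins


class Cluster:
    def __init__(self, names: List[str]) -> None:
        self.stores: Dict[str, Dict[str, Entry]] = {n: {} for n in names}
        # One FIFO queue per directed link models in-flight replication.
        self.links: Dict[Tuple[str, str], Deque[Tuple[str, Entry]]] = {
            pair: deque() for pair in permutations(names, 2)
        }
        self.syncs = 0

    def merge(self, node: str, key: str, incoming: Entry) -> None:
        current = self.stores[node].get(key)
        if current is None or incoming.version > current.version:
            self.stores[node][key] = incoming

    def send(self, src: str, dst: str, key: str, entry: Entry) -> None:
        self.links[(src, dst)].append((key, entry))

    def drain_all(self) -> None:
        # Hypothetical barrier: every node synchronizes with every other node.
        for (_src, dst), queue in self.links.items():
            while queue:
                key, entry = queue.popleft()
                self.merge(dst, key, entry)
            self.syncs += 1

    def purge_with_drain(self) -> None:
        # Collect the newest tombstone per key and replicate it everywhere.
        tombstones: Dict[str, Entry] = {}
        for store in self.stores.values():
            for k, e in store.items():
                if e.value is None and (k not in tombstones or e.version > tombstones[k].version):
                    tombstones[k] = e
        for node in self.stores:
            for k, e in tombstones.items():
                self.merge(node, k, e)
        # Extra step: drain in-flight changes so stale writes meet a tombstone.
        self.drain_all()
        for node in list(self.stores):
            self.stores[node] = {k: e for k, e in self.stores[node].items() if e.value is not None}


def scenario() -> Cluster:
    c = Cluster(["A", "B", "C"])
    for node in c.stores:
        c.stores[node]["K"] = Entry("v1", 1)
    c.stores["B"]["K"] = Entry(None, 2)  # K deleted while A was away
    c.stores["C"]["K"] = Entry(None, 2)
    for dst in ("B", "C"):
        c.send("A", dst, "K", Entry("v1", 1))  # A's stale value, in flight
    c.purge_with_drain()
    return c
```

In `scenario()`, the stale `"v1"` writes are delivered during the drain, lose against the tombstone, and K stays deleted on all three nodes after the purge.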