Details
Improvement
Resolution: Unresolved
Major
1.8.1, 2.0, 2.0.1, 2.1.1, 2.2.0, 2.5.1, 5.0.1
Security Level: Public
02/Sep/2013 - 20/Sep/2013
Description
We had two nodes crash at a customer site. A disk space issue may have been involved, but I don't think it was the cause.
After the crash, the nodes warmed up relatively quickly but then immediately "discarded" their items. I say that because the stats show they warmed up ~10m items, yet the current item count on both nodes was 0.
I tried shutting down the service and had to kill memcached manually (kill -9). After a restart, the node went through the same sequence: it warmed up, and then the item count stayed at 0.
While I was looking around, I let it sit for a little while and all of the items came back without any further action on my part. I seem to recall a previous bug like this, where a node wouldn't be told to become active until all the nodes in the cluster were active, and it got into trouble when not all of the nodes restarted.
Diags for all nodes will be attached.