(updated by alk: I cannot fix the top posting, but I took some names out of this)
Set up a node, fill the filesystem, watch processes run but see memcached take connections and just fail to respond.
Also, set up a node, stop Couchbase. Fill the filesystem. Start Couchbase.
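For the "fill the filesystem" step in both scenarios above, something like the following Python sketch could be run on a scratch node. The helper name `fill_directory`, the `filler.bin` file name, and the `max_bytes` safety cap are assumptions made for illustration and are not part of any Couchbase tooling. Pass `max_bytes=None` only on a disposable machine.

```python
import errno
import os


def fill_directory(path, chunk_size=1 << 20, max_bytes=None):
    """Write zero bytes into <path>/filler.bin until the filesystem is full.

    Stops early once max_bytes have been written (safety cap; None = no cap).
    Returns (bytes_written, hit_enospc).
    """
    chunk = b"\0" * chunk_size
    written = 0
    hit_enospc = False
    filler = os.path.join(path, "filler.bin")
    # buffering=0 so every write goes straight to the OS and ENOSPC
    # surfaces on the write call itself rather than at close time.
    with open(filler, "wb", buffering=0) as f:
        while max_bytes is None or written < max_bytes:
            if max_bytes is None:
                size = chunk_size
            else:
                size = min(chunk_size, max_bytes - written)
            try:
                n = f.write(chunk[:size])
            except OSError as e:
                if e.errno == errno.ENOSPC:
                    hit_enospc = True
                    break
                raise
            written += n
    return written, hit_enospc
```

Point `path` at the filesystem that holds the data directory, then check whether memcached still answers stats requests.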
On Wed, Mar 28, 2012 at 5:07 PM, Sharon Barr <XXXX> wrote:
Unix is more mature than Couchbase at the edge cases. We are getting there... or trying NOT to get there at all (another alternative...).
From: Matt Ingenthron
Sent: Wednesday, March 28, 2012 8:04 AM
To: Frank Weigel; Perry Krug; Dipti Borkar
Cc: Sharon Barr; Alex Ma; support-internal
Subject: Re: YYYY having issues
Incidentally, while testing the hotfix for AAAA with TMP_OOM, I accidentally ran my CentOS out of disk. The OS is running happily and so are our processes, but moxi is just returning errors and the memcached process isn't responding to stats requests.
There is still free memory available, and happily we've (kinda) lived within our quota. Confusingly, the quota is set to 512 MB, but the resident memory size of memcached is only 445 MB. The virtual size is larger, but it has likely not actually tried to allocate that.
So at least this UNIX-like OS is fine when out of disk.
On 3/27/12 9:55 PM, "Frank Weigel" <XXXXXX> wrote:
In principle I agree, but if this is the only disk, UNIX doesn't do well when entirely out of disk AFAIK, so we may need to do this when the poor man's disk alert kicks in?
That's a myth. Only buggy UNIXes (or UNIX-like OSs) don't do well there. I've worked with many a UNIX that is perfectly fine with a full disk.*
I agree with Perry that it should end in TMP_OOM. We should leave ourselves some memory of course (since we need to receive the packet to respond with TMP_OOM), but there is no reason why this is not doable. It's simply a matter of writing and testing the software.
* The myth came from BSD, which way, way, way back when required 2x a process's memory in swap to keep going. That "2x" is another myth that seems to keep perpetuating itself.
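As a rough illustration of the "poor man's disk alert" mentioned in the quoted text, a free-space check could look like the Python sketch below. The function name `disk_alert` and the 5% default threshold are assumptions for illustration; the thread does not say how the actual alert is implemented.

```python
import shutil


def disk_alert(path, min_free_fraction=0.05):
    """Return True when the filesystem holding `path` is nearly full.

    min_free_fraction is a hypothetical threshold: alert once less than
    this fraction of the filesystem's total size is still free.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free < min_free_fraction * usage.total
```

A server could poll this on its data directory and start refusing writes before the disk is completely full.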
From: Perry Krug <XXX>
Date: Tue, 27 Mar 2012 02:27:01 -0700
To: Frank Weigel <XXX>
Subject: Re: YYYYY having issues
Can we please actually do something about this in the code so that the entire server doesn't just crash? We should start sending tmp_oom or something as soon as we detect that we are unable to write to disk.
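A minimal sketch of that behaviour in Python, assuming a simplified store that persists synchronously (the real server does not work this way). `DiskAwareStore`, the `persist` callback, and the `OK` / `TMP_OOM` status labels are hypothetical names used only for illustration.

```python
import errno

OK = "OK"            # hypothetical status label
TMP_OOM = "TMP_OOM"  # hypothetical label for a temporary-failure reply


class DiskAwareStore:
    """Toy key-value store that degrades instead of wedging on a full disk."""

    def __init__(self, persist):
        self._persist = persist  # callable(key, value) that writes to disk
        self._data = {}
        self.disk_full = False

    def set(self, key, value):
        # Once the disk is known to be full, reject writes immediately
        # so the client gets an answer and can back off and retry.
        if self.disk_full:
            return TMP_OOM
        try:
            self._persist(key, value)
        except OSError as e:
            if e.errno == errno.ENOSPC:
                self.disk_full = True
                return TMP_OOM
            raise
        self._data[key] = value
        return OK

    def get(self, key):
        # Reads are still served while writes are being rejected.
        return self._data.get(key)

    def mark_disk_recovered(self):
        # Call once space has been freed so writes are attempted again.
        self.disk_full = False
```

The point of the sketch is only that the server keeps answering: clients see a retryable error instead of a connection that accepts requests and never responds.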
From: Sharon Barr <xxX>
Date: Mon, 26 Mar 2012 17:11:58 -0700
To: Alex Ma <XXX>, Perry Krug <xXX>
Subject: RE: YYYYY having issues
Apparently they ran out of disk space on all nodes.