Details

    Description

      (updated by alk: I cannot fix the top posting, but I took some names out of this)

      Set up a node, fill the filesystem, watch processes run but see memcached take connections and just fail to respond.

      Also, set up a node, stop Couchbase. Fill the filesystem. Start Couchbase.

      On Wed, Mar 28, 2012 at 5:07 PM, Sharon Barr <XXXX> wrote:

      Unix is more mature than Couchbase at the edge cases. We are getting there... or trying NOT to get there at all (another alternative...).

      From: Matt Ingenthron
      Sent: Wednesday, March 28, 2012 8:04 AM
      To: Frank Weigel; Perry Krug; Dipti Borkar
      Cc: Sharon Barr; Alex Ma; support-internal

      Subject: Re: YYYY having issues

      Incidentally, while testing the hotfix for AAAA with TMP_OOM, I accidentally ran my CentOS out of disk. The OS is running happily and so are our processes, but moxi is just returning errors and the memcached process isn't responding to stats requests.

      There is still free memory available, but happily we've (kinda) lived within our quota. Confusingly the quota is set to 512MByte, but the resident memory size of memcached is only 445MByte. The virtual size is larger, but it's likely not tried to allocate.

      So at least this UNIX-like OS is fine when out of disk.

      Matt

      On 3/27/12 9:55 PM, "Frank Weigel" <XXXXXX> wrote:

      In principle I agree, but if this is the only disk, UNIX doesn't do well when entirely out of disk AFAIK, so we may need to do this when the poor man's disk alert kicks in?

      That's a myth. Only buggy UNIXes (or UNIX-like OSs) don't do well there. I've worked with many a UNIX that is perfectly fine with a full disk.*

      I agree with Perry that it should end in TMP_OOM. We should leave ourselves some memory of course (since we need to receive the packet to respond with TMP_OOM), but there is no reason why this is not doable. It's simply a matter of writing and testing the software.

      Matt

      * The myth came from BSD, which way, way, way back required 2x the swap possible per process's memory to keep going. That "2x" is another myth that seems to keep perpetuating.

      From: Perry Krug <XXX>
      Date: Tue, 27 Mar 2012 02:27:01 -0700
      To: Frank Weigel <XXX>
      Cc: (skipped)
      Subject: Re: YYYYY having issues

      Can we please actually do something about this in the code so that the entire server doesn't just crash? We should start sending tmp_oom or something as soon as we detect that we are unable to write to disk.

      From: Sharon Barr <xxX>
      Date: Mon, 26 Mar 2012 17:11:58 -0700
      To: Alex Ma <XXX>, Perry Krug <xXX>
      Cc: skipped
      Subject: RE: YYYYY having issues

      Apparently they ran out of disk space on all nodes.
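
      The behavior asked for in the thread (answer with a temporary failure once disk writes can no longer succeed, rather than accepting connections and going silent) could be sketched roughly as below. This is only an illustration in Python, not Couchbase or memcached code: the names persist, disk_is_low, and MIN_FREE_BYTES and the 64 MB threshold are assumptions made up for the example, and only the TMP_OOM status name comes from the thread.

```python
import errno
import os
import shutil

# Hypothetical status names; only "TMP_OOM" is mentioned in the thread.
SUCCESS = "SUCCESS"
TMP_OOM = "TMP_OOM"

# Hypothetical threshold standing in for the "poor man's disk alert".
MIN_FREE_BYTES = 64 * 1024 * 1024


def disk_is_low(directory, min_free=MIN_FREE_BYTES):
    """Return True when the filesystem holding `directory` has < min_free bytes free."""
    return shutil.disk_usage(directory).free < min_free


def persist(path, payload, min_free=MIN_FREE_BYTES):
    """Append payload to path; return TMP_OOM instead of failing silently on a full disk."""
    directory = os.path.dirname(os.path.abspath(path))

    # Refuse early while there is still headroom to answer the client.
    if disk_is_low(directory, min_free):
        return TMP_OOM

    try:
        with open(path, "ab") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # force the write so ENOSPC surfaces here
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            # The disk filled up between the check and the write.
            return TMP_OOM
        raise
    return SUCCESS
```

      A caller would map TMP_OOM onto whatever temporary-failure response the protocol defines, so clients can back off and retry. The free-space check is advisory only; the ENOSPC handler is what covers the race where the disk fills between the check and the write.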

            People

              pvarley Patrick Varley (Inactive)
              steve Steve Yen