  1. Couchbase Server
  2. MB-3651

Entire bucket of cluster goes offline (no cache hits) when one node is put offline with ns_memcached002 code

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • Fix Version: 2.0
    • Affects Version: 1.6.5.3
    • Component: ns_server
    • Security Level: Public
    • Labels: None
    • Environment: CentOS 5.5 64-bit

    Description

      The cluster is configured with a "default" bucket spanning 8 nodes, each configured with 8 GB of RAM. Each server has 16 GB of RAM.

      The cluster had been online for many weeks when it suddenly began returning 0 cache hits yesterday. After logging in, I can see that stats are still being reported and that one node, 209.151.227.98, was shut down with an ns_memcached002 code. However, from the second this event happened, the entire cluster/bucket has returned no cache hits or any data whatsoever.

      What happened on the 209.151.227.98 node: I can confirm that the membase server on 209.151.227.98 crashed and that its processes are no longer running on that server. It appears that .98 suffered a hardware error and its file system went into read-only mode.

      sd 0:0:0:0: SCSI error: return code = 0x08000002
      sda: <<DEFERRED>>: sense key: Hardware Error
      Add. Sense: Mechanical positioning error

      Info fld=0xb021bc
      end_request: I/O error, dev sda, sector 4317391
      Buffer I/O error on device sda1, logical block 27602
      lost page write due to I/O error on sda1
      Aborting journal on device sda1.
      ext3_abort called.
      EXT3-fs error (device sda1): ext3_journal_start_sb: Detected aborted journal
      Remounting filesystem read-only
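
      To check whether any of the other nodes show the same disk symptoms, the kernel messages above can be matched with a small script. This is only a rough sketch: the pattern list is taken from the lines quoted in this report and is not exhaustive, and the names `scan_kernel_log` and `filesystem_went_read_only` are made up for illustration, not part of any membase tool.

```python
import re

# Patterns copied from the kernel messages quoted in this report.
# Illustrative only; other disk failures may log different text.
DISK_FAILURE_PATTERNS = {
    "io_error": re.compile(r"I/O error"),
    "hardware_error": re.compile(r"sense key: Hardware Error"),
    "journal_abort": re.compile(r"Aborting journal on device"),
    "remount_ro": re.compile(r"Remounting filesystem read-only"),
}


def scan_kernel_log(text):
    """Return a dict mapping each symptom name to the log lines that match it."""
    hits = {name: [] for name in DISK_FAILURE_PATTERNS}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        for name, pattern in DISK_FAILURE_PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line)
    return hits


def filesystem_went_read_only(text):
    """True if the log shows the kernel remounting a filesystem read-only."""
    return bool(scan_kernel_log(text)["remount_ro"])


# The excerpt from the .98 node, used here as sample input.
SAMPLE_LOG = """\
sd 0:0:0:0: SCSI error: return code = 0x08000002
sda: <<DEFERRED>>: sense key: Hardware Error
Add. Sense: Mechanical positioning error
Info fld=0xb021bc
end_request: I/O error, dev sda, sector 4317391
Buffer I/O error on device sda1, logical block 27602
lost page write due to I/O error on sda1
Aborting journal on device sda1.
ext3_abort called.
EXT3-fs error (device sda1): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
"""

summary = scan_kernel_log(SAMPLE_LOG)
```

      Feeding the output of `dmesg` from each node into `scan_kernel_log` would show whether the read-only remount is limited to .98 or has also happened elsewhere.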

      However, the hardware failure on this one node still does not explain why the bucket stopped returning cache hits across the entire cluster.

      Attached is the log generated by the membase CLI, as well as an image capture of the GUI showing the drop-off when this single node failure made the cluster unusable.

      Attachments

        1. 85diag.tgz
          6.97 MB
        2. 86diag.tgz
          9.86 MB
        3. 87diag.tgz
          9.87 MB
        4. 88diag.tgz
          9.85 MB
        5. 95diag.tgz
          9.87 MB
        6. 96diag.tgz
          9.83 MB
        7. 97diag.tgz
          9.87 MB
        8. membase_gui_capture.png
          60 kB
        9. ns-diag-20110419180814.rar
          7.52 MB

          People

            Assignee: Farshid Ghods (Inactive)
            Reporter: diego