Details

Type: Bug
Resolution: Cannot Reproduce
Priority: Major
Fix Version/s: 2.0
Affects Version/s: 1.6.5.3
Component/s: ns_server
Security Level: Public
Labels: None
Environment: CentOS 5.5 64-bit

Description

The cluster has 8 nodes and a "default" bucket configured with 8GB of RAM per node. Each server has 16GB of RAM.

The cluster had been online for many weeks and suddenly started returning 0 cache hits yesterday. Upon logging in, I see that the stats are functioning and that one node, 209.151.227.98, has been shut down with an ns_memcached002 code. However, from the second this event happened, the entire cluster/bucket has returned no cache hits or any data whatsoever.

What happened on the 209.151.227.98 node: I can confirm that the membase server on 209.151.227.98 crashed and that its processes are no longer active on the server. It appears that .98 suffered a hardware error and its file system went into read-only mode.

sd 0:0:0:0: SCSI error: return code = 0x08000002
sda: <<DEFERRED>>: sense key: Hardware Error
Add. Sense: Mechanical positioning error

Info fld=0xb021bc
end_request: I/O error, dev sda, sector 4317391
Buffer I/O error on device sda1, logical block 27602
lost page write due to I/O error on sda1
Aborting journal on device sda1.
ext3_abort called.
EXT3-fs error (device sda1): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only

However, this still does not explain how the bucket for the entire cluster is no longer returning cache hits.
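
To rule out the same read-only condition on the remaining nodes, something like the following could be run on each of them. This is only a minimal sketch and not part of membase: the function name `read_only_mounts` is hypothetical, and it assumes a Linux host where `/proc/mounts` lists one mount per line as device, mount point, file system type, and options.

```python
def read_only_mounts(mounts_text):
    """Return the mount points whose option list contains 'ro'.

    Expects text in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        if "ro" in options:
            result.append(mount_point)
    return result


if __name__ == "__main__":
    # Assumption: running on Linux, where /proc/mounts is available.
    with open("/proc/mounts") as f:
        for mount_point in read_only_mounts(f.read()):
            print("read-only:", mount_point)
```

A node whose data partition shows up in this list would be in the same state as .98 after the journal abort shown above.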

Attached are the log generated by the membase CLI and an image capture of the GUI showing the drop-off. This single node failure caused the whole cluster to become unusable.

Attachments

- 85diag.tgz (6.97 MB, 03/May/11 9:16 PM)
- 86diag.tgz (9.86 MB, 03/May/11 9:16 PM)
- 87diag.tgz (9.87 MB, 03/May/11 9:16 PM)
- 88diag.tgz (9.85 MB, 03/May/11 9:16 PM)
- 95diag.tgz (9.87 MB, 03/May/11 9:16 PM)
- 96diag.tgz (9.83 MB, 03/May/11 9:16 PM)
- 97diag.tgz (9.87 MB, 03/May/11 9:16 PM)
- membase_gui_capture.png (60 kB, 19/Apr/11 6:44 PM)
- ns-diag-20110419180814.rar (7.52 MB, 19/Apr/11 6:44 PM)

People

Assignee: Farshid Ghods (Inactive)

Reporter: diego

Votes: 0

Watchers: 1

Dates

Created: 19/Apr/11 6:44 PM

Updated: 03/Oct/12 2:48 PM

Resolved: 03/Oct/12 2:48 PM

Gerrit Reviews

There are no open Gerrit changes

Entire bucket of cluster goes offline (no cache hits) when one node is put offline with ns_memcached002 code
