Details
Type: Task
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: 2.12.0
Security Level: Public
Labels: None
Description
Setup:
-multiple memcached servers
-ConnectionFactory.setFailureMode(FailureMode.Cancel) (or Retry)
Conditions:
-one of the memcached servers goes down, or is restarted
Observations:
-single-key operations are immediately cancelled (they throw a CancellationException)
-multi-key operations (getBulk()/asyncGetBulk()) do not get cancelled; instead they time out on the inactive node
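For reference, a minimal repro sketch (the host names and keys are placeholders; the comments restate the observations above):
{code:java}
import java.util.Arrays;
import java.util.Map;
import net.spy.memcached.AddrUtil;
import net.spy.memcached.ConnectionFactoryBuilder;
import net.spy.memcached.FailureMode;
import net.spy.memcached.MemcachedClient;

public class BulkCancelRepro {
  public static void main(String[] args) throws Exception {
    // Two servers; shut one of them down to reproduce.
    MemcachedClient client = new MemcachedClient(
        new ConnectionFactoryBuilder()
            .setFailureMode(FailureMode.Cancel)
            .build(),
        AddrUtil.getAddresses("host1:11211 host2:11211"));

    try {
      // Single-key operation: fails fast when the key hashes
      // to the dead node (CancellationException).
      client.get("someKey");
    } catch (RuntimeException expected) {
      // cancelled immediately, as expected
    }

    // Multi-key operation: keys on the dead node are NOT
    // cancelled; the call blocks until the operation times out.
    Map<String, Object> values =
        client.getBulk(Arrays.asList("key1", "key2", "key3"));
    System.out.println(values);
    client.shutdown();
  }
}
{code}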
The cause seems to be the code in MemcachedClient.asyncGetBulk(): it checks only the node's active status, never the FailureMode value. If a node is inactive, it always emulates the Redistribute failure mode (the default).
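Roughly, the per-key node selection inside asyncGetBulk() behaves like the following simplified sketch (not the verbatim source; currentNodeForKey() is my name for the inlined logic):
{code:java}
import java.util.Iterator;
import net.spy.memcached.MemcachedNode;
import net.spy.memcached.NodeLocator;

class CurrentBehaviorSketch {
  // Only isActive() is consulted; the FailureMode is never
  // checked, so an inactive primary always triggers a
  // Redistribute-style fallback.
  static MemcachedNode currentNodeForKey(NodeLocator locator, String key) {
    MemcachedNode primary = locator.getPrimary(key);
    if (primary.isActive()) {
      return primary;
    }
    for (Iterator<MemcachedNode> i = locator.getSequence(key); i.hasNext();) {
      MemcachedNode n = i.next();
      if (n.isActive()) {
        return n; // redistributed, whatever the configured FailureMode
      }
    }
    return primary; // nothing active: queue on the dead primary
  }
}
{code}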
The attached patch checks the ConnectionFactory's failure mode and emulates the behavior of MemcachedConnection.addOperation (see the sketch after this list):
-if the node is active or FailureMode is Retry, use the primary node
-if the node is inactive and FailureMode is Cancel, don't create an operation (no value will be returned for that key)
-otherwise, redistribute (existing default behavior)
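The patched selection then looks roughly like this (nodeForKey() is a hypothetical helper; the actual patch works inline in asyncGetBulk() and reads the failure mode via ConnectionFactory.getFailureMode()):
{code:java}
import java.util.Iterator;
import net.spy.memcached.FailureMode;
import net.spy.memcached.MemcachedNode;
import net.spy.memcached.NodeLocator;

class PatchSketch {
  // Returns the node to use for a key, or null when the key
  // should be skipped (Cancel mode with an inactive node), in
  // which case no operation is created and the bulk result
  // simply misses that key.
  static MemcachedNode nodeForKey(NodeLocator locator,
                                  FailureMode failureMode,
                                  String key) {
    MemcachedNode primary = locator.getPrimary(key);
    if (primary.isActive() || failureMode == FailureMode.Retry) {
      return primary; // active node, or Retry: use the primary
    }
    if (failureMode == FailureMode.Cancel) {
      return null; // behaves like a cache miss for this key
    }
    // Redistribute (existing default): next active node, if any.
    for (Iterator<MemcachedNode> i = locator.getSequence(key); i.hasNext();) {
      MemcachedNode n = i.next();
      if (n.isActive()) {
        return n;
      }
    }
    return primary;
  }
}
{code}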
This patch is not perfect:
-it relies on the ConnectionFactory's failure mode, not the FailureMode value of the node's connection (which is not visible); I'm pretty sure the two values will be the same, though.
-it doesn't throw a CancellationException when the FailureMode is Cancel and a node is inactive: instead it behaves like a cache miss. This is a compromise. The code could throw a CancellationException when a node is down, but that seems very inefficient if a single key, out of many, is currently inaccessible.
This compromise is acceptable for us: we're looking for as little service impact as possible when one of our memcached servers goes down. The current behavior (timeout) causes a big pile-up and cascading timeouts.