Details

Type: Bug
Resolution: Incomplete
Priority: Major
Fix Version/s: 1.0.3
Affects Version/s: 1.0.3
Component/s: Core
Security Level: Public
Labels: None

Description

Incident #2373

Please see below an issue we encountered with our latest version.

This passed QA on stage, but when we went to a live environment with full production load we saw the behavior described below. It first occurred on a cluster we had just rebalanced. During that rebalance we saw a rise in CPU usage on the Couchbase client apps, and it did not decrease after the rebalance had (successfully) completed, until we restarted those clients. We suspected that the problem below was directly related to the issues we saw during the rebalance, so we also tested on a different cluster that had not gone through any rebalance. The results were the same. After searching everywhere for what had changed, we realized that one change in this version was our upgrade from Couchbase Client 1.0.1 and spymemcached (spy) 2.8.0 to 1.0.3 and 2.8.2. We then took that exact build, replaced the 1.0.3 jars with the 1.0.1 jars, and everything behaved normally again.

The reason we MUST have 1.0.3 in production is the following note from the 1.0.3 release notes (http://www.couchbase.com/docs/couchbase-sdk-java-1.0/couchbase-sdk-java-rn_1-0-3.html):

It was found that in the dependent spymemcached client library that errors encountered in optimized set operations would not be handled correctly and thus application code would receive unexpected errors during a rebalance. This has been worked around in this release by disabling optimization. This may have a negligible drop in throughput but shorter latencies.

We believe the client-side issues we saw during the rebalance are exactly this problem.

1. Any ideas on the reason for this?
2. How would you advise us to proceed?

Cheers,
Ira

1. One server puts data into a memcached bucket with a TTL of about 30 minutes.
2. Another server tries to get this data but fails randomly (at about a 50% miss rate): we get nulls instead of the real values. We use asyncGet and then Future.get() with a 5-second timeout. We did not observe the timeout being reached.
The time between (1) and (2) is less than a minute. We debugged (1) and saw that the data is written without errors.
No exceptions or errors are thrown.
The data cluster wasn't heavily loaded, and other clients (1.0.1) were working with this bucket at the same time and operating properly.

Sergey
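The report above hinges on telling a miss (get() returns null) apart from a timeout. Under the standard java.util.concurrent.Future contract, an expired timeout surfaces as a TimeoutException, not as a null, so a null value means the read completed and found nothing. The sketch below is a hypothetical diagnostic helper, not part of the Couchbase client or spymemcached API: the names ReadOutcome, classify and missRate are illustrative, and CompletableFuture stands in for the future an asyncGet call would return so that the example runs on its own.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical diagnostic helper; not part of the Couchbase client or spymemcached API.
class ReadOutcome {

    enum Outcome { HIT, MISS, TIMEOUT, ERROR }

    // Waits up to `timeout` for a read future (e.g. the one an asyncGet(key) call returns)
    // and reports whether the read was a hit, a miss (null value), a timeout or a failure.
    static Outcome classify(Future<?> future, long timeout, TimeUnit unit) {
        try {
            Object value = future.get(timeout, unit);
            // get() returned normally, so the timeout was not reached; null means "not found"
            return value == null ? Outcome.MISS : Outcome.HIT;
        } catch (TimeoutException e) {
            // Per the Future contract, an expired timeout throws instead of returning null
            future.cancel(true);
            return Outcome.TIMEOUT;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return Outcome.ERROR;
        } catch (ExecutionException e) {
            // The operation itself failed (e.g. a server or network error)
            return Outcome.ERROR;
        }
    }

    // Fraction of reads that came back as misses, e.g. 0.5 for a 50% miss rate
    static double missRate(List<Outcome> outcomes) {
        if (outcomes.isEmpty()) {
            return 0.0;
        }
        long misses = outcomes.stream().filter(o -> o == Outcome.MISS).count();
        return (double) misses / outcomes.size();
    }

    public static void main(String[] args) {
        // CompletableFuture stands in for the client's future in this self-contained sketch
        Future<Object> hit = CompletableFuture.completedFuture("value");
        Future<Object> miss = CompletableFuture.completedFuture(null);
        Future<Object> neverCompletes = new CompletableFuture<>();

        List<Outcome> outcomes = Arrays.asList(
                classify(hit, 5, TimeUnit.SECONDS),
                classify(miss, 5, TimeUnit.SECONDS),
                classify(neverCompletes, 10, TimeUnit.MILLISECONDS));

        System.out.println(outcomes);            // prints [HIT, MISS, TIMEOUT]
        System.out.println(missRate(outcomes));  // prints 0.3333333333333333
    }
}
```

Reads that are classified as MISS rather than TIMEOUT would match the observation that get() returned null without the 5-second timeout being reached.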

From: Ira Holtzer

People

Assignee: Saran Kumar (Inactive)

Reporter: Saran Kumar (Inactive)

Votes: 0

Watchers: 0

Dates

Created: 21/Dec/12 2:11 AM

Updated: 21/Dec/12 1:41 PM

Resolved: 21/Dec/12 1:41 PM

High CPU usage in Couchbase client apps after upgrading from Couchbase Client 1.0.1 and spy 2.8.0 to 1.0.3 and 2.8.2.