20-30% performance degradation on append-heavy workload compared to 2.5.1
Description
Activity
Chiyoung Seo August 8, 2014 at 6:00 PM
@Dave,
That's a release note-related ticket. Please feel free to create it and assign it to the documentation team.
Chiyoung Seo August 8, 2014 at 5:57 PM
I think we (the engine and perf teams) have given a reasonable explanation and a plan for further optimizations. A customer (user) can adjust those runtime-configurable parameters in their dev/staging tests before deploying to their production environment. We will continue to investigate how we can optimize the resource allocations to satisfy conflicting QoS metrics as well as possible.
I don't want to have this ticket reopened. We will create a new ticket for the above improvement tasks.
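For anyone following the manual-tuning route mentioned above, runtime ep-engine parameters can be changed with cbepctl. A sketch only: the specific parameters and values below are illustrative assumptions, not recommendations for this workload; check the 3.0 documentation for the knobs that apply to your case.

```shell
# Illustrative only: parameter names/values are assumptions, not
# tuning advice for revAB. Consult the 3.0 docs before changing them.
cbepctl localhost:11210 set checkpoint_param chk_max_items 5000
cbepctl localhost:11210 set flush_param bg_fetch_delay 0
```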
Dave Rigby August 8, 2014 at 5:52 PM
@Sundar, @Pavel,
Agreed, and I'm not trying to suggest you're dodging the issue, nor that it's a trivial problem to solve.
My point is more that, assuming revAB is a valid, representative customer workload, it has regressed compared to 2.5.1. Users coming to 3.0 cold may be quite happy with the disk queue sizes and performance as they see them, but upgraders may have questions about the change in performance, and I think if we can't have a "fix" in place for 3.0 we should at least ensure users are educated about these changes - including the relative merits of disk queue length vs. front-end performance, etc.
In concrete terms, we should ensure that the 3.0 documentation describes the "optimised defaults for high-end machines" - it can emphasise the benefits, but should also mention the changes people may see vs. 2.5.1.
I would like to re-open this ticket (or at least have a new one spun off) to ensure such information is present - any objections?
Pavel Paulau August 8, 2014 at 5:45 PM
It's not that simple. Accurate auto-tuning requires at least the disk I/O characteristics and an analysis of the workload.
We can barely define an optimal configuration based on the number of vCPUs or the number of buckets.
Sundar Sridharan August 8, 2014 at 5:41 PM
Dave, we don't auto-tune currently because it is hard to do so - NP-hard, to be precise. We have a number of variables which cannot all be maximized simultaneously. For example, you are currently observing the performance degradation by measuring front-end ops, which is an important parameter to consider; however, it does not account for the disk drain rate, which is an important parameter for high availability. Then there are other parameters too, for example CPU utilization. A setting that works for a given use case may degrade another. We have currently chosen to optimize for high-end machines because we believe this benefits the majority of our production customers. Clearly it isn't helping a use case such as revAB out of the box. That's why we point out the knobs that can be turned to help.
Please do not consider this a means of dodging the issue; we definitely plan to have a heuristic sampling algorithm in place for auto-tuning the system in the future, but for now we would like to go with manual tuning, as Pavel mentions. Thanks.
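A heuristic auto-tuner of the kind alluded to above might, in its simplest form, sample the conflicting metrics under each candidate configuration and pick the one with the best weighted score. A minimal sketch; the configuration keys, metric names, and weights are all hypothetical, not anything in ep-engine:

```python
def qos_score(sample, weights):
    """Fold conflicting QoS metrics into one number; higher is better.
    sample: front-end ops/s, disk drain rate (items/s), CPU load (0-1)."""
    return (weights["ops"] * sample["ops"]
            + weights["drain"] * sample["drain"]
            - weights["cpu"] * sample["cpu"])

def pick_config(candidates, weights):
    """candidates: (config, sampled-metrics) pairs; return best config."""
    return max(candidates, key=lambda c: qos_score(c[1], weights))[0]

# Hypothetical samples: fewer writer threads favour front-end ops,
# more writer threads favour drain rate but cost CPU.
candidates = [
    ({"writer_threads": 4}, {"ops": 100_000, "drain": 20_000, "cpu": 0.6}),
    ({"writer_threads": 8}, {"ops": 54_000, "drain": 45_000, "cpu": 0.9}),
]
weights = {"ops": 1.0, "drain": 1.0, "cpu": 10_000}
print(pick_config(candidates, weights))  # -> {'writer_threads': 4}
```

The hard part, as noted in the comments, is choosing the weights and gathering representative samples without disturbing the running workload.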
When running an append-heavy workload (modelling a social network address book, see below), the performance of CB has dropped from ~100K op/s down to ~50K op/s compared to 2.5.1-1083 on OS X.
Edit: I see a similar (but slightly smaller, around 40%) degradation on Linux (Ubuntu 14.04) - see comment below for details.
== Workload ==
revAB_sim - generates a model social network, then builds a representation of it in Couchbase. Keys are a set of phone numbers; values are lists of the phone books which contain that phone number. (See attachment.)
Configured for 8 client threads, 100,000 people (documents).
To run:
pip install networkx
Check revAB_sim.py for correct host, port, etc
time ./revAB_sim.py
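The reverse address-book construction described above can be sketched as follows. This is a minimal pure-Python illustration of the data model only, not the attached script (revAB_sim itself uses networkx to generate the social graph); the function and variable names are hypothetical:

```python
def build_reverse_ab(phone_books):
    """Given {owner: [numbers in their phone book]}, build the reverse
    index stored in Couchbase: {number: [owners whose books contain it]}."""
    reverse = {}
    for owner, numbers in phone_books.items():
        for number in numbers:
            # Each occurrence appends the owner to an existing value,
            # which is what makes the workload append-heavy.
            reverse.setdefault(number, []).append(owner)
    return reverse

# Tiny example: two people who both know the number 555-0001.
books = {"alice": ["555-0001", "555-0002"],
         "bob":   ["555-0001"]}
print(build_reverse_ab(books))
# -> {'555-0001': ['alice', 'bob'], '555-0002': ['alice']}
```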
== Cluster ==
1 node, default bucket set to 1024MB quota.
== Runtimes for workload to complete ==
2.5.1-1083:
~107K op/s. Timings for workload (3 samples):
real 2m28.536s
real 2m28.820s
real 2m31.586s
3.0.0-918:
~54K op/s. Timings for workload:
real 5m23.728s
real 5m22.129s
real 5m24.947s
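For reference, a quick back-of-envelope check on the wall-clock times above:

```python
# Mean wall-clock times from the three samples above (seconds).
t_251 = (148.536 + 148.820 + 151.586) / 3   # 2.5.1-1083
t_300 = (323.728 + 322.129 + 324.947) / 3   # 3.0.0-918

slowdown = t_300 / t_251          # ~2.16x longer to complete
degradation = 1 - t_251 / t_300   # ~54% lower throughput
print(f"{slowdown:.2f}x slower, {degradation:.0%} throughput drop")
# -> 2.16x slower, 54% throughput drop
```

This is consistent with the ~107K vs ~54K op/s figures quoted above.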