Details
- Type: Page
- Resolution: Done
- Priority: Critical
Description
This ticket tracks documentation specifically related to the QE-tested metrics and best practices for autoscaling (defining the set of metrics we recommend scaling on). The main documentation for autoscaling (e.g. how-tos, concepts, reference) is handled in K8S-2036.
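As a brief illustration of the kind of scaling signal such a recommendation might name, the PromQL sketch below computes per-pod Data Service memory utilization. The metric names (`kv_mem_used_bytes`, `kv_ep_max_size`) are assumed Couchbase Server Prometheus identifiers, used here for illustration only and to be verified against the stats documentation cross-referenced in K8S-1952:

```
# Assumed metric names - verify against the per-service stats documentation.
# Data Service memory utilization per pod, as a candidate scaling signal:
sum by (pod) (kv_mem_used_bytes) / sum by (pod) (kv_ep_max_size)
```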
Documentation Plan
Introduction. Guidelines and Best Practices
- Modify: document any exceptions to existing best practices that do not apply when auto-scaling is enabled (e.g. server groups, anti-affinity, etc.)

(NEW PAGE) Learn. Couchbase Cluster Concepts. Auto-scaling Best Practices
- New page: will include individual sections covering each service and the tested scaling metrics. Sections:
  - Introduction
  - Data Service
  - Index Service
  - Query Service
--
Tommie is currently producing a table of data that includes the thresholds/settings for each test scenario, along with the test results for a selective number of metrics. Tommie also presented a number of graphs showing the raw test results.
There seemed to be a consensus that once Tommie finalizes the test results for the Data Service, he should provide the following:
- The finished table of test scenarios and selected results
- A final set of graphs, each having annotated labels along the X-axis describing what/when relevant events occurred in the cluster (e.g. workload generated, rebalance start/stop, compaction start/stop, HPA window start/end, etc.)
- An opinionated statement describing the best practices that can be drawn from the test scenarios and graphs, along with any relevant caveats or suggestions that a customer might use to extrapolate the results for their own cluster configurations and workloads. For example: “A larger average document size than those tested may cause longer rebalance times, which may require reducing the scaling threshold for X metric.”
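To make the references to scaling thresholds and the HPA window concrete, here is a minimal sketch of a Kubernetes HorizontalPodAutoscaler targeting the CouchbaseAutoscaler resource that the Operator exposes for a server class. The cluster/class name (`cb-example.data`), the custom metric name (`cbmemory_utilization`), and all target values are illustrative assumptions, not tested recommendations:

```yaml
# Illustrative sketch only - names and thresholds are placeholders.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: data-hpa
spec:
  scaleTargetRef:
    apiVersion: couchbase.com/v2
    kind: CouchbaseAutoscaler
    name: cb-example.data      # assumed "<cluster>.<server-class>" resource name
  minReplicas: 3
  maxReplicas: 9
  metrics:
    - type: Pods
      pods:
        metric:
          name: cbmemory_utilization   # hypothetical custom metric
        target:
          type: AverageValue
          averageValue: "70"           # placeholder scaling threshold
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # the "HPA window" referenced above
```

The `stabilizationWindowSeconds` setting is one place where the annotated graphs (HPA window start/end) could directly inform a recommended value.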
With the above, my hope is that we can create a best practices guide that presents a curated approach to the data – one where we try to only show the necessary graphs and data points to effectively justify our recommendations, rather than presenting a report full of raw data analysis.
Tommie also noted that some of the test settings he is using are best estimates, but potentially aren’t reflective of real-world customer scenarios. For example, the tests assumed something like a 30% write rate. Tommie noted that it would be good to get early feedback from a wide audience to try to elicit opinions on whether the test scenarios and settings we are using accurately reflect what we’ve observed in customer environments. This might be a good incentive for us to quickly finish the best practices guide for the Data Service so that we can start passing it around internally within the company to get early feedback on both the data and the design of the guide.
Draft Documentation
Learn. Couchbase Cluster Concepts. Couchbase Cluster Auto-scaling. Auto-scaling Best Practices
- New page documenting best practices and recommendations for Couchbase cluster auto-scaling
Attachments
Issue Links
- blocks
  - K8S-2043 Autoscaling Tutorial: Memory based Autoscaling Data Service (ephemeral buckets) - Closed
  - K8S-2044 Autoscaling Tutorial: Memory/Network Based Auto-Scaling of Data Service - Closed
  - K8S-2045 Autoscaling Tutorial: Memory Based Auto-Scaling of Index Service - Closed
- has to be started together with
  - K8S-1952 Autoscaling requirement: CAO Prometheus docs should have xrefs to any available stats documentation for each Couchbase Service - Resolved
- relates to
  - K8S-2144 Autoscaling testing - Closed
  - K8S-1851 Autoscaling Scope and Prioritized Component Metrics - Resolved
  - K8S-2036 Documentation for K8S-1906: Cluster Autoscaling - Stateful Services (Autoscaling GA) - Closed