  1. Couchbase Kubernetes
  2. K8S-2017

Publish autoscaling best practices and recommended metrics

Details

    • Page
    • Resolution: Done
    • Critical
    • 2.2.0
    • None
    • documentation
    • None

    Description

      This ticket tracks documentation specifically for the QE-tested metrics and best practices for autoscaling (that is, defining the set of metrics we recommend scaling on). The main autoscaling documentation (e.g. how-tos, concepts, reference) is handled in K8S-2036.

      Documentation Plan

      Introduction. Guidelines and Best Practices

      • Modify: document any existing best practices that don’t apply when auto-scaling is enabled (e.g. server groups, anti-affinity)

      (NEW PAGE) Learn. Couchbase Cluster Concepts. Auto-scaling Best Practices

      • New page - will include individual sections covering each service and the tested scaling metrics.
        • Sections:
          • Introduction
          • Data Service
          • Index Service
          • Query Service

       -- 

      Tommie is currently producing a table of data that includes the thresholds/settings for each test scenario, along with the test results for a select number of metrics. Tommie also presented a number of graphs showing the raw test results.

      There seemed to be a consensus that once Tommie finalizes the test results for the Data Service, he should provide the following:

      1. The finished table of test scenarios and selected results
      2. A final set of graphs, each having annotated labels along the X-axis describing what/when relevant events occurred in the cluster (e.g. workload generated, rebalance start/stop, compaction start/stop, HPA window start/end, etc.)
      3. An opinionated statement describing the best practices that can be drawn from the test scenarios and graphs, along with any relevant caveats or suggestions that a customer might use to extrapolate the results for their own cluster configurations and workloads. For example: “A larger average document size than those tested may cause longer rebalance times, which may require reducing the scaling threshold for X metric.”
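
      To illustrate the kind of caveat described in item 3, here is a minimal sketch of how lowering a scaling threshold makes scale-out trigger earlier. It assumes the standard Kubernetes Horizontal Pod Autoscaler formula, desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), and ignores the HPA's tolerance and stabilization window. The function name and all numbers are hypothetical and are not taken from the test results.

```python
import math


def desired_replicas(current_replicas: int, current_value: float, target_value: float) -> int:
    """Replica count from the standard HPA formula.

    Assumption: tolerance and stabilization windows are ignored, so this is
    only the raw ratio calculation, not the full HPA behavior.
    """
    if current_replicas <= 0 or target_value <= 0:
        raise ValueError("current_replicas and target_value must be positive")
    return math.ceil(current_replicas * current_value / target_value)


# Hypothetical scenario: 3 pods whose metric currently averages 65 per pod.
with_higher_threshold = desired_replicas(3, 65.0, 70.0)  # threshold 70: stays at 3
with_lower_threshold = desired_replicas(3, 65.0, 60.0)   # threshold 60: scales to 4

print(with_higher_threshold, with_lower_threshold)
```

      With the same observed load, the lower threshold requests an extra pod sooner, which leaves more headroom if rebalances take longer than in the tested scenarios.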

      With the above, my hope is that we can create a best practices guide that presents a curated approach to the data – one where we try to only show the necessary graphs and data points to effectively justify our recommendations, rather than presenting a report full of raw data analysis.

      Tommie also noted that some of the test settings he is using are best estimates and may not reflect real-world customer scenarios. For example, the tests assumed something like a 30% write rate. Tommie noted that it would be good to get early feedback from a wide audience to try to elicit opinions on whether the test scenarios and settings we are using accurately reflect what we’ve observed in customer environments. This might be a good incentive for us to quickly finish the best practices guide for the Data Service so that we can start passing it around internally to get early feedback on both the data and the design of the guide.

      Draft Documentation

      Learn. Couchbase Cluster Concepts. Couchbase Cluster Auto-scaling. Auto-scaling Best Practices

      • New page documenting best practices and recommendations for Couchbase cluster auto-scaling

       

            People

              roo.thorp Roo Thorp
              ingenthr Matt Ingenthron