[MB-47920] Cost Effective Low-End Clusters improve resource utilization


Details

    • Cost Effective Low-End Clusters
    • To Do

    Description

      PRD: Cost Effective Low-End Clusters - Product Improvements

      Goal: Improve efficiency across services and apply appropriate limits for low-end clusters.

      As identified while investigating “Low-End Clusters - Product Improvements”, the overall resource utilization, such as CPU and memory, across clusters with smaller nodes (yes, currently not recommended) can be improved. Additionally, where appropriate, limits shall be identified that can be applied and enforced on low-end clusters to ensure a good customer experience.

      A baseline for low-end clusters has been implemented in ShowFast for idle systems with and without data: http://showfast.sc.couchbase.com/#/timeline/Linux/cloud/lowend/idle. This measures CPU utilization across services on symmetric 3-node clusters (kv, index, n1ql, fts) of the following instance types (a sketch of one way to sample per-process CPU follows the list):

      • AWS t3.small (2 vCPU 2 GB Ram)
      • AWS t3.medium (2 vCPU 4 GB Ram)
      • AWS t3.large (2 vCPU 8 GB Ram)
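
      For illustration, a minimal sketch (not the actual ShowFast harness) of sampling the aggregate CPU consumed by Couchbase Server processes on one node. Linux-only; the per-service binary names in COUCHBASE_PROCS are assumptions (memcached=kv, indexer/projector=index, cbq-engine=n1ql, cbft=fts, beam.smp=ns_server, prometheus=stats):

      #!/usr/bin/env python3
      # Sketch: sum %CPU of Couchbase processes from a `ps` snapshot.
      import subprocess
      import time

      # Assumed per-service binary names.
      COUCHBASE_PROCS = {
          "memcached", "indexer", "projector", "cbq-engine",
          "cbft", "beam.smp", "prometheus", "goxdcr",
      }

      def sample_cpu():
          """Return {process_name: summed %CPU} from one `ps` snapshot."""
          out = subprocess.check_output(["ps", "-eo", "comm,%cpu"], text=True)
          totals = {}
          for line in out.splitlines()[1:]:          # skip the header row
              parts = line.split()
              if len(parts) != 2:
                  continue
              name, cpu = parts[0], float(parts[1])
              if name in COUCHBASE_PROCS:
                  totals[name] = totals.get(name, 0.0) + cpu
          return totals

      if __name__ == "__main__":
          for _ in range(3):                         # a few samples of an idle node
              totals = sample_cpu()
              # On a 2 vCPU node the ceiling is 200%; 1/3 of capacity is ~66%.
              print(f"aggregate {sum(totals.values()):5.1f}%  " +
                    " ".join(f"{k}={v:.1f}%" for k, v in sorted(totals.items())))
              time.sleep(10)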

      This Epic is primarily interested in t3.small (2 vCPU, 2 GB RAM); the other systems shall be considered available reference points.

      In this Epic we want to lower CPU utilization across services for idle clusters

      The CPU utilization appears high; this can lead to:

      1. A poor developer experience when kicking the tires of Couchbase, performing tiny POCs, or implementing trivial applications.
      2. Poor TCO (higher costs) in production, where more expensive instances are used and AWS burstable instances or shared VMs cannot be used effectively.

      Each service can be improved, as the total aggregated CPU utilization at idle (and also on systems without any data) can approach 1/3 of the available CPU in a low-end cluster; on a 2 vCPU t3.small that is roughly 66 of the 200 available CPU percentage points. A sketch of node-level CPU sampling, relevant to the first tracking item below, follows the ticket list.

      • P0 accurately track total CPU        CBPS-949
      • P0 projector CPU utilization (%)     MB-47921
      • P0 indexer CPU utilization (%)       MB-TBD-R3
      • P0 prometheus CPU utilization (%)    MB-TBD-R4
      • P0 indexer needed RAM/bucket         MB-TBD-R5 and MB-46664
      • P0 limit number of buckets           MB-TBD-R6
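
      Relevant to the first item above (CBPS-949), a minimal Linux-only sketch of deriving whole-node CPU utilization from two /proc/stat samples rather than a single instantaneous reading; an illustration, not the ns_server implementation:

      import time

      def read_cpu_times():
          """Return (idle, total) jiffies from the aggregate 'cpu' line."""
          with open("/proc/stat") as f:
              fields = [float(x) for x in f.readline().split()[1:]]
          return fields[3] + fields[4], sum(fields)  # idle + iowait, total

      if __name__ == "__main__":
          idle1, total1 = read_cpu_times()
          time.sleep(5)
          idle2, total2 = read_cpu_times()
          busy = 1.0 - (idle2 - idle1) / (total2 - total1)
          print(f"node CPU utilization over 5s: {busy * 100:.1f}%")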

      In this Epic we want to lower the memory footprint across services for low-end clusters

      A baseline for low-end clusters has been implemented in ShowFast for idle systems with and without data. We need to ensure all services work reliably on low-end clusters.

      In the research leading to the PRD "Developer Optimized Journey - Dedicated Lowend Tier in Idle" (pages 2, 4, 13) it was observed that a 2 vCPU, 2 GB RAM system could only support 6 buckets (each with just 50K small documents and a primary index); on the 7th bucket the indexer hit an OOM, refer to MB-TBD-R6. A hypothetical reproduction sketch follows.
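
      A hypothetical reproduction sketch of the multi-bucket OOM, using the standard cluster (8091) and query (8093) REST endpoints; host and credentials are placeholders and the per-bucket 50K-document load is elided:

      import requests

      CLUSTER = "http://127.0.0.1"                   # placeholder node address
      AUTH = ("Administrator", "password")           # placeholder credentials

      for i in range(1, 8):                          # the 7th bucket triggered OOM
          name = f"bucket-{i}"
          r = requests.post(f"{CLUSTER}:8091/pools/default/buckets", auth=AUTH,
                            data={"name": name, "bucketType": "couchbase",
                                  "ramQuotaMB": 100})
          r.raise_for_status()
          # ... load ~50K small documents into `name` here, then wait for
          # the bucket to become ready before creating the index ...
          r = requests.post(f"{CLUSTER}:8093/query/service", auth=AUTH,
                            data={"statement": f"CREATE PRIMARY INDEX ON `{name}`"})
          print(name, r.status_code)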

      In the ShowFast baseline (http://showfast.sc.couchbase.com/#/timeline/Linux/cloud/lowend/idle) we run the following services (kv, index, n1ql, fts) with the following settings (a sketch of applying these quotas follows the list):

      • data RAM quota: 256 MB
      • index RAM quota: 256 MB
      • FTS RAM quota: 256 MB
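
      A sketch of applying these quotas through the cluster REST API (POST /pools/default, quotas in MB); host and credentials are placeholders:

      import requests

      CLUSTER = "http://127.0.0.1:8091"              # placeholder node address
      AUTH = ("Administrator", "password")           # placeholder credentials

      resp = requests.post(f"{CLUSTER}/pools/default", auth=AUTH,
                           data={"memoryQuota": 256,       # data (kv) quota
                                 "indexMemoryQuota": 256,  # index quota
                                 "ftsMemoryQuota": 256})   # FTS quota
      resp.raise_for_status()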

      The following was observed: MB-46664 "Hitting out of memory issue when loading docs with indexes created".

      In this Epic we want to enforce sane limits on services for low-end clusters

      We need to identify and apply limits to ensure all services work reliably on low-end clusters.

      For example, we should impose a limit of no more than 4 buckets for low-end clusters with 2 vCPU and 2 GB RAM (MB-TBD-R6). Note that such a limit will mask/prevent the issue identified in MB-TBD-R5, i.e. the OOM problem with 7+ buckets. A sketch of such a guard follows.
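
      A sketch of the kind of guard MB-TBD-R6 asks for, shown here as a client-side check against GET /pools/default/buckets; the real enforcement would live server-side, and the host, credentials, and helper name are placeholders:

      import requests

      CLUSTER = "http://127.0.0.1:8091"              # placeholder node address
      AUTH = ("Administrator", "password")           # placeholder credentials
      MAX_BUCKETS = 4                                # proposed low-end limit

      def create_bucket(name: str, ram_mb: int = 100) -> None:
          """Refuse creation once the cluster already holds MAX_BUCKETS buckets."""
          buckets = requests.get(f"{CLUSTER}/pools/default/buckets",
                                 auth=AUTH).json()
          if len(buckets) >= MAX_BUCKETS:
              raise RuntimeError(f"low-end limit reached: {len(buckets)} buckets")
          r = requests.post(f"{CLUSTER}/pools/default/buckets", auth=AUTH,
                            data={"name": name, "bucketType": "couchbase",
                                  "ramQuotaMB": ram_mb})
          r.raise_for_status()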

       

      People

        Assignee: Jon Strabala (jon.strabala)
        Reporter: Jon Strabala (jon.strabala)
