Details

Type: Task
Resolution: Duplicate
Priority: Major
Fix Version/s: None
Affects Version/s: Cheshire-Cat
Component/s: documentation, fts
Labels:
None

Story Points:
1

Description

Every service except FTS has a sizing section here: https://docs.couchbase.com/server/current/install/sizing-general.html#sizing-index-service-nodes

We need to add similar sizing guidelines for FTS there.

Reference Contents =>

Nodes that run the Search Service must be sized so that they can create and maintain Search indexes and serve full-text search queries with good performance.

Sizing the Search Service is an iterative process, because the inverted index stores diverse information about the text tokens and uses a variety of storage structures to make that information fast to retrieve at query time.

In its simplest form, an inverted index consists of a list of all the unique terms that appear in any document and, for each term, a list of the documents in which it appears. It therefore provides a mapping from terms to documents. The enhanced query features that inverted indexes support come with a tradeoff: larger index sizes. An inverted index can grow to several times the size of the original data set being indexed. The eventual index size is determined by the type mappings in the index definition, including any field options selected.
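To make the term-to-document mapping concrete, here is a minimal Python sketch of an inverted index in its simplest form. It is an illustration only: the whitespace tokenizer and the function name `build_inverted_index` are assumptions made for this example and do not reflect how the Search Service stores its indexes.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each unique term to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive analyzer (an assumption): lowercase and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


docs = {
    "doc1": "couchbase search service",
    "doc2": "search index sizing",
}
index = build_inverted_index(docs)
print(index["search"])  # ['doc1', 'doc2']
```

Each unique term appears once as a key, and looking up a term returns the documents that contain it.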

For example, enabling the `term_vectors` field option for phrase queries, the `store` field option for highlighting, the `doc_value` field option for sorting by field, or the `_all` field option each adds more detail to the index and therefore increases the eventual index size.
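The effect of such options on index size can be illustrated with a toy model. The sketch below is not the Search Service's storage format: `build_index`, `rough_size`, and the unit-counting size proxy are hypothetical names and simplifications chosen for this example. It only shows that recording term positions (needed for phrase queries) and keeping the original field text (needed for highlighting) each add data on top of the basic term-to-document postings.

```python
from collections import defaultdict


def build_index(docs, with_positions=False, store_fields=False):
    """Toy index: postings per term, optionally with positions and stored text."""
    postings = defaultdict(list)
    stored = {}
    for doc_id, text in docs.items():
        positions = defaultdict(list)
        for pos, term in enumerate(text.lower().split()):
            positions[term].append(pos)
        for term, pos_list in positions.items():
            # Positions are kept only when the (hypothetical) option is enabled.
            entry = (doc_id, pos_list) if with_positions else (doc_id,)
            postings[term].append(entry)
        if store_fields:
            stored[doc_id] = text
    return dict(postings), stored


def rough_size(postings, stored):
    """Crude size proxy: one unit per doc ID, per position, per stored character."""
    size = 0
    for entries in postings.values():
        for entry in entries:
            size += 1
            if len(entry) > 1:
                size += len(entry[1])
    return size + sum(len(text) for text in stored.values())


docs = {"doc1": "to be or not to be", "doc2": "to search or not"}
basic = rough_size(*build_index(docs))
with_pos = rough_size(*build_index(docs, with_positions=True))
full = rough_size(*build_index(docs, with_positions=True, store_fields=True))
assert basic < with_pos < full
```

The absolute numbers are meaningless; the point is that each enabled option strictly increases what the index has to hold.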

A quick, practical approach is to size an FTS cluster in two phases.

Phase 1 - Volume Based Initial Cluster Sizing

Phase 2 - Performance (indexing/querying) Based Cluster Size Tuning

Phase 1 - Volume Based

This phase estimates the index size from the details of the customer's use case, and then derives the RAM, node, and partition counts from that estimated index size.

It involves the following steps.

Gather details on the following sizing aspects from the customer.

Use case description (to learn whether the workload is index-heavy or query-heavy)
Couchbase version (matters for the index type and other optimizations)
Amount of data to index
Index definition (if already built)
Average document size
Will you index all the data in a bucket, or only a subset of the fields?
Are any special analyzers used?
Rate of change of the data size
Longevity of the data/documents (is an expiry set?)
High availability requirements (e.g. replica count)
Latency SLA requirements
Throughput SLA requirements
Which API will you use to execute FTS queries?
Types of queries:
1. Simple queries
2. Complex queries (conjuncts, disjuncts, sorts, fuzzy, facets, etc.)
3. Pagination (offset/limit)
How many indexes do you intend to create?

Once these details are clear, the user can contact the Solution Engineering team, who can help produce the initial sizing estimates for the FTS service using an internal sizing calculator.

From the estimated index size, the user can then estimate the number of index partitions, the amount of RAM, the FTS RAM quota, the number of nodes, the number of CPU cores per node, and so on.

A few useful general guidelines, which can be overridden where needed, for deriving those details from the overall index size:

Ideally, set the FTS RAM quota to 75-80% of the RAM available on a node. This leaves the OS some leeway for managing the filesystem cache.
Keep the amount of data in each partition under a limit of about 300 GB. Beyond that point, it makes sense to add more partitions so that searches run in parallel and queries speed up.

Provision enough RAM to give the resulting index a healthy resident ratio. FTS has no operational resident ratio requirement, but it is safer to provision as much RAM for a better resident ratio as the budget allows.

Replica partitions are not used for serving live query traffic, but they do consume CPU and memory resources for indexing.
Allow one core per partition for peak performance.
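The arithmetic behind these guidelines can be sketched in a few lines of Python. This is a rough starting point under stated assumptions, not the internal sizing calculator: the function name `initial_sizing` and its parameters are hypothetical, the 75% quota fraction and 300 GB partition limit are taken from the guidelines above, and counting one core for every replica partition as well as every active partition is an assumption based on replicas consuming CPU for indexing.

```python
import math


def initial_sizing(index_size_gb, node_ram_gb, replicas=0,
                   max_partition_gb=300, quota_fraction=0.75):
    """Derive first-pass partition, core, and RAM quota figures."""
    # Enough active partitions to keep each one under the per-partition limit.
    active_partitions = max(1, math.ceil(index_size_gb / max_partition_gb))
    # Each replica adds a full extra copy of every partition.
    total_partitions = active_partitions * (1 + replicas)
    return {
        "active_partitions": active_partitions,
        "total_partitions": total_partitions,
        # Assumption: one core per partition, replicas included.
        "cores": total_partitions,
        # Leave the rest of the node's RAM to the OS filesystem cache.
        "ram_quota_gb_per_node": node_ram_gb * quota_fraction,
    }


estimate = initial_sizing(index_size_gb=900, node_ram_gb=128, replicas=1)
print(estimate)
```

For a 900 GB estimated index with one replica on 128 GB nodes, this yields 3 active partitions, 6 partitions in total, 6 cores, and a 96 GB RAM quota per node. These figures are only the input to the performance-based tuning that follows.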

Phase 2 - Performance (Indexing rate/ Query throughput/latency) Based Cluster Size Tuning

Once the cluster is up and running with the initial volume-based recommendations, any gaps between the observed performance and the required indexing rate or search throughput and latency can be used to scale the CPU and RAM up or down.

Too many factors affect indexing and search throughput and response times to predict how any given configuration will eventually perform.

Hence, empirical testing on a smaller-scale sample of the actual data set, scaled gradually towards realistic production-scale traffic, is an essential step in sizing any cluster.

During these scaling iterations, users may need to work in the following areas.

Adjust the partition count to improve query or indexing throughput and latency, or to make better use of the hardware.
Adjust the RAM, RAM quota, number of nodes, CPU cores, and so on.
Revisit the index definition and queries for better results.
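One way to decide whether another tuning iteration is needed is to compare the results of a test run against the latency and throughput SLAs gathered in Phase 1. The sketch below assumes that query latencies have already been collected by some load-testing tool. The helper names `percentile` and `meets_sla`, the nearest-rank percentile method, and the choice of the 95th percentile are all assumptions made for this example.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def meets_sla(latencies_ms, duration_s, latency_sla_ms, throughput_sla_qps, pct=95):
    """Compare one test run against the latency and throughput SLAs."""
    observed_latency = percentile(latencies_ms, pct)
    observed_qps = len(latencies_ms) / duration_s
    return {
        "latency_ms": observed_latency,
        "qps": observed_qps,
        "ok": (observed_latency <= latency_sla_ms
               and observed_qps >= throughput_sla_qps),
    }


# Hypothetical measurements: 10 queries completed in a 2-second window.
latencies = [10, 12, 11, 50, 9, 13, 10, 12, 11, 10]
result = meets_sla(latencies, duration_s=2, latency_sla_ms=20, throughput_sla_qps=5)
print(result)
```

If `ok` is false, the run points to where the gap lies (latency, throughput, or both), which suggests whether to adjust partitions, hardware, or the index definition and queries before testing again.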

Please reach out to the product forums or Solution Engineering with any sizing-related questions about the Search Service.

Issue Links

duplicates

DOC-9267 Create FTS node sizing examples

Reopened

People

Assignee: Simon Dew

Reporter: Sreekanth Sivasankaran (Inactive)

Votes: 0

Watchers: 1

Dates

Created: 26/Oct/21 3:16 AM

Updated: 27/Oct/21 1:25 AM

Resolved: 27/Oct/21 1:25 AM

FTS - Introduce Sizing section for Search service
