Details
Improvement
Resolution: Fixed
Major
7.2.0, 7.1.3
Description
Currently, Plasma uses the following scheme for co-locating indexes on shards in the on-prem model:
- index defined on _default scope and collection - dedicated shard for each index instance
- index defined on named scope/collection - multiple index instances share a single plasma shard
Sharing a shard among multiple index instances reduces memory overhead, because each shard allocates a 2.25MB flush buffer.
Consider the following single index node:
100 indexes in _default scope+collection = 200 shards (back+main index) = 450MB of memory allocated for flush buffers
vs.
100 indexes in a named scope/collection = 20 shards (back+main index) = 45MB of memory allocated for flush buffers
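The arithmetic behind this comparison can be checked with a short sketch. The 2.25MB flush buffer size and the shard counts come from the description above; the function and constant names are hypothetical and are not part of Plasma or the indexer.

```python
# Illustrative arithmetic only; names here are hypothetical, not Plasma APIs.
FLUSH_BUFFER_MB = 2.25   # per-shard flush buffer size, from the description
STORES_PER_INDEX = 2     # each index has a main index and a back index


def dedicated_shard_count(num_indexes):
    """Shards needed when every index store gets its own shard."""
    return num_indexes * STORES_PER_INDEX


def flush_buffer_overhead_mb(num_shards):
    """Total flush buffer memory allocated across all shards."""
    return num_shards * FLUSH_BUFFER_MB


# 100 indexes on _default scope+collection: one shard per store.
default_shards = dedicated_shard_count(100)
# 100 indexes on a named scope/collection: shard count taken from the example.
named_shards = 20

print(default_shards, flush_buffer_overhead_mb(default_shards))  # 200 450.0
print(named_shards, flush_buffer_overhead_mb(named_shards))      # 20 45.0
```

In this example the shared layout needs one tenth of the flush buffer memory, because the overhead scales with the number of shards rather than the number of indexes.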
The logic for placing multiple instances in a shard is controlled by the following flags:
"indexer.plasma.maxInstancePerShard"
"indexer.plasma.maxDiskUsagePerShard"
"indexer.plasma.minNumShard"
But the general idea is that co-locating instances in a shard reduces the flush buffer overhead.
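As a rough illustration of how such limits could determine the shard count, here is a minimal sketch. It assumes that maxInstancePerShard caps the number of instances in one shard and that minNumShard sets a floor on the number of shards; both semantics are inferred from the flag names, not confirmed by this ticket. maxDiskUsagePerShard is ignored, and the value 10 is chosen only to reproduce the 20-shard figure above, not because it is a known default.

```python
import math


def estimated_shard_count(num_instances, max_instance_per_shard, min_num_shard):
    """Estimate shards needed under ASSUMED flag semantics (hypothetical helper).

    Assumption: instances are packed up to max_instance_per_shard per shard,
    and at least min_num_shard shards are always created.
    """
    packed = math.ceil(num_instances / max_instance_per_shard)
    return max(min_num_shard, packed)


# 100 indexes -> 200 store instances (main + back).
# Illustrative values only: 10 instances per shard, floor of 1 shard.
print(estimated_shard_count(200, 10, 1))  # 20

# With few instances, the assumed floor dominates.
print(estimated_shard_count(3, 10, 5))    # 5
```

The actual placement logic in the indexer may weigh these flags differently, for example by also considering disk usage per shard.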
Because users can create a large number of indexes on a bucket (_default scope/collection) instead of using named collections, this improvement can help reduce the overhead on low-configuration nodes.