Details
Improvement
Resolution: Fixed
Major
7.2.0, 7.1.3
Description
Currently, plasma uses the following scheme for co-locating indexes on shards in the on-prem model:
- index defined on _default scope and collection - dedicated shard for each index instance
- index defined on named scope/collection - multiple index instances share a single plasma shard
Sharing a shard among multiple index instances reduces memory overhead, because each shard allocates its own flush buffer of 2.25MB.
Consider the following single index node:
100 indexes on _default scope+collection = 200 shards (back+main index) = 450MB of memory allocated for flush buffers
vs.
100 indexes in a named scope/collection = 20 shards (back+main index) = 45MB of memory allocated for flush buffers
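The arithmetic behind these two figures can be checked with a short sketch. The 2.25MB buffer size and the back+main pairing come from the description above; the value of 10 instances per shard is an assumption inferred from the "20 shards" figure, not a documented default, and the helper name `flush_buffer_mb` is hypothetical.

```python
import math

FLUSH_BUFFER_MB = 2.25  # per-shard flush buffer size, as stated in this issue
STORES_PER_INDEX = 2    # each index has a main store and a back store


def flush_buffer_mb(num_indexes, instances_per_shard=1):
    """Return (total shards, total flush buffer memory in MB).

    instances_per_shard=1 models a dedicated shard per index instance;
    larger values model shared shards (assumed value, for illustration).
    """
    shards_per_store = math.ceil(num_indexes / instances_per_shard)
    total_shards = shards_per_store * STORES_PER_INDEX
    return total_shards, total_shards * FLUSH_BUFFER_MB


dedicated = flush_buffer_mb(100)                       # _default scope+collection
shared = flush_buffer_mb(100, instances_per_shard=10)  # named scope/collection

print(dedicated)  # (200, 450.0)
print(shared)     # (20, 45.0)
```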
The actual logic for placing multiple instances in a shard is controlled by the following flags:
"indexer.plasma.maxInstancePerShard"
"indexer.plasma.maxDiskUsagePerShard"
"indexer.plasma.minNumShard"
But the general idea is that co-locating instances in a shard reduces the flush buffer overhead.
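As a rough illustration of how three limits like these could interact, here is a minimal placement sketch. It is not plasma's actual algorithm: the greedy policy, the `Shard` class, the `place_instance` function, and the interpretation of each limit (cap on instances per shard, cap on disk usage per shard, minimum number of shards to create before sharing starts) are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Hypothetical stand-in for a shard; not a plasma type."""
    instances: list = field(default_factory=list)
    disk_usage: int = 0


def place_instance(shards, name, size, max_instances, max_disk_usage, min_num_shard):
    """Greedy placement under assumed semantics of the three limits."""
    target = None
    # Only start sharing once the minimum number of shards exists.
    if len(shards) >= min_num_shard:
        candidates = [
            s for s in shards
            if len(s.instances) < max_instances
            and s.disk_usage + size <= max_disk_usage
        ]
        if candidates:
            # Prefer the shard holding the fewest instances.
            target = min(candidates, key=lambda s: len(s.instances))
    if target is None:
        # No shard has room (or too few shards exist): open a new one.
        target = Shard()
        shards.append(target)
    target.instances.append(name)
    target.disk_usage += size
    return target


shards = []
for i in range(25):
    place_instance(shards, f"idx_{i}", size=1,
                   max_instances=10, max_disk_usage=1000, min_num_shard=2)

counts = sorted(len(s.instances) for s in shards)
print(len(shards), counts)
```

Under these assumptions, 25 small instances end up in three shards rather than 25, so only three flush buffers are allocated per store.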
Because users can create a large number of indexes on a bucket (_default scope/collection) instead of using named collections, this improvement can help reduce the overhead on low-configuration nodes.
Issue Links
For Gerrit Dashboard: MB-57629

# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
193211,6 | MB-57629: [BP] Use config useSharedLSS for sharing plasma shard LSS for bucket indexes | neo | indexing | MERGED | +2 | +1 |
193812,2 | MB-57629 [BP]: Infer shared flag from shard metadata | neo | plasma | MERGED | +2 | +1 |