Details
- Improvement
- Resolution: Fixed
- Critical
- 4.6.0
- 4.5.1
Description
Use case
For large datasets with tens to hundreds of TB of data (and billions of documents), the cluster sizing is large even at a resident ratio as low as 5%. Ideally, we should be able to size as low as 1% residency, or even lower if the user is fine with the performance tradeoffs.
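To make the sizing pressure concrete, here is a rough back-of-the-envelope sketch in Python. It assumes that the memory needed is simply the dataset size multiplied by the resident ratio, and it ignores metadata overhead, replicas, and per-node headroom, all of which a real sizing exercise would have to include. The function name `resident_memory_tb` and the 100 TB dataset are illustrative, not part of any Couchbase tool.

```python
def resident_memory_tb(dataset_tb: float, resident_ratio: float) -> float:
    """Return the RAM (in TB) needed to keep `resident_ratio` of the data in memory.

    Simplifying assumption: memory = dataset size * resident ratio.
    Metadata, replicas, and operational headroom are deliberately ignored.
    """
    if not 0.0 <= resident_ratio <= 1.0:
        raise ValueError("resident_ratio must be between 0 and 1")
    return dataset_tb * resident_ratio


if __name__ == "__main__":
    dataset_tb = 100.0  # hypothetical dataset size
    for ratio in (0.10, 0.05, 0.01):
        ram = resident_memory_tb(dataset_tb, ratio)
        print(f"{ratio:.0%} residency of {dataset_tb:.0f} TB needs about {ram:.1f} TB of RAM")
```

Under these assumptions, moving from 5% to 1% residency cuts the RAM requirement for the same dataset by a factor of five, which is the reduction in cluster size this ticket is after.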
Being able to reduce the cluster sizing while maintaining good performance requires addressing the following concerns.
Concerns & corresponding improvements
- Sizing: Currently, for operability reasons, going below 10% memory residency is not recommended. Fix the reasons behind the 10% requirement.
- Disk read/write performance: This is currently slower than expected. Explore improvements to the storage engine.
- Rebalance speed: Currently rebalance takes too long in high data density scenarios. Given that a single node can hold >10TB of data, rebalance needs to be faster.
- Rebalance stability: Rebalance can fail for many reasons. Reduce rebalance failures, and restart rebalance automatically whenever feasible. Rebalance should work reliably from start to completion without requiring user intervention (even more critical when the number of nodes is large).
- Global secondary indexes: Large datasets can't be indexed because the memory requirement is high and an index can't span beyond a single index node. Need partitioned indexes that require low memory residency (ideally not percentage-based, but if percentage-based, then as low as 5%) while maintaining acceptable performance.
- Backup/Restore: Backup and restore can take too long with large datasets. Also, the size of backups needs to be reduced.
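As a rough illustration of why rebalance and backup time matter at this data density, the sketch below estimates how long it takes just to move one node's data at a sustained throughput. The throughput figures are assumptions chosen for illustration, not measured Couchbase numbers, and the helper `transfer_hours` is hypothetical.

```python
def transfer_hours(data_tb: float, throughput_mb_per_s: float) -> float:
    """Return the hours needed to move `data_tb` at a sustained throughput.

    Uses decimal units (1 TB = 1,000,000 MB) and assumes the throughput
    is constant for the whole transfer, which real rebalances rarely achieve.
    """
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput_mb_per_s must be positive")
    data_mb = data_tb * 1_000_000
    return data_mb / throughput_mb_per_s / 3600


if __name__ == "__main__":
    node_data_tb = 10.0  # per-node data volume mentioned in the ticket
    for mb_per_s in (50.0, 100.0, 500.0):  # assumed throughputs, not measurements
        hours = transfer_hours(node_data_tb, mb_per_s)
        print(f"{node_data_tb:.0f} TB at {mb_per_s:.0f} MB/s takes about {hours:.1f} hours")
```

Even at an assumed 100 MB/s, moving 10 TB takes more than a day, so any failure that forces a restart from scratch is expensive.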
Issue Links
- depends on
  - MB-30053 Robust Rebalance: faster, lower impact and more autonomous (Resolved)
  - MB-33463 Magma integration (Resolved)
  - MB-10291 [OoO]: cbmcd connections cannot be efficiently used since operations are never interleaved (Closed)
  - MB-40152 Implement High Data Density Storage Engine - Magma (Closed)
  - MB-41599 GSI: Implement in-memory compression (Closed)
- relates to
  - MB-9197 Optimize rebalance data-movement for high data-density scenarios with DCP (Closed)