Magma creates a large number of files to minimize write amplification and to avoid large compactions. For a very large dataset, say 20TB, we may end up creating 500K - 5M files. (There will be a tunable option to reduce the file count, at the cost of write amplification, by running a file-reduction compaction.)
For a 1% resident dataset, 99% of the data is inactive, so the number of files alone does not add significant overhead.
The godu process spawned by ns_server to estimate disk usage forces all the dentries and file inodes to be resident in memory. Since godu runs every second (or at a similarly frequent interval), the OS is forced to keep this file metadata cached. This puts significant pressure on virtual memory, and the system runs into swap unless a large amount of free memory is reserved for the OS outside the bucket quota. Many of the files have already been deleted, yet the dentry cache is still left holding their inactive entries.
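To illustrate why the scan is expensive, here is a minimal sketch (not godu's actual code) of a du-style walk. Every directory listing and stat call forces the kernel to populate the dentry and inode caches for that entry, so repeating this every second over millions of files pins all of that metadata in memory:

```python
import os

def disk_usage(root: str) -> int:
    """Sum apparent file sizes under root, the way a du-like tool would.

    Each os.walk directory read touches a dentry, and each os.lstat
    touches an inode; on a tree with millions of files this keeps the
    kernel's dentry/inode slab caches fully populated.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except FileNotFoundError:
                # File was deleted between the listing and the stat,
                # which is common while compaction is running.
                pass
    return total
```

Note the FileNotFoundError handling: with Magma constantly creating and deleting files, a periodic scanner must tolerate entries vanishing mid-walk.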
For Magma, we are moving to direct I/O to avoid scalability issues related to the filesystem page cache.
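For reference, a minimal sketch of a direct-I/O write on Linux (an illustration of the mechanism, not Magma's implementation): O_DIRECT bypasses the page cache but requires the buffer, length, and file offset to be block-aligned, which the sketch handles by padding into a page-aligned mmap buffer. The 4096-byte block size is an assumption; the real requirement is the device's logical block size.

```python
import mmap
import os

BLOCK = 4096  # assumed alignment; O_DIRECT needs the device's logical block size

def write_direct(path: str, data: bytes) -> None:
    """Write data bypassing the page cache via O_DIRECT (Linux only).

    O_DIRECT requires block-aligned buffers and lengths, so the payload
    is padded up to a multiple of BLOCK inside an anonymous mmap region,
    which is guaranteed page-aligned.
    """
    padded = len(data) + (-len(data)) % BLOCK
    buf = mmap.mmap(-1, padded)  # anonymous mapping, page-aligned
    buf[: len(data)] = data
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o644)
    try:
        os.write(fd, buf)
    finally:
        os.close(fd)
        buf.close()
```

Note that not every filesystem supports O_DIRECT (tmpfs, for one, rejects it), and the file on disk ends up padded to the block boundary, so the storage engine must track logical lengths itself.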
The following slabtop output shows the severity of memory consumed by the filesystem metadata:
It consumes 5102368K (dentry) + 8501184K (inode) + 797752K (xfs metadata), roughly 13.7GB in total.
Magma will report, through kv-engine, the accurate aggregate disk usage of its files per vbucket. Can we depend on these stats and avoid polling a large number of file inodes from disk?
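The alternative would look something like the following sketch, which aggregates per-vbucket figures instead of walking the filesystem. The stat shape here (a per-vbucket "disk_size" value in bytes) is a hypothetical placeholder, not a confirmed kv-engine stat name:

```python
def bucket_disk_usage(vbucket_stats: dict) -> int:
    """Aggregate Magma-reported per-vbucket disk usage in bytes.

    vbucket_stats maps vbucket id -> stats dict; "disk_size" is a
    hypothetical key standing in for whatever kv-engine exposes.
    Summing reported values replaces the per-file inode scan entirely.
    """
    return sum(stats["disk_size"] for stats in vbucket_stats.values())

# Example: two vbuckets reporting 4MB and 6MB of on-disk data.
stats = {0: {"disk_size": 4 << 20}, 1: {"disk_size": 6 << 20}}
total = bucket_disk_usage(stats)
```

The trade-off is trust in the engine's own accounting: the scan-based number reflects what is actually on disk (including leaked or orphaned files), while the reported number reflects what the engine believes it owns.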