Couchbase Server / MB-38079

Disk usage estimation with godu does not scale with large number of files


Details

    • Magma: Jan 20 - Feb 2

    Description

Magma creates a large number of files to minimize write amplification and to avoid large compactions. For a very large dataset, such as 20 TB, we may end up creating 500k - 5M files. (There will be a tunable option to reduce the number of files, at a write amplification cost, by doing file-reduction compaction.)

For a 1% resident dataset, 99% of the data is inactive, so the number of files itself does not add significant overhead.

The godu process spawned by ns_server to estimate disk usage forces all the dentries and file inodes to be present in memory. Since godu runs every second (or at a similarly frequent interval), the OS is forced to keep the file metadata in memory. This creates significant pressure on virtual memory, and the system runs into swap if a large amount of free memory is not reserved for the OS outside of the bucket quota. Many of the files have already been deleted, yet the dentry cache is still left holding their stale entries.

For magma, we are moving to direct I/O and trying to avoid scalability issues related to the filesystem / page cache.

      The following slabtop output shows the severity of memory consumed by the filesystem metadata:

It consumes 5102368K (dentry) + 8501184K (xfs_inode) + 797752K (xfs_ili), about 13.7 GiB of filesystem metadata.

       Active / Total Objects (% used)    : 41794189 / 60179910 (69.4%)
       Active / Total Slabs (% used)      : 1378377 / 1378377 (100.0%)
       Active / Total Caches (% used)     : 68 / 100 (68.0%)
       Active / Total Size (% used)       : 10199466.90K / 13702587.58K (74.4%)
       Minimum / Average / Maximum Object : 0.01K / 0.23K / 16.69K
       
        OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
      26787432 9544323  35%    0.19K 637796       42   5102368K dentry
      13761856 13759786  99%    0.06K 215029       64    860116K kmalloc-64
      5630655 5630276  99%    0.08K 110405       51    441620K Acpi-State
      5610222 5609944  99%    1.06K 265662       30   8501184K xfs_inode
      5285107 5284903  99%    0.15K  99719       53    797752K xfs_ili
      374912 289450  77%    0.03K   2929      128     11716K kmalloc-32
      373248 211264  56%    0.02K   1458      256      5832K kmalloc-16
      318045 106955  33%    0.10K   8155       39     32620K buffer_head
      283458 145845  51%    0.38K   6749       42    107984K mnt_cache
      266000  39069  14%    0.50K   4165       64    133280K kmalloc-512
      250560  77845  31%    0.25K   3915       64     62640K kmalloc-256
      205800 205577  99%    0.07K   3675       56     14700K Acpi-Operand
      145887 142911  97%    0.57K   3656       56    116992K radix_tree_node
      145350 145350 100%    0.02K    855      170      3420K scsi_data_buffer
      114176 114176 100%    0.01K    223      512       892K kmalloc-8
      110292  32043  29%    0.09K   2626       42     10504K kmalloc-96
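As a quick cross-check of the figure quoted above, the three filesystem-metadata caches sum to 5102368K + 8501184K + 797752K = 14401304K, roughly 13.7 GiB. A small hypothetical helper that extracts this sum from slabtop-style output (the set of cache names is an assumption based on the listing above):

```go
package main

import (
	"strconv"
	"strings"
)

// fsMetaKB sums the CACHE SIZE column (in KB) of the slab caches that hold
// filesystem metadata in a slabtop listing. The cache names counted here
// (dentry, xfs_inode, xfs_ili) are taken from the output above.
func fsMetaKB(slabtop string) int64 {
	meta := map[string]bool{"dentry": true, "xfs_inode": true, "xfs_ili": true}
	var total int64
	for _, line := range strings.Split(slabtop, "\n") {
		f := strings.Fields(line)
		if len(f) < 8 || !meta[f[len(f)-1]] {
			continue // header, blank, or non-filesystem cache
		}
		// CACHE SIZE is the second-to-last field, e.g. "5102368K"
		kb, err := strconv.ParseInt(strings.TrimSuffix(f[len(f)-2], "K"), 10, 64)
		if err == nil {
			total += kb
		}
	}
	return total
}
```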
      
      

Magma will report accurate disk usage of the aggregate files it consumes per vbucket through kv-engine. Can we depend on these stats and avoid polling a large number of file inodes from disk?
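The idea is that disk usage can be accounted at file create/extend/delete time, so querying it only touches an in-memory counter instead of the filesystem. A sketch of such a tracker (the type and method names are hypothetical, not magma's actual API):

```go
package main

import "sync"

// DiskUsageTracker accounts per-vbucket disk usage as files are written or
// deleted, so reading the usage never needs a stat() call. This is a sketch
// of the approach, not magma's real implementation.
type DiskUsageTracker struct {
	mu   sync.Mutex
	byVB map[uint16]int64 // vbucket id -> bytes on disk
}

func NewDiskUsageTracker() *DiskUsageTracker {
	return &DiskUsageTracker{byVB: make(map[uint16]int64)}
}

// Add records delta bytes against a vbucket: positive on file create or
// extend, negative on delete.
func (t *DiskUsageTracker) Add(vbid uint16, delta int64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.byVB[vbid] += delta
}

// Total returns the aggregate usage across all vbuckets -- an in-memory
// walk proportional to the number of vbuckets, independent of how many
// files exist on disk.
func (t *DiskUsageTracker) Total() int64 {
	t.mu.Lock()
	defer t.mu.Unlock()
	var total int64
	for _, sz := range t.byVB {
		total += sz
	}
	return total
}
```

ns_server could then poll this counter every second at negligible cost, instead of spawning godu.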


          Activity

We have merged improvements to the number of files used by magma. The file count is now limited to roughly fewer than 300 files per vbucket. It can be tuned lower if required, with a write amplification trade-off.

Resolving this issue as we expect this will solve the problem.

sarath Sarath Lakshman added a comment
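To put the cap in perspective: assuming the default of 1024 vbuckets per bucket (a common Couchbase default, not stated in this issue), the ~300-file cap bounds the file count per bucket well below the 500k - 5M range estimated in the description:

```go
package main

// maxFiles gives the rough upper bound on files per bucket implied by a
// per-vbucket file cap. With the assumed default of 1024 vbuckets and the
// ~300-file cap, that is 1024 * 300 = 307200 files at worst, and volume
// tests (below) stay well under the cap for most vbuckets.
func maxFiles(vbuckets, capPerVB int) int {
	return vbuckets * capPerVB
}
```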

Now the file count doesn't go beyond 300 for most of the vbuckets in the volume tests. Whenever #files goes above 300, it comes back down gradually...

ritesh.agarwal Ritesh Agarwal added a comment

People

  sarath Sarath Lakshman