Couchbase Server / MB-6860

[system test] Index file descriptor leaks

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.0
    • Fix Version/s: 2.0
    • Component/s: view-engine
    • Security Level: Public
    • Labels:
    • Environment:
      centos 6.2 64bit build 2.0.0-1808

      Description

      Create an 8-node cluster installed with Couchbase Server 2.0.0-1808. Consistent views are disabled.
      Each node has 14 GB RAM and 2 EBS volumes, one for /data and another for /view.
      Create 2 buckets and load 9 million items into each bucket.
      Create 3 design docs: one for the default bucket and 2 for saslbucket.

      Let the cluster run with a load of ~18K ops on each bucket for more than one day.
      Check the view directory: 2 nodes show disk usage of more than 20%.

      Thuans-MacBook-Pro:testrunner thuan$ python scripts/ssh.py -i ../ini/8-ec2-orange.ini "df -kh /view"
      ec2-50-112-210-248.us-west-2.compute.amazonaws.com
      Filesystem Size Used Avail Use% Mounted on
      /dev/xvdj 247G 60G 175G 26% /view

      ec2-50-112-46-220.us-west-2.compute.amazonaws.com
      Filesystem Size Used Avail Use% Mounted on
      /dev/xvdj 247G 4.4G 230G 2% /view

      ec2-54-245-38-16.us-west-2.compute.amazonaws.com
      Filesystem Size Used Avail Use% Mounted on
      /dev/xvdj 247G 28G 207G 12% /view

      ec2-50-112-52-162.us-west-2.compute.amazonaws.com
      Filesystem Size Used Avail Use% Mounted on
      /dev/xvdj 247G 45G 189G 20% /view

      ec2-50-112-17-129.us-west-2.compute.amazonaws.com
      Filesystem Size Used Avail Use% Mounted on
      /dev/xvdj 247G 27G 208G 12% /view

      ec2-54-245-55-107.us-west-2.compute.amazonaws.com
      Filesystem Size Used Avail Use% Mounted on
      /dev/xvdj 247G 7.4G 227G 4% /view

      ec2-54-245-24-204.us-west-2.compute.amazonaws.com
      Filesystem Size Used Avail Use% Mounted on
      /dev/xvdj 247G 9.4G 225G 4% /view

      ec2-50-112-86-218.us-west-2.compute.amazonaws.com
      Filesystem Size Used Avail Use% Mounted on
      /dev/xvdj 247G 13G 221G 6% /view

        • On node ec2-50-112-210-248.us-west-2.compute.amazonaws.com, the actual size of all index files is only around 2.8 GB, even though df reports 60 GB used (a one-liner to quantify the gap follows the output below):

      [root@ip-10-249-0-36 view]# du -hs
      3.0G .

      [root@ip-10-249-0-36 view]# df -kh | grep view
      /dev/xvdj 247G 60G 175G 26% /view
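
      The gap between du (3.0 GB of visible files) and df (60 GB allocated) is the classic signature of deleted-but-open files. A hedged sketch to quantify it (with +L1, stock lsof prints SIZE in field 7 and the inode in field 9; counting each inode once avoids double-counting files held by several descriptors):

      # Sum the space held open by deleted files on /view (illustrative, not from the ticket)
      lsof +L1 /view | awk 'NR>1 && !seen[$9]++ {sum += $7} END {printf "%.1f GiB held by deleted files\n", sum/1024^3}'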

        • Run lsof +L1: beam.smp is holding open many deleted files, preventing the OS from reclaiming their disk space (a minimal reproduction of this behavior follows the output below):

      [root@ip-10-249-0-36 view]# lsof +L1 | grep view
      beam.smp 18926 couchbase 53u REG 202,144 39 0 15859715 /view/.delete/0ed16b72a6e2e1d043b59ba006f32828 (deleted)
      beam.smp 18926 couchbase 55u REG 202,144 39 0 15073283 /view/.delete/2d6e9162017b08fa0cb8d5aadaef4311 (deleted)
      beam.smp 18926 couchbase 56r REG 202,144 39 0 15073283 /view/.delete/2d6e9162017b08fa0cb8d5aadaef4311 (deleted)
      beam.smp 18926 couchbase 57w REG 202,144 39 0 15073283 /view/.delete/2d6e9162017b08fa0cb8d5aadaef4311 (deleted)
      beam.smp 18926 couchbase 59r REG 202,144 4087670002 0 14417931 /view/.delete/a7418018f1977c4a4c614ad801ac8add (deleted)
      beam.smp 18926 couchbase 64u REG 202,144 152048674 0 14417947 /view/.delete/85177d34a8fbdd8e851ca37329356a72 (deleted)
      beam.smp 18926 couchbase 66r REG 202,144 39 0 15859715 /view/.delete/0ed16b72a6e2e1d043b59ba006f32828 (deleted)
      beam.smp 18926 couchbase 79w REG 202,144 39 0 15859715 /view/.delete/0ed16b72a6e2e1d043b59ba006f32828 (deleted)
      beam.smp 18926 couchbase 88r REG 202,144 4087670002 0 14417931 /view/.delete/a7418018f1977c4a4c614ad801ac8add (deleted)
      beam.smp 18926 couchbase 95r REG 202,144 25548094584 0 14417926 /view/.delete/2d5d22317781e50b64dde53e74ca8a01 (deleted)
      beam.smp 18926 couchbase 105w REG 202,144 0 0 14417929 /view/@indexes/default/replica_87d0cc9a8fffc2e1e434f6ddbb0c168d.view.log (deleted)
      beam.smp 18926 couchbase 113u REG 202,144 152048674 0 14417947 /view/.delete/85177d34a8fbdd8e851ca37329356a72 (deleted)
      beam.smp 18926 couchbase 121r REG 202,144 4087670002 0 14417931 /view/.delete/a7418018f1977c4a4c614ad801ac8add (deleted)
      beam.smp 18926 couchbase 136r REG 202,144 4087670002 0 14417931 /view/.delete/a7418018f1977c4a4c614ad801ac8add (deleted)
      beam.smp 18926 couchbase 138r REG 202,144 4087670002 0 14417931 /view/.delete/a7418018f1977c4a4c614ad801ac8add (deleted)
      beam.smp 18926 couchbase 144r REG 202,144 4087670002 0 14417931 /view/.delete/a7418018f1977c4a4c614ad801ac8add (deleted)
      beam.smp 18926 couchbase 155r REG 202,144 4087670002 0 14417931 /view/.delete/a7418018f1977c4a4c614ad801ac8add (deleted)
      beam.smp 18926 couchbase 164r REG 202,144 4087670002 0 14417931 /view/.delete/a7418018f1977c4a4c614ad801ac8add (deleted)
      beam.smp 18926 couchbase 178r REG 202,144 4087670002 0 14417931 /view/.delete/a7418018f1977c4a4c614ad801ac8add (deleted)
      beam.smp 18926 couchbase 187r REG 202,144 4087670002 0 14417931 /view/.delete/a7418018f1977c4a4c614ad801ac8add (deleted)
      beam.smp 18926 couchbase 194r REG 202,144 7679604520 0 14417924 /view/.delete/fa9cd11ed6b0f873c825fba96ee44c94 (deleted)
      beam.smp 18926 couchbase 195r REG 202,144 4087670002 0 14417931 /view/.delete/a7418018f1977c4a4c614ad801ac8add (deleted)
      beam.smp 18926 couchbase 196r REG 202,144 4087670002 0 14417931 /view/.delete/a7418018f1977c4a4c614ad801ac8add (deleted)
      beam.smp 18926 couchbase 205r REG 202,144 4087670002 0 14417931 /view/.delete/a7418018f1977c4a4c614ad801ac8add (deleted)
      beam.smp 18926 couchbase 213r REG 202,144 4087670002 0 14417931 /view/.delete/a7418018f1977c4a4c614ad801ac8add (deleted)
      beam.smp 18926 couchbase 231r REG 202,144 22818230272 0 14417927 /view/.delete/319125a97816c48c70500af867ddae5b (deleted)
      beam.smp 18926 couchbase 263w REG 202,144 0 0 14417935 /view/@indexes/default/replica_87d0cc9a8fffc2e1e434f6ddbb0c168d.view.log (deleted)
      beam.smp 18926 couchbase 278r REG 202,144 22818230272 0 14417927 /view/.delete/319125a97816c48c70500af867ddae5b (deleted)
      beam.smp 18926 couchbase 334w REG 202,144 0 0 14417936 /view/@indexes/default/replica_87d0cc9a8fffc2e1e434f6ddbb0c168d.view.log (deleted)
      beam.smp 18926 couchbase 374r REG 202,144 22818230272 0 14417927 /view/.delete/319125a97816c48c70500af867ddae5b (deleted)
      beam.smp 18926 couchbase 384r REG 202,144 22818230272 0 14417927 /view/.delete/319125a97816c48c70500af867ddae5b (deleted)
      [root@ip-10-249-0-36 view]#
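
      The underlying POSIX behavior is easy to reproduce outside Couchbase: an unlinked file's blocks are freed only once every open descriptor on it is closed. A minimal sketch (path and size are illustrative):

      exec 3>/view/demo.bin                # hold an open write descriptor
      dd if=/dev/zero bs=1M count=512 >&3  # grow the file to 512 MB
      rm /view/demo.bin                    # unlink: du stops counting it
      df -h /view                          # df still includes the 512 MB
      lsof +L1 /view                       # the file is listed as "(deleted)"
      exec 3>&-                            # close the descriptor
      df -h /view                          # the space is reclaimed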


        Activity

        kzeller added a comment -

        Ok added: <para>
        For geo/spatial indexes, after updating or deleting a design document,
        the old index files and Erlang processes were not released. This
        unnecessarily consumed disk space and leaked file descriptors.
        After database shard compaction, spatial/geo indexes would
        never release the file handle of the pre-compaction database files.
        This meant that disk space couldn't be reclaimed by the OS. This has
        now been fixed.
        </para>
        <para>
        For general indexes, after index compaction the pre-compaction index
        files were deleted but were sometimes held open for a long time.
        This prevented the OS from reclaiming the respective disk
        space and leaked one file descriptor per index compaction.
        This has been fixed.
        </para>
        <para>
        For both geo/spatial and general indexes,
        we now avoid creating unnecessary empty index files and
        avoid keeping them open for
        very long periods, such as waiting until bucket deletion.
        This is a more minor fix which helps decrease the number of open
        file descriptors, which is important if you
        are working on an operating system with a small limit on the maximum
        allowed file descriptors, such as Windows and Mac OS X.
        </para>

        Filipe Manana (Inactive) added a comment -

        Karen:

        It would read more like:

        For geo/spatial indexes:

        1) After updating a design document, or deleting a design document,
        the old index files and Erlang processes were never released (stealing
        disk space and leaking file descriptors);
        2) After database (vbucket) compaction, spatial/geo indexes would
        never release the file handle of the pre-compaction database files
        (meaning that disk space couldn't be reclaimed by the OS)

        For mapreduce views:

        1) In some cases, after index compaction, the pre-compaction index
        files were deleted but held open for a long time (or even forever at
        the extreme), preventing the OS from reclaiming the respective disk
        space and leaking 1 file descriptor per index compaction.

        Both for geo and mapreduce (minor issue):

        1) Avoid creating unnecessary empty index files and keeping them open
        for very long periods (until bucket deletion). This is a minor one, as it
        didn't steal disk space - but it helps decrease the number of open
        file descriptors, which is important on OSes with a small limit of max
        allowed file descriptors (Windows and Mac OS X).

        It's a lot of stuff, but none relates to index files never being
        deleted after bucket deletion.
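
        For anyone verifying the descriptor side of the fix on a live Linux node, a hedged sketch for watching the view-engine process's open-descriptor count over time (relies on /proc, so it does not apply to the Windows/Mac OS X case above):

        pid=$(pgrep -o beam.smp)    # oldest beam.smp process (illustrative)
        while sleep 60; do
            echo "$(date +%T) $(ls /proc/$pid/fd | wc -l) open fds"
        done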

        kzeller added a comment -

        So this should really read: "Memory leaks had occurred due to open, unused index files. Now, unused index files are removed and the memory leaks resolved"?

        Filipe Manana (Inactive) added a comment -

        Note Karen: different kinds of leaks were fixed, but none relates to your observation.
        The leaks were related to not closing index or database file handles after compaction in some scenarios. Other leaks were related to opening (and keeping open) unnecessary/unused files.

        kzeller added a comment -

        RN: "For past releases, after a data bucket had been deleted,
        any indexes associated with the bucket were not deleted. This
        has been fixed so the both the data bucket and associated indexes
        are deleted."


          People

          • Assignee:
            Filipe Manana (Inactive)
            Reporter:
            Thuan Nguyen
          • Votes:
            0
            Watchers:
            2


              Gerrit Reviews

              There are no open Gerrit changes