  1. Couchbase Server
  2. MB-38303

FTS - Scorch file merger leaks index snapshot ref counts on index closure

    Details

    • Triage:
      Untriaged
    • Is this a Regression?:
      Unknown

      Description

      During the file merge introduction sequence, the introducer bumps the reference count of the new index snapshot so that it stays alive until the merger acknowledges the introduction and decrements the reference count of that newly introduced snapshot.

      ref- https://github.com/blevesearch/bleve/blob/2f21902d034941f7d00332cbd70ead50da272448/index/scorch/introducer.go#L424

      Although the merger routine sends the file merge introductions one at a time, after each merge task completes, it handles all of the introduction acknowledgements in a single loop. Between acknowledgements, that loop performs a "bail out" check on the index close channel, and the acknowledgement handling is where the index snapshot reference counts are decremented.

      ref- https://github.com/blevesearch/bleve/blob/05d86ea8f6e30456949f612cf68cf4a27ce8c9c5/index/scorch/merge.go#L258-L261

      This creates a potential reference count leak when the index is closed, for example during a cbft rollback.

      Let's say, 

      1. The merger had 6 merge tasks in total and sent a merge introduction request to the introducer for each of them.
      2. The introducer performed all of the introductions and bumped the reference count of each new index snapshot.
      3. The merger then comes out of its merge loop and starts processing the acknowledgements here - https://github.com/blevesearch/bleve/blob/05d86ea8f6e30456949f612cf68cf4a27ce8c9c5/index/scorch/merge.go#L253
      4. By this time, however, the index has been closed because of the rollback. The merger exits here - https://github.com/blevesearch/bleve/blob/05d86ea8f6e30456949f612cf68cf4a27ce8c9c5/index/scorch/merge.go#L257 without decrementing the reference counts of the newly introduced index snapshots, leaking a reference count for every merge-introduced snapshot.
      5. The files belonging to these index snapshots enter the FD `DEL` orphaned state because of the forced index directory cleanup triggered from here - https://github.com/couchbase/cbft/blob/master/pindex_bleve_rollback.go#L40

      i.e. the files were deleted while some index snapshots still held live references to them, and those index snapshots will never be cleaned up because the merger's early exit from the loop leaves their reference counts non-zero.

       

       


            Activity

            girish.benakappa Girish Benakappa added a comment - - edited

            Update:

            Verification of the file leak issue on 6.5.1-6281 is complete; no file leak was seen.

            Steps used for verification:

            1. Cluster with kv,kv,kv,kv,search,search
            2. Create default bucket
            3. Create default index with 2 fields and replica count 1
            4. Load bucket with 100M docs each of size 50B
            5. Let index complete
            6. Now start loading another 200M new docs while mutating the 100M docs loaded before.
            7. In parallel, update the index replica count to 0, sleep for 5 mins, and then update it back to 1. <-- repeat this every 5 mins
            8. Check for file leaks in parallel with the above

            However, updating the index definition every 5 mins was seen to slow down index building.

            ritam.sharma Ritam Sharma added a comment - - edited

            Girish Benakappa - what is the latest on this defect?

            It would be best to close this issue and create a new ticket for the perf issue for CC.

            CC - Mihir Kamdar

            mihir.kamdar Mihir Kamdar added a comment -

            Ritam Sharma this issue can be closed post verification on 7.0, which is blocked because of MB-38485. For 6.5.1, you can consider this as verified closed.

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-1006.5.1-1125 contains bleve commit 172962a with commit message:
            [BP] MB-38303 - merger leaks index snapshot refCounts on index closure

            evgeny.makarenko Evgeny Makarenko added a comment -

            Verified for build 7.0.0-4983


              People

              Assignee:
              evgeny.makarenko Evgeny Makarenko
              Reporter:
              Sreekanth Sivasankaran
              Votes:
              0
              Watchers:
              10
