Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-46742

closeForRecovery does not release gCtx

    XMLWordPrintable

Details

    • Untriaged
    • 1
    • Unknown

    Attachments

      Issue Links

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          Analysis:
           
          gCtx is used mainly for recovery and  collecting statistics.
           
          During recovery, gCtx allocates memory needed for performing pageDelta Operations in recoverFromHeaderReplay.
          Memory from unIndexed nodes go back to the reclaim List. There is no issue with the reclaim objects residing in wCtx
          reclaim list as it gets destroyed in closeForRecovery explicitly.
           
          During recoverFromDataReplay, the gCtx allocates memory for Marshalling or UnMarshalling via PageBuffers. The PageBuffers belong to  the Shard's LSS Context. We were not releasing the sCtx after  closeForRecovery in a specific case (recovery error happens during  recoverFromDataReplay) and therefore the underlying PageBuffer.
           
          closeForRecovery is invoked in the following cases :
                 a) recovery error
                 b) recovery success where we swap the old instance with the newly constructed instance
                 c) Indexer becomes Active and calls RecoveryDone (handleRecoveryDone INDEXER_ACTIVE).
           
          In all the above cases, the closeForRecovery is invoked for a Plasma instance which is no longer needed.
           
          Therefore, any memory associated with the gCtx should also be released. Otherwise, the memory can linger till the shard is finally closed and LSS cleanup is done.  In addition, gCtx's sCtx caches LSSStats like FlushDataSz, which is used for computing LSSDataSize. On recovery failure, we should invalidate the cached LSS Stats. Because the Stats can get used due to a recycle during a trim.
           
          Based on the above cases: 
          case a) the fix is to call release on the gCtx. This shall add the sCtx back to the FreeList, followed by subsequent trimSCtx, which will eventually reset the PageBuffer memory.
          case b) we are good, because the gCtx is swapped and its unbind from sCtx after recovery is complete. 
          case c) we are good because s.recoveredInsts contain the swapped instance, for which we have already called closeForRecovery. 
           
           

          saptarshi.sen Saptarshi Sen added a comment - Analysis:   gCtx is used mainly for recovery and  collecting statistics.   During recovery, gCtx allocates memory needed for performing pageDelta Operations in recoverFromHeaderReplay. Memory from unIndexed nodes go back to the reclaim List. There is no issue with the reclaim objects residing in wCtx reclaim list as it gets destroyed in closeForRecovery explicitly.   During recoverFromDataReplay, the gCtx allocates memory for Marshalling or UnMarshalling via PageBuffers. The PageBuffers belong to  the Shard's LSS Context. We were not releasing the sCtx after  closeForRecovery in a specific case (recovery error happens during  recoverFromDataReplay) and therefore the underlying PageBuffer.   closeForRecovery is invoked in the following cases :        a) recovery error        b) recovery success where we swap the old instance with the newly constructed instance        c) Indexer becomes Active and calls RecoveryDone (handleRecoveryDone INDEXER_ACTIVE).   In all the above cases, the closeForRecovery is invoked for a Plasma instance which is no longer needed.   Therefore, any memory associated with the gCtx should also be released. Otherwise, the memory can linger till the shard is finally closed and LSS cleanup is done.  In addition, gCtx's sCtx caches LSSStats like FlushDataSz, which is used for computing LSSDataSize. On recovery failure, we should invalidate the cached LSS Stats. Because the Stats can get used due to a recycle during a trim.   Based on the above cases:  case a) the fix is to call release on the gCtx. This shall add the sCtx back to the FreeList, followed by subsequent trimSCtx, which will eventually reset the PageBuffer memory. case b) we are good, because the gCtx is swapped and its unbind from sCtx after recovery is complete.  case c) we are good because s.recoveredInsts contain the swapped instance, for which we have already called closeForRecovery.     

          Build couchbase-server-7.1.0-1049 contains plasma commit a712678 with commit message:
          MB-46742: Release gCtx's sCtx & Buffer Memory on closeForRecovery

          build-team Couchbase Build Team added a comment - Build couchbase-server-7.1.0-1049 contains plasma commit a712678 with commit message: MB-46742 : Release gCtx's sCtx & Buffer Memory on closeForRecovery

          Unit test verifies this fix

          srinath.duvuru Srinath Duvuru added a comment - Unit test verifies this fix

          People

            saptarshi.sen Saptarshi Sen
            jliang John Liang
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty