Analysis:
gCtx is used mainly for recovery and collecting statistics.
During recovery, gCtx allocates memory needed for performing pageDelta Operations in recoverFromHeaderReplay.
Memory from unIndexed nodes go back to the reclaim List. There is no issue with the reclaim objects residing in wCtx
reclaim list as it gets destroyed in closeForRecovery explicitly.
During recoverFromDataReplay, the gCtx allocates memory for Marshalling or UnMarshalling via PageBuffers. The PageBuffers belong to the Shard's LSS Context. We were not releasing the sCtx after closeForRecovery in a specific case (recovery error happens during recoverFromDataReplay) and therefore the underlying PageBuffer.
closeForRecovery is invoked in the following cases :
a) recovery error
b) recovery success where we swap the old instance with the newly constructed instance
c) Indexer becomes Active and calls RecoveryDone (handleRecoveryDone INDEXER_ACTIVE).
In all the above cases, the closeForRecovery is invoked for a Plasma instance which is no longer needed.
Therefore, any memory associated with the gCtx should also be released. Otherwise, the memory can linger till the shard is finally closed and LSS cleanup is done. In addition, gCtx's sCtx caches LSSStats like FlushDataSz, which is used for computing LSSDataSize. On recovery failure, we should invalidate the cached LSS Stats. Because the Stats can get used due to a recycle during a trim.
Based on the above cases:
case a) the fix is to call release on the gCtx. This shall add the sCtx back to the FreeList, followed by subsequent trimSCtx, which will eventually reset the PageBuffer memory.
case b) we are good, because the gCtx is swapped and its unbind from sCtx after recovery is complete.
case c) we are good because s.recoveredInsts contain the swapped instance, for which we have already called closeForRecovery.
Analysis:
gCtx is used mainly for recovery and collecting statistics.
During recovery, gCtx allocates memory needed for performing pageDelta Operations in recoverFromHeaderReplay.
Memory from unIndexed nodes go back to the reclaim List. There is no issue with the reclaim objects residing in wCtx
reclaim list as it gets destroyed in closeForRecovery explicitly.
During recoverFromDataReplay, the gCtx allocates memory for Marshalling or UnMarshalling via PageBuffers. The PageBuffers belong to the Shard's LSS Context. We were not releasing the sCtx after closeForRecovery in a specific case (recovery error happens during recoverFromDataReplay) and therefore the underlying PageBuffer.
closeForRecovery is invoked in the following cases :
a) recovery error
b) recovery success where we swap the old instance with the newly constructed instance
c) Indexer becomes Active and calls RecoveryDone (handleRecoveryDone INDEXER_ACTIVE).
In all the above cases, the closeForRecovery is invoked for a Plasma instance which is no longer needed.
Therefore, any memory associated with the gCtx should also be released. Otherwise, the memory can linger till the shard is finally closed and LSS cleanup is done. In addition, gCtx's sCtx caches LSSStats like FlushDataSz, which is used for computing LSSDataSize. On recovery failure, we should invalidate the cached LSS Stats. Because the Stats can get used due to a recycle during a trim.
Based on the above cases:
case a) the fix is to call release on the gCtx. This shall add the sCtx back to the FreeList, followed by subsequent trimSCtx, which will eventually reset the PageBuffer memory.
case b) we are good, because the gCtx is swapped and its unbind from sCtx after recovery is complete.
case c) we are good because s.recoveredInsts contain the swapped instance, for which we have already called closeForRecovery.