Couchbase Server / MB-49811

Magma memory allocation likely contributing to 'high' fragmentation

    Description

      Raising this issue after analysing some memory stats from another MB. This is by no means a 100% conclusive analysis, but it is worth logging as an issue.

      Logs from MB-49702 suggest that the memory allocated to magma is contributing to high fragmentation.

      Here are the observations that lead to that conclusion; they are all from the following node:

      • s3://cb-customers-secure/rebalance_failed/2021-11-22/collectinfo-2021-11-22t083450-ns_1@172.23.121.123.zip

      In MB-49702 DaveR noted that the NonIO thread(s) were heavily utilised due to issues similar to those logged in MB-49525.
      However, this is a much more heavily utilised node, so I didn't expect to see the DefragmenterTask still continually trying
      to reduce the bucket's resident memory.

      The interesting points from the memory stats in stats.log are:

      • 5.42 GiB resident (ep_arena:resident: 5821845504)
      • 4.54 GiB allocated to the bucket (ep_arena:allocated: 4878246744)
      • 0.94 GB of unallocated bytes (ep_arena:fragmentation_size: 943598760, i.e. resident minus allocated)
      • This yields a fragmentation 'ratio' of 0.16, i.e. ~16% fragmented (the arithmetic is sketched below)
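
      A minimal sketch of that arithmetic, using the raw byte counts above (the exact formula the engine uses internally is my reading of these stats, not lifted from the code):

          #include <cstdint>
          #include <cstdio>

          int main() {
              const uint64_t resident  = 5821845504ULL; // ep_arena:resident
              const uint64_t allocated = 4878246744ULL; // ep_arena:allocated
              const uint64_t fragSize  =  943598760ULL; // ep_arena:fragmentation_size == resident - allocated

              // Fraction of resident arena memory that has not been handed out to the bucket.
              const double fragRatio = static_cast<double>(fragSize) / static_cast<double>(resident);
              std::printf("fragmentation ratio = %.3f\n", fragRatio); // ~0.162, i.e. ~16%
              return 0;
          }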

      The DefragmenterTask then calculates a 'score' of 0.10 by taking the 0.16 and multiplying it by allocated as a ratio of the high-water mark, i.e. the DefragmenterTask is above its lower threshold and so reduces its sleep time. This happened continually, and the DefragmenterTask reduced its sleep to 0 (i.e. it is constantly re-scheduled). This results in a very high rate of visits to the bucket's HashTable, and from mortimer we can observe that the majority of StoredValues have been visited and reallocated.
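
      A sketch of that score calculation and the resulting sleep adjustment, assuming score = fragmentation ratio x (allocated / high-water mark) as described above; the high-water mark, threshold and step values below are illustrative placeholders, not values taken from the logs:

          #include <algorithm>
          #include <cstdio>

          int main() {
              const double fragRatio     = 0.16;          // from the arena stats above
              const double allocated     = 4878246744.0;  // ep_arena:allocated
              const double highWaterMark = 7.8e9;         // illustrative only; not in the logs

              // Fragmentation scaled by how full the bucket is relative to its high-water mark.
              const double score = fragRatio * (allocated / highWaterMark); // ~0.10

              // Above the lower threshold the task shortens its sleep, eventually to 0,
              // i.e. it is re-scheduled immediately. Threshold and step are placeholders.
              const double lowerThreshold = 0.07;
              double sleepSeconds = 10.0;
              if (score > lowerThreshold) {
                  sleepSeconds = std::max(0.0, sleepSeconds - 10.0);
              }
              std::printf("score = %.2f, next sleep = %.1fs\n", score, sleepSeconds);
              return 0;
          }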

      • 4.2M StoredValues (the objects which get reallocated) (ep_storedval_num: 4209311)
      • The DefragmenterTask reached reallocation rates of ~170k StoredValues a minute (and this goes on for a while)
      • No new mutations are arriving
      • The conclusion is that we must have reallocated everything many times (see the mortimer graph), so the HashTable should be quite well packed/utilised
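
      Back-of-the-envelope support for that conclusion, assuming the ~170k/minute rate is sustained:

          #include <cstdio>

          int main() {
              const double storedValues   = 4209311;  // ep_storedval_num
              const double reallocsPerMin = 170000;   // observed DefragmenterTask reallocation rate

              // Time for one full pass over the HashTable at that rate; with no new mutations
              // arriving, a sustained period of this means every StoredValue is reallocated many times.
              const double minutesPerPass = storedValues / reallocsPerMin; // ~25 minutes
              std::printf("one full pass ~= %.1f minutes\n", minutesPerPass);
              return 0;
          }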

      So with that, why do we still have quite a high level of fragmentation?

      KV-engine's mem_used seems to be mostly HashTable data; I don't see any sign that KV has large overheads at play.

      • 3.52 GiB KV-engine mem_used (ep_mem_used_primary: 3780091208)
      • 3.88 GiB HashTable memory (ep_kv_size: 4165293460)
      • Note that the memory stats are not an atomic snapshot; there can be skew as each stat is read and processed, which is why I think ep_mem_used_primary shows less than the HashTable data

      The next biggest user of memory is magma, which we see in the secondary mem_used:

      • 1.09 GB allocated by magma (ep_mem_used_secondary: 1093367712)

      So my conclusion comes from the following:

      • KV has 3.88 GiB of memory used by the HashTable, and we are actively 'repacking' that data, as that's what the DefragmenterTask does
      • Magma has 1.09 GB of memory, with no active defragging

      The 1.09 GB allocated by magma must therefore be the main contributor to the 0.94 GB of fragmentation.

      Mortimer graph showing overall memory stays stable

      Attachments


          Activity

            build-team Couchbase Build Team added a comment:
            Build couchbase-server-7.1.0-2036 contains magma commit 4deba23 with commit message:
            MB-49811 magma: Avoid shared_ptr allocation in BlockCache

            build-team Couchbase Build Team added a comment:
            Build couchbase-server-7.1.0-2036 contains magma commit 59e7c0e with commit message:
            MB-49811 magma: Share Object directly to users of Cache

            rohan.suri Rohan Suri added a comment (edited):

            The ideal solution to this requires a memory defragmenter. That is proposed for the Morpheus release. To mitigate the issue for Neo, we've done the following:

            • Reduced allocations.
            • Added a knob to turn the block cache on/off.

             

            The following objects will no longer be allocated. This should reduce fragmentation in their respective size-class bins.

            Object                                  Bin size (bytes)
            LRU list node                           32
            Group list node                         32
            BlockData's shared_ptr control block    32
            ObjectMap node entry                    80
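
            For context on the 'shared_ptr control block' entry (a generic C++ illustration, not the actual magma code): constructing a shared_ptr from a raw pointer makes a separate small heap allocation for the control block, which is exactly the kind of ~32-byte object that ends up in a small bin; combining the allocations, or handing the cached object out directly, removes that allocation.

                #include <memory>

                struct BlockData { /* cached block payload */ };

                int main() {
                    // Two heap allocations: one for BlockData and one ~32-byte control block.
                    // The control block lands in a small size-class bin and can fragment it.
                    std::shared_ptr<BlockData> a(new BlockData);

                    // One combined allocation: object and control block are co-located,
                    // so there is no separate small allocation left behind.
                    auto b = std::make_shared<BlockData>();
                    return 0;
                }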

             

            The knob to turn the block cache on/off helps in case a customer sees fragmentation in an unforeseen situation. There shouldn't be any impact from turning off the block cache if sufficient free memory is available for the Linux page cache to serve those blocks; if there isn't, extra read IOs will be incurred.
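
            A sketch of that trade-off (illustrative only; the names and structure are assumptions, not magma's actual read path):

                #include <unistd.h>

                // Hypothetical flag corresponding to the on/off knob described above.
                bool blockCacheEnabled = false;

                ssize_t readBlock(int fd, void* buf, size_t len, off_t offset) {
                    if (blockCacheEnabled) {
                        // ... look the block up in the in-process block cache first (omitted) ...
                    }
                    // With the cache off (or on a miss) the read falls through to the filesystem.
                    // If the Linux page cache still holds the block this is just a syscall;
                    // otherwise it costs an extra disk IO, which is the impact noted above.
                    return pread(fd, buf, len, offset);
                }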

            We ran some tests with the block cache turned off and saw no regressions (results here: https://hub.internal.couchbase.com/confluence/display/~rohan.suri/MB-49811+turn+off+block+cache). Note those runs had plenty of memory for the page cache. We plan to do a weekly run with the block cache turned off to see whether any test regresses. If none does, then we need an appropriate test that demonstrates the usefulness of the block cache.


            build-team Couchbase Build Team added a comment:
            Build couchbase-server-7.1.0-2041 contains kv_engine commit 72d3c06 with commit message:
            MB-49811 Make magma_enable_block_cache a dynamic config

            srinath.duvuru Srinath Duvuru added a comment:
            Hand testing to verify the fixes was done. No further testing is needed.

            People

              rohan.suri Rohan Suri
              jwalker Jim Walker