  1. Couchbase Server
  2. MB-39344

[checkpoint] Allocation in replica checkpoints can push the system to hard OOM


Details

    Description

      Problem

      There are multiple scenarios where replica checkpoints might allocate most of the memory on a node in a state where that memory is not releasable. That can result in hard OOM and possible deadlock in scenarios like rebalance or bulk load.

      CBSE-8284 is an example of livelock at rebalance. It shows that, without ongoing mutations, we can end up with a replica disk checkpoint stuck in the open state, which means that we cannot recover all the memory associated with it.

      While those scenarios are uncommon in on-premises environments, the system breaks quite quickly in environments with many small buckets if someone attempts simple loads with (e.g.) low memory quotas and bigger-than-usual doc sizes.
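
      A rough way to see why small quotas and large docs hurt: if every replica checkpoint can pin at least one item that cannot be released, the pinned memory grows with the number of replica vBuckets times the item size. The sketch below is only back-of-the-envelope arithmetic; the function names, the vBucket count, the quota and the item sizes are all hypothetical and not taken from this issue.

```python
MIB = 1024 * 1024


def pinned_bytes(replica_vbuckets: int, avg_item_bytes: int,
                 pinned_items_per_checkpoint: int = 1) -> int:
    """Memory that stays allocated if each replica checkpoint keeps
    `pinned_items_per_checkpoint` items that cannot be released."""
    return replica_vbuckets * pinned_items_per_checkpoint * avg_item_bytes


def pinned_fraction(quota_bytes: int, replica_vbuckets: int,
                    avg_item_bytes: int) -> float:
    """Pinned memory as a fraction of the bucket quota (> 1.0 means over quota)."""
    return pinned_bytes(replica_vbuckets, avg_item_bytes) / quota_bytes


# Hypothetical figures: 100 MiB quota, 512 replica vBuckets on the node.
small_docs = pinned_fraction(100 * MIB, 512, 1024)      # 1 KiB items
large_docs = pinned_fraction(100 * MIB, 512, 1 * MIB)   # 1 MiB items

print(f"1 KiB items pin {small_docs:.1%} of the quota")
print(f"1 MiB items pin {large_docs:.1%} of the quota")
```

      With these assumed numbers, small items pin well under 1% of the quota, while 1 MiB items would need several times the quota.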

      Original proposal

      The (current) invariant / assumption is that there is always one open checkpoint. Hence we cannot close the last one (even though we have the last marker), as we don't know what the seqnos for the next checkpoint are going to be.

      If we relaxed that for replicas (which I think makes sense given they are essentially slaved to the active), then we could close the checkpoint as soon as the last mutation arrives, and hence remove that checkpoint once it's unreferenced.

      This only works for disk checkpoints, as we need to know checkpoint ends, not snap ends.
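
      The original proposal can be pictured with a toy model. This is a sketch only and not kv_engine code: the class, its fields and the "disk" / "memory" labels are hypothetical, and it assumes that for a disk checkpoint the end seqno announced by the marker is also the end of the checkpoint.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaCheckpoint:
    kind: str                 # "disk" or "memory" (hypothetical labels)
    end_seqno: int            # end seqno announced by the marker
    items: list = field(default_factory=list)
    is_open: bool = True
    cursors: int = 0          # cursors still referencing this checkpoint

    def receive(self, seqno: int) -> None:
        """Queue a mutation received from the active."""
        if not self.is_open:
            raise ValueError("cannot queue into a closed checkpoint")
        self.items.append(seqno)
        # Proposed relaxation: close as soon as the last mutation arrives.
        # Only applied to disk checkpoints, where (by assumption) the
        # marker's end seqno is also the checkpoint end.
        if self.kind == "disk" and seqno == self.end_seqno:
            self.is_open = False

    def removable(self) -> bool:
        """A checkpoint can be removed once it is closed and unreferenced."""
        return not self.is_open and self.cursors == 0


disk = ReplicaCheckpoint(kind="disk", end_seqno=3)
memory = ReplicaCheckpoint(kind="memory", end_seqno=3)
for seqno in (1, 2, 3):
    disk.receive(seqno)
    memory.receive(seqno)

print(disk.removable(), memory.removable())
```

      In this model the disk checkpoint becomes removable after seqno 3, while the memory checkpoint stays open and keeps its items.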

      Final proposal

      Force-closing the open checkpoint on a replica comes with its own issues; see the historical conversation in the comments for details.

      In the end we solve this by allowing ItemExpel to remove all the mutations in checkpoints.
      Note that, unlike the original proposal, the ItemExpel fix is wider in scope and isn't restricted to Disk Checkpoints. It therefore also improves our memory-recovery ability on Memory Checkpoints, and covers any similar issue caused by those.

      Issue Resolution
      The last item in a replica checkpoint was not expelled. In scenarios such as large average item size, a high number of replicas, or a low Bucket quota, this could result in a data node entering an unrecoverable Out-of-Memory state. ItemExpel has been enhanced to release all the items in a checkpoint when memory conditions allow.
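
      The difference between the old and new expel behaviour can be illustrated with a minimal sketch. It is a toy model, not the ItemExpel implementation: the function name and the `keep_last` flag are hypothetical, item sizes are made up, and it assumes no cursor still references any of the items.

```python
def expel(item_sizes: list, keep_last: bool) -> int:
    """Remove expellable items in place and return the bytes released.

    keep_last=True models the old behaviour (the last item is retained);
    keep_last=False models the enhanced behaviour (everything can go).
    """
    keep = 1 if keep_last and item_sizes else 0
    cut = len(item_sizes) - keep
    released = sum(item_sizes[:cut])
    del item_sizes[:cut]
    return released


# Hypothetical checkpoint: two small items followed by one large item.
before_fix = [10, 10, 4000]
after_fix = [10, 10, 4000]

released_before = expel(before_fix, keep_last=True)
released_after = expel(after_fix, keep_last=False)

print(released_before, before_fix)
print(released_after, after_fix)
```

      When the large item happens to be the last one, the old behaviour releases almost nothing, which is the situation the resolution describes for large average item sizes.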

      Attachments

        1. MB-39344_oom.png (435 kB)
        2. MB-39344_rebin_oom.png (405 kB)
        3. MB-39344_rebin_success.png (451 kB)
        4. MB-39344_success.png (539 kB)


            People

              ashwin.govindarajulu Ashwin Govindarajulu
              owend Daniel Owen