Couchbase Server
MB-32281

[System Test] Seeing OOM killing for indexer


Details

    Description

      Build : 6.0.1-1992
      Test : -test tests/2i/test_idx_rebalance_replica_vulcan_kv_opt.yml -scope tests/2i/scope_idx_rebalance_replica_vulcan.yml
      Scale: 3
      Iteration : 2

      In the 2nd iteration of this test, we observed that the indexer process on 172.23.104.25 was OOM-killed by the OS twice:

      [ns_server:info,2018-12-06T22:57:13.488-08:00,babysitter_of_ns_1@127.0.0.1:indexer-goport<0.30503.9>:goport:handle_port_os_exit:458]Port exited with status 137
                                      {[{<<"[goport(/opt/couchbase/bin/indexer)] 2018/12/06 22:57:13 child process exited with status 137\n">>,
      [ns_server:info,2018-12-07T06:05:31.796-08:00,babysitter_of_ns_1@127.0.0.1:indexer-goport<0.16852.17>:goport:handle_port_os_exit:458]Port exited with status 137
                                      {[{<<"[goport(/opt/couchbase/bin/indexer)] 2018/12/07 06:05:31 child process exited with status 137\n">>,
      

      On checking the indexer logs for the last time the memQuotaTuner adjusted the quota before the final OOM kill at 2018-12-07T06:05:31, I see that the last adjustment happened long before that time.

      zgrep -i "freePercent" indexer.log* | grep -i newQuota

      The last few tuning events before the OOM kill:

      indexer.log.3.gz:2018-12-07T03:34:44.063-08:00 [Info] Plasma: Adaptive memory quota tuning (incrementing): RSS:20517277696, freePercent: 13.062550501379903, currentQuota=4067237560, newQuota=4270599438
      indexer.log.3.gz:2018-12-07T03:36:50.740-08:00 [Info] Plasma: Adaptive memory quota tuning (incrementing): RSS:20469211136, freePercent: 13.118506805216432, currentQuota=4270599438, newQuota=4473961316
      indexer.log.3.gz:2018-12-07T03:57:32.498-08:00 [Info] Plasma: Adaptive memory quota tuning (decrementing): RSS:21414617088, freePercent:9.387591697781145, currentQuota=4473961316, newQuota=4067237560
      

      After the OOM kill:

      indexer.log.1.gz:2018-12-07T10:24:27.967-08:00 [Info] Plasma: Adaptive memory quota tuning (incrementing): RSS:20569579520, freePercent: 13.30386410089884, currentQuota=15455502729, newQuota=15658864607
      indexer.log.1.gz:2018-12-07T10:38:08.117-08:00 [Info] Plasma: Adaptive memory quota tuning (incrementing): RSS:20637208576, freePercent: 13.005957959686338, currentQuota=15658864607, newQuota=15862226485
      indexer.log.1.gz:2018-12-07T10:41:38.164-08:00 [Info] Plasma: Adaptive memory quota tuning (incrementing): RSS:20647723008, freePercent: 13.042174576163626, currentQuota=15862226485, newQuota=16065588363
      

      The freePercent hovered around 5% for quite some time before the OOM kill, while currentQuota remained at 4067237560:

      indexer.log.2.gz:2018-12-07T06:04:05.033-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22460481536, freePercent:5.222436096024933, currentQuota=4067237560
      indexer.log.2.gz:2018-12-07T06:04:10.441-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22459219968, freePercent:5.226400347287988, currentQuota=4067237560
      indexer.log.2.gz:2018-12-07T06:04:15.896-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22461030400, freePercent:5.199694671083789, currentQuota=4067237560
      indexer.log.2.gz:2018-12-07T06:04:21.437-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22462640128, freePercent:5.199776240039819, currentQuota=4067237560
      indexer.log.2.gz:2018-12-07T06:04:26.665-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22465871872, freePercent:5.242991472944475, currentQuota=4067237560
      indexer.log.2.gz:2018-12-07T06:04:34.128-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22470647808, freePercent:5.1807869870760515, currentQuota=4067237560
      indexer.log.2.gz:2018-12-07T06:04:41.592-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22473359360, freePercent:5.1523031076304004, currentQuota=4067237560
      indexer.log.2.gz:2018-12-07T06:04:47.011-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22475464704, freePercent:5.19566516465591, currentQuota=4067237560
      indexer.log.2.gz:2018-12-07T06:04:52.365-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22476578816, freePercent:5.212027897235514, currentQuota=4067237560
      indexer.log.2.gz:2018-12-07T06:04:58.068-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22478561280, freePercent:5.1558105727396875, currentQuota=4067237560
      indexer.log.2.gz:2018-12-07T06:05:04.042-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22474186752, freePercent:5.169024743616536, currentQuota=4067237560
      indexer.log.2.gz:2018-12-07T06:05:09.431-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22473924608, freePercent:5.236531211626904, currentQuota=4067237560
      indexer.log.2.gz:2018-12-07T06:05:14.797-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22473342976, freePercent:5.169530471143921, currentQuota=4067237560
      indexer.log.2.gz:2018-12-07T06:05:20.392-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22473113600, freePercent:5.20046141927047, currentQuota=4067237560
      indexer.log.2.gz:2018-12-07T06:05:27.323-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22470930432, freePercent:5.1829240937240355, currentQuota=4067237560
      

      Shouldn't the memQuotaTuner have run more often and tuned the memory quota down so that freePercent stayed above the 10% threshold, thereby avoiding the OOM kill?
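
      For reference, here is a minimal sketch in Go (not the actual Plasma implementation) of the kind of free-memory-driven adjustment the question assumes. All names and thresholds here (tuneQuota, freeThresholdPct, quotaStep, the 5s cadence) are illustrative assumptions, not the real tuner's code:

      package main

      import (
          "fmt"
          "time"
      )

      // Illustrative constants only; the real tuner's names, cadence and step size may differ.
      const (
          tunerInterval    = 5 * time.Second // logs above show decisions roughly every 5s
          freeThresholdPct = 10.0            // aim to keep free memory above 10%
          quotaStep        = 200 << 20       // adjust quota in ~200MB steps
      )

      // tuneQuota returns an adjusted quota based on the current free-memory
      // percentage: shrink when free memory is low, grow when there is headroom.
      func tuneQuota(currentQuota int64, freePercent float64) int64 {
          switch {
          case freePercent < freeThresholdPct:
              return currentQuota - quotaStep // free memory too low: shrink quota
          case freePercent > freeThresholdPct+3:
              return currentQuota + quotaStep // comfortable headroom: grow quota
          default:
              return currentQuota // within band: leave quota unchanged
          }
      }

      func main() {
          quota := int64(4473961316)
          // Simulated freePercent readings similar to the log excerpt above.
          for _, free := range []float64{13.1, 9.4, 5.2, 5.1} {
              quota = tuneQuota(quota, free)
              fmt.Printf("freePercent=%.1f newQuota=%d\n", free, quota)
          }
      }

      Under logic of this kind one would expect the quota to keep decrementing while freePercent sits near 5%, which is why the behaviour in the logs above looked surprising.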

      Attachments


        Activity

          Hi Mihir, the algorithm has a limit to ensure that the memory quota is not tuned down by more than 50%:

          MTunerMinQuotaRatio   = 0.5  // Don't reduce quota beyond 50% of user value 

          The reasoning, I believe, is that if the memory quota has to be tuned below 50% of the user's value to meet memory demands, then the node is incorrectly sized.

          In your environment, the memory tuner stopped tuning down because it had already brought the quota down by 50% of the original value.

           

          sundar Sundar Sridharan (Inactive) added a comment
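
          To illustrate the floor described above, here is a minimal sketch in Go. Only the MTunerMinQuotaRatio constant comes from the source; the applyQuotaFloor helper and the quota values are hypothetical:

          package main

          import "fmt"

          const MTunerMinQuotaRatio = 0.5 // Don't reduce quota beyond 50% of user value

          // applyQuotaFloor (hypothetical helper) clamps a proposed quota so it
          // never drops below MTunerMinQuotaRatio of the user-configured quota.
          func applyQuotaFloor(proposedQuota, userQuota int64) int64 {
              floor := int64(float64(userQuota) * MTunerMinQuotaRatio)
              if proposedQuota < floor {
                  return floor // at the floor: stop tuning down
              }
              return proposedQuota
          }

          func main() {
              userQuota := int64(8134475120) // illustrative value: exactly 2x the 4067237560 quota seen in the logs
              fmt.Println(applyQuotaFloor(3800000000, userQuota)) // clamped to 4067237560
              fmt.Println(applyQuotaFloor(4473961316, userQuota)) // above the floor: unchanged
          }

          In the log excerpts above, currentQuota bottoms out at 4067237560 and never decrements further, which is consistent with the tuner having reached such a floor.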

          Thanks Sundar Sridharan. Can we close this bug then?

          mihir.kamdar Mihir Kamdar (Inactive) added a comment

          Sure Mihir, closing this as Won't Fix, as per the original intent of the algorithm. Please reopen this bug if we need to revisit that.

          sundar Sundar Sridharan (Inactive) added a comment

          People

            sundar Sundar Sridharan (Inactive)
            mihir.kamdar Mihir Kamdar (Inactive)