Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Won't Fix
-
6.0.1
-
component cluster
-
Untriaged
-
-
Unknown
-
Storage-Sprint-Dec-21-2018
Description
Build : 6.0.1-1992
Test : -test tests/2i/test_idx_rebalance_replica_vulcan_kv_opt.yml -scope tests/2i/scope_idx_rebalance_replica_vulcan.yml
Scale: 3
Iteration : 2
In the 2nd iteration of this test, we have observed that the indexer service on 172.23.104.25 experienced OOM killings by the OS twice :
[ns_server:info,2018-12-06T22:57:13.488-08:00,babysitter_of_ns_1@127.0.0.1:indexer-goport<0.30503.9>:goport:handle_port_os_exit:458]Port exited with status 137
|
{[{<<"[goport(/opt/couchbase/bin/indexer)] 2018/12/06 22:57:13 child process exited with status 137\n">>,
|
[ns_server:info,2018-12-07T06:05:31.796-08:00,babysitter_of_ns_1@127.0.0.1:indexer-goport<0.16852.17>:goport:handle_port_os_exit:458]Port exited with status 137
|
{[{<<"[goport(/opt/couchbase/bin/indexer)] 2018/12/07 06:05:31 child process exited with status 137\n">>,
|
On checking the indexer logs as to when was the last time the memQuotaTuner tuned the quota to prevent OOM killings before the last OOM killing at 2018-12-07T06:05:31, I see that it was long before this time.
zgrep -i "freePercent" indexer.log* | grep -i newQuota
Last few times before the OOM killing:
indexer.log.3.gz:2018-12-07T03:34:44.063-08:00 [Info] Plasma: Adaptive memory quota tuning (incrementing): RSS:20517277696, freePercent: 13.062550501379903, currentQuota=4067237560, newQuota=4270599438
|
indexer.log.3.gz:2018-12-07T03:36:50.740-08:00 [Info] Plasma: Adaptive memory quota tuning (incrementing): RSS:20469211136, freePercent: 13.118506805216432, currentQuota=4270599438, newQuota=4473961316
|
indexer.log.3.gz:2018-12-07T03:57:32.498-08:00 [Info] Plasma: Adaptive memory quota tuning (decrementing): RSS:21414617088, freePercent:9.387591697781145, currentQuota=4473961316, newQuota=4067237560
|
After the OOM killing:
indexer.log.1.gz:2018-12-07T10:24:27.967-08:00 [Info] Plasma: Adaptive memory quota tuning (incrementing): RSS:20569579520, freePercent: 13.30386410089884, currentQuota=15455502729, newQuota=15658864607
|
indexer.log.1.gz:2018-12-07T10:38:08.117-08:00 [Info] Plasma: Adaptive memory quota tuning (incrementing): RSS:20637208576, freePercent: 13.005957959686338, currentQuota=15658864607, newQuota=15862226485
|
indexer.log.1.gz:2018-12-07T10:41:38.164-08:00 [Info] Plasma: Adaptive memory quota tuning (incrementing): RSS:20647723008, freePercent: 13.042174576163626, currentQuota=15862226485, newQuota=16065588363
|
The freePercent was around 5% for quite some time before the OOM killing.
indexer.log.2.gz:2018-12-07T06:04:05.033-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22460481536, freePercent:5.222436096024933, currentQuota=4067237560
|
indexer.log.2.gz:2018-12-07T06:04:10.441-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22459219968, freePercent:5.226400347287988, currentQuota=4067237560
|
indexer.log.2.gz:2018-12-07T06:04:15.896-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22461030400, freePercent:5.199694671083789, currentQuota=4067237560
|
indexer.log.2.gz:2018-12-07T06:04:21.437-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22462640128, freePercent:5.199776240039819, currentQuota=4067237560
|
indexer.log.2.gz:2018-12-07T06:04:26.665-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22465871872, freePercent:5.242991472944475, currentQuota=4067237560
|
indexer.log.2.gz:2018-12-07T06:04:34.128-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22470647808, freePercent:5.1807869870760515, currentQuota=4067237560
|
indexer.log.2.gz:2018-12-07T06:04:41.592-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22473359360, freePercent:5.1523031076304004, currentQuota=4067237560
|
indexer.log.2.gz:2018-12-07T06:04:47.011-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22475464704, freePercent:5.19566516465591, currentQuota=4067237560
|
indexer.log.2.gz:2018-12-07T06:04:52.365-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22476578816, freePercent:5.212027897235514, currentQuota=4067237560
|
indexer.log.2.gz:2018-12-07T06:04:58.068-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22478561280, freePercent:5.1558105727396875, currentQuota=4067237560
|
indexer.log.2.gz:2018-12-07T06:05:04.042-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22474186752, freePercent:5.169024743616536, currentQuota=4067237560
|
indexer.log.2.gz:2018-12-07T06:05:09.431-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22473924608, freePercent:5.236531211626904, currentQuota=4067237560
|
indexer.log.2.gz:2018-12-07T06:05:14.797-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22473342976, freePercent:5.169530471143921, currentQuota=4067237560
|
indexer.log.2.gz:2018-12-07T06:05:20.392-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22473113600, freePercent:5.20046141927047, currentQuota=4067237560
|
indexer.log.2.gz:2018-12-07T06:05:27.323-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22470930432, freePercent:5.1829240937240355, currentQuota=4067237560
|
Shouldnt have the memQuotaTuner run more often and tuned the memory quota to above the 10% freePercent threshold to avoid running into OOM killing ?
Hi Mihir, the algorithm has some limits to ensure that memory quota is not tuned by more than 50% here.
MTunerMinQuotaRatio = 0.5 // Don't reduce quota beyond 50% of user value
The reasoning I believe, is that if memory quota has to be tuned beyond 50% of user's value to meet memory demands then the node is incorrectly sized.
In your env, the reason memory tuner stopped tuning down is because it had already brought it down by over 50% of original value.