Couchbase Server
MB-32281

[System Test] Seeing OOM killing for indexer


Details

    Description

      Build : 6.0.1-1992
      Test : -test tests/2i/test_idx_rebalance_replica_vulcan_kv_opt.yml -scope tests/2i/scope_idx_rebalance_replica_vulcan.yml
      Scale: 3
      Iteration : 2

      In the 2nd iteration of this test, we observed that the indexer process on 172.23.104.25 was OOM-killed by the OS twice:

      [ns_server:info,2018-12-06T22:57:13.488-08:00,babysitter_of_ns_1@127.0.0.1:indexer-goport<0.30503.9>:goport:handle_port_os_exit:458]Port exited with status 137
                                      {[{<<"[goport(/opt/couchbase/bin/indexer)] 2018/12/06 22:57:13 child process exited with status 137\n">>,
      [ns_server:info,2018-12-07T06:05:31.796-08:00,babysitter_of_ns_1@127.0.0.1:indexer-goport<0.16852.17>:goport:handle_port_os_exit:458]Port exited with status 137
                                      {[{<<"[goport(/opt/couchbase/bin/indexer)] 2018/12/07 06:05:31 child process exited with status 137\n">>,
      

      On checking the indexer logs for the last time the memQuotaTuner adjusted the quota before the final OOM kill at 2018-12-07T06:05:31, I see that the last adjustment happened long before that time.

      zgrep -i "freePercent" indexer.log* | grep -i newQuota

      The last few tuning events before the OOM kill:

      indexer.log.3.gz:2018-12-07T03:34:44.063-08:00 [Info] Plasma: Adaptive memory quota tuning (incrementing): RSS:20517277696, freePercent: 13.062550501379903, currentQuota=4067237560, newQuota=4270599438
      indexer.log.3.gz:2018-12-07T03:36:50.740-08:00 [Info] Plasma: Adaptive memory quota tuning (incrementing): RSS:20469211136, freePercent: 13.118506805216432, currentQuota=4270599438, newQuota=4473961316
      indexer.log.3.gz:2018-12-07T03:57:32.498-08:00 [Info] Plasma: Adaptive memory quota tuning (decrementing): RSS:21414617088, freePercent:9.387591697781145, currentQuota=4473961316, newQuota=4067237560
      

      After the OOM kill:

      indexer.log.1.gz:2018-12-07T10:24:27.967-08:00 [Info] Plasma: Adaptive memory quota tuning (incrementing): RSS:20569579520, freePercent: 13.30386410089884, currentQuota=15455502729, newQuota=15658864607
      indexer.log.1.gz:2018-12-07T10:38:08.117-08:00 [Info] Plasma: Adaptive memory quota tuning (incrementing): RSS:20637208576, freePercent: 13.005957959686338, currentQuota=15658864607, newQuota=15862226485
      indexer.log.1.gz:2018-12-07T10:41:38.164-08:00 [Info] Plasma: Adaptive memory quota tuning (incrementing): RSS:20647723008, freePercent: 13.042174576163626, currentQuota=15862226485, newQuota=16065588363
      

      The freePercent hovered around 5% for quite some time before the OOM kill, while currentQuota remained at 4067237560:

      indexer.log.2.gz:2018-12-07T06:04:05.033-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22460481536, freePercent:5.222436096024933, currentQuota=4067237560
      indexer.log.2.gz:2018-12-07T06:04:10.441-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22459219968, freePercent:5.226400347287988, currentQuota=4067237560
      indexer.log.2.gz:2018-12-07T06:04:15.896-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22461030400, freePercent:5.199694671083789, currentQuota=4067237560
      indexer.log.2.gz:2018-12-07T06:04:21.437-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22462640128, freePercent:5.199776240039819, currentQuota=4067237560
      indexer.log.2.gz:2018-12-07T06:04:26.665-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22465871872, freePercent:5.242991472944475, currentQuota=4067237560
      indexer.log.2.gz:2018-12-07T06:04:34.128-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22470647808, freePercent:5.1807869870760515, currentQuota=4067237560
      indexer.log.2.gz:2018-12-07T06:04:41.592-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22473359360, freePercent:5.1523031076304004, currentQuota=4067237560
      indexer.log.2.gz:2018-12-07T06:04:47.011-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22475464704, freePercent:5.19566516465591, currentQuota=4067237560
      indexer.log.2.gz:2018-12-07T06:04:52.365-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22476578816, freePercent:5.212027897235514, currentQuota=4067237560
      indexer.log.2.gz:2018-12-07T06:04:58.068-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22478561280, freePercent:5.1558105727396875, currentQuota=4067237560
      indexer.log.2.gz:2018-12-07T06:05:04.042-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22474186752, freePercent:5.169024743616536, currentQuota=4067237560
      indexer.log.2.gz:2018-12-07T06:05:09.431-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22473924608, freePercent:5.236531211626904, currentQuota=4067237560
      indexer.log.2.gz:2018-12-07T06:05:14.797-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22473342976, freePercent:5.169530471143921, currentQuota=4067237560
      indexer.log.2.gz:2018-12-07T06:05:20.392-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22473113600, freePercent:5.20046141927047, currentQuota=4067237560
      indexer.log.2.gz:2018-12-07T06:05:27.323-08:00 [Info] Plasma: Adaptive memory quota tuning RSS:22470930432, freePercent:5.1829240937240355, currentQuota=4067237560
      

      Shouldn't the memQuotaTuner have run more often and tuned the memory quota down so that freePercent stayed above the 10% threshold, thereby avoiding the OOM kill?
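
      For reference, here is a minimal sketch in Go (not the actual Plasma implementation) of the kind of free-memory-driven adjustment the question assumes. All names and thresholds here (tuneQuota, freeThresholdPct, quotaStep, the 5s cadence) are illustrative assumptions, not the real tuner's code:

      package main

      import (
          "fmt"
          "time"
      )

      // Illustrative constants only; the real tuner's names, cadence and step size may differ.
      const (
          tunerInterval    = 5 * time.Second // logs above show decisions roughly every 5s
          freeThresholdPct = 10.0            // aim to keep free memory above 10%
          quotaStep        = 200 << 20       // adjust quota in ~200MB steps
      )

      // tuneQuota returns an adjusted quota based on the current free-memory
      // percentage: shrink when free memory is low, grow when there is headroom.
      func tuneQuota(currentQuota int64, freePercent float64) int64 {
          switch {
          case freePercent < freeThresholdPct:
              return currentQuota - quotaStep // free memory too low: shrink quota
          case freePercent > freeThresholdPct+3:
              return currentQuota + quotaStep // comfortable headroom: grow quota
          default:
              return currentQuota // within band: leave quota unchanged
          }
      }

      func main() {
          quota := int64(4473961316)
          // Simulated freePercent readings similar to the log excerpt above.
          for _, free := range []float64{13.1, 9.4, 5.2, 5.1} {
              quota = tuneQuota(quota, free)
              fmt.Printf("freePercent=%.1f newQuota=%d\n", free, quota)
          }
      }

      Under logic of this kind one would expect the quota to keep decrementing while freePercent sits near 5%, which is why the behaviour in the logs above looked surprising.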

      Attachments


        Activity

          Hi Mihir, the algorithm has a limit to ensure that the memory quota is not tuned down by more than 50%:

          MTunerMinQuotaRatio   = 0.5  // Don't reduce quota beyond 50% of user value 

          The reasoning, I believe, is that if the memory quota has to be tuned below 50% of the user's value to meet memory demands, then the node is incorrectly sized.

          In your environment, the memory tuner stopped tuning down because it had already brought the quota down by 50% of the original value.

           

          sundar Sundar Sridharan (Inactive) added a comment
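
          To illustrate the floor described above, here is a minimal sketch in Go. Only the MTunerMinQuotaRatio constant comes from the source; the applyQuotaFloor helper and the quota values are hypothetical:

          package main

          import "fmt"

          const MTunerMinQuotaRatio = 0.5 // Don't reduce quota beyond 50% of user value

          // applyQuotaFloor (hypothetical helper) clamps a proposed quota so it
          // never drops below MTunerMinQuotaRatio of the user-configured quota.
          func applyQuotaFloor(proposedQuota, userQuota int64) int64 {
              floor := int64(float64(userQuota) * MTunerMinQuotaRatio)
              if proposedQuota < floor {
                  return floor // at the floor: stop tuning down
              }
              return proposedQuota
          }

          func main() {
              userQuota := int64(8134475120) // illustrative value: exactly 2x the 4067237560 quota seen in the logs
              fmt.Println(applyQuotaFloor(3800000000, userQuota)) // clamped to 4067237560
              fmt.Println(applyQuotaFloor(4473961316, userQuota)) // above the floor: unchanged
          }

          In the log excerpts above, currentQuota bottoms out at 4067237560 and never decrements further, which is consistent with the tuner having reached such a floor.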

          Thanks Sundar Sridharan. Can we close this bug then?

          mihir.kamdar Mihir Kamdar (Inactive) added a comment

          Sure Mihir, closing this as Won't Fix, as per the original intent of the algorithm. Please reopen this bug if we need to revisit that.

          sundar Sundar Sridharan (Inactive) added a comment

          People

            sundar Sundar Sridharan (Inactive)
            mihir.kamdar Mihir Kamdar (Inactive)