memcached is OOM-killed during data ingestion due to high heap fragmentation

Description

Build 5.5.0-2814.

Setup:

  • 2 KV nodes + 4 N1QL & GSI nodes

  • 32GB RAM, 20GB memory quota

  • 1 bucket, 1 replica, full ejection

Steps:

That step restores about 320M relatively small docs.

It looks like memcached RSS exceeds the memory quota by more than I would normally expect.

Components

Affects versions

Fix versions

Environment

https://raw.githubusercontent.com/couchbase/perfrunner/master/clusters/oceanus.spec

Link to Log File, atop/blg, CBCollectInfo, Core dump

https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-cbcollect_info-8/172.23.96.5.zip https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-cbcollect_info-8/172.23.96.7.zip

Release Notes Description

None

Attachments

11

Activity

Pavel Paulau June 7, 2018 at 3:39 PM

Fair enough.

Daniel Owen June 7, 2018 at 3:36 PM

Given that fragmentation is a known issue that will be addressed in the future, I will resolve this as Won't Fix for now.

Daniel Owen June 7, 2018 at 3:16 PM

Forgot to add: the data collection was on 6.0.0-1206.

Therefore, just confirming the same results for build 5.5.0-2814:

Default - Age 10 & Interval 10

             Interval 10
  Visited    13.2M
  Moved      0
  Max RSS    24.7G
  Avg RSS    20G
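
For reference, the overshoot implied by these figures can be worked out directly. This is a minimal sketch, assuming the "G" values in the results and the 20GB memory quota from the setup use the same unit; the function and constant names are illustrative only.

```python
def overshoot_pct(rss_gb: float, quota_gb: float) -> float:
    """Percentage by which RSS exceeds the quota (negative if below it)."""
    return (rss_gb / quota_gb - 1.0) * 100.0


QUOTA_GB = 20.0    # memory quota from the setup
MAX_RSS_GB = 24.7  # Max RSS with the default defragmenter settings
AVG_RSS_GB = 20.0  # Avg RSS with the default defragmenter settings

print(f"max RSS overshoot: {overshoot_pct(MAX_RSS_GB, QUOTA_GB):.1f}%")
print(f"avg RSS overshoot: {overshoot_pct(AVG_RSS_GB, QUOTA_GB):.1f}%")
```

Under that assumption the peak RSS is roughly 23.5% above the quota, while the average sits at the quota.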

Daniel Owen June 7, 2018 at 1:39 PM

Assigning to close.

Daniel Owen June 7, 2018 at 1:37 PM

From the above results, the RSS can only be significantly reduced when the defragmenter runs very aggressively. This is not a suitable default setting.
The high RSS in the original run associated with this MB is believed to be due to a side effect of numerous tasks running slowly, and on repeated runs we do not see the issue. Therefore, I will mark the issue as Not reproducible.

However, it is worth noting that one improvement (targeted for Mad Hatter) is to consider running the defragmenter only when RSS is high, in which case we can use more aggressive settings; see MB-29928.
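
The idea can be sketched as a simple policy. This is only an illustration of the proposal as described in this comment, not the design tracked in MB-29928: the threshold, the "aggressive" values, the field meanings, and all names below are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DefragSettings:
    age: int       # assumed meaning: minimum item age before it is considered
    interval: int  # assumed meaning: time between defragmenter runs


# Hypothetical aggressive values; the ticket does not state which ones were tested.
AGGRESSIVE = DefragSettings(age=1, interval=1)


def choose_settings(rss: float, quota: float,
                    high_ratio: float = 1.1) -> Optional[DefragSettings]:
    """Return None (skip defragmentation) unless RSS is high relative to the quota.

    high_ratio is a hypothetical threshold: RSS counts as "high" once it
    exceeds quota * high_ratio.
    """
    if rss <= quota * high_ratio:
        return None
    return AGGRESSIVE


# With the figures from this ticket (20G quota, 24.7G max RSS) the policy
# would switch the defragmenter on; at the 20G average it would stay off.
print(choose_settings(24.7, 20.0))
print(choose_settings(20.0, 20.0))
```

The point of the design is that the cost of aggressive defragmentation is paid only while RSS is actually elevated, rather than all the time as a default.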

Won't Fix

Details

Assignee

Pavel Paulau

Reporter

Is this a Regression?

No

Triage

Untriaged

Operating System

Centos 64-bit

Due date

Priority

Created May 29, 2018 at 11:59 PM
Updated June 8, 2018 at 7:48 AM
Resolved June 7, 2018 at 3:36 PM