memcached is OOM-killed during data ingestion due to high heap fragmentation

Description

Build 5.5.0-2814.

Setup:

2 KV nodes + 4 N1QL & GSI nodes
32GB RAM, 20GB memory quota
1 bucket, 1 replica, full ejection

Steps:

That step restores about 320M relatively small docs.

memcached RSS exceeds the memory quota by more than I would normally expect.

Components

Environment

https://raw.githubusercontent.com/couchbase/perfrunner/master/clusters/oceanus.spec

Link to Log File, atop/blg, CBCollectInfo, Core dump

https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-cbcollect_info-8/172.23.96.5.zip
https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-cbcollect_info-8/172.23.96.7.zip

Release Notes Description

None

Attachments

Linked issues

is triggering

MB-30017

High vb_replica_checkpoint_memory when handling slow streams on the Producer during data ingestion

relates to

MB-29928

Defragmenter: Only run when fragmentation is high

MB-15009

Defragment document key+meta (i.e. StoredValue) in addition to Blobs

Activity

Pavel Paulau June 7, 2018 at 3:39 PM

Fair enough.

Daniel Owen June 7, 2018 at 3:36 PM

Given that fragmentation is a known issue and will be addressed in the future, I will resolve this as Won't Fix for now.

Daniel Owen June 7, 2018 at 3:16 PM

Forgot to add: the data collection was done on 6.0.0-1206.

So, just confirming the same results for build 5.5.0-2814:

Default - Age 10 & Interval 10

Metric     Interval 10
Visited    13.2M
Moved      0
Max RSS    24.7G
Avg RSS    20G
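
To put these numbers next to the setup, here is a rough Python sketch of the arithmetic. It assumes the "G" in the results and the "GB" in the Setup section are the same unit, and that the 20GB quota is the right baseline for memcached RSS. The helper name `overshoot` is made up for illustration.

```python
def overshoot(rss_gb: float, quota_gb: float) -> float:
    """Return how far RSS exceeds the quota, as a fraction of the quota."""
    return (rss_gb - quota_gb) / quota_gb


# Values taken from the Setup section and the results above.
# Assumption: "G" and "GB" denote the same unit here.
RAM_GB = 32.0
QUOTA_GB = 20.0
MAX_RSS_GB = 24.7
AVG_RSS_GB = 20.0

max_over = overshoot(MAX_RSS_GB, QUOTA_GB)   # peak RSS relative to quota
avg_over = overshoot(AVG_RSS_GB, QUOTA_GB)   # average RSS relative to quota
headroom_gb = RAM_GB - MAX_RSS_GB            # RAM left for everything else at peak

print(f"Max RSS over quota: {max_over:.1%}")
print(f"Avg RSS over quota: {avg_over:.1%}")
print(f"RAM headroom at peak: {headroom_gb:.1f} GB")
```

With the default settings, peak RSS is roughly 23.5% above the quota while the average sits at the quota, leaving about 7.3 GB of the 32GB node for other processes at peak.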

Daniel Owen June 7, 2018 at 1:39 PM

Assigning back to the reporter to close.

Daniel Owen June 7, 2018 at 1:37 PM

From the above results, the RSS can only be significantly reduced when the defragmenter runs very aggressively. This is not a suitable default setting.
The high RSS in the original run associated with this MB is believed to be due to a side effect of numerous tasks running slowly, and on repeated runs we do not see the issue. Therefore I will mark the issue as Not reproducible.

However, it is worth noting that one improvement (targeted for Mad Hatter) is to run the defragmenter only when RSS is high, which would allow more aggressive settings; see MB-29928.
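
The idea of running the defragmenter only when it is needed can be illustrated with a minimal Python sketch. This is not the ep-engine implementation and not the MB-29928 design. The function names, the fragmentation formula (resident minus allocated, divided by resident), and the 15% threshold are all assumptions chosen for illustration.

```python
def fragmentation_ratio(allocated_bytes: int, resident_bytes: int) -> float:
    """Fraction of resident memory not accounted for by live allocations.

    Hypothetical definition for illustration: (resident - allocated) / resident,
    clamped at 0 so a momentarily stale reading cannot go negative.
    """
    if resident_bytes <= 0:
        return 0.0
    return max(0.0, (resident_bytes - allocated_bytes) / resident_bytes)


def should_run_defragmenter(
    allocated_bytes: int,
    resident_bytes: int,
    threshold: float = 0.15,  # hypothetical threshold, not a Couchbase default
) -> bool:
    """Gate a defragmentation pass on the observed fragmentation."""
    return fragmentation_ratio(allocated_bytes, resident_bytes) >= threshold


GB = 10**9

# Resident set well above live allocations: a pass would be scheduled.
print(should_run_defragmenter(20 * GB, 24_700_000_000))

# Resident set matches live allocations: skip the pass.
print(should_run_defragmenter(20 * GB, 20 * GB))
```

Under such a policy the defragmenter stays idle while fragmentation is low, so the passes that do run can afford more aggressive settings.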

Won't Fix

Details

Assignee

Pavel Paulau

Reporter

Pavel Paulau

Is this a Regression?

Triage

Untriaged

Operating System

Centos 64-bit

Due date

Jun 08, 2018

Priority

Critical

Created May 29, 2018 at 11:59 PM

Updated June 8, 2018 at 7:48 AM

Resolved June 7, 2018 at 3:36 PM
