Details
Type: Improvement
Resolution: Fixed
Priority: Major
Fix Version: 3.0.2
Security Level: Public
Description
Created a cluster of 140 nodes.
Loaded the 2 sample buckets.
Logging into a node causes the following beam process to crash (see below). This is repeatable when logging into any of the nodes:
/opt/couchbase/lib/erlang/erts-5.10.4/bin/beam.smp -A 16 -sbt u -P 327680 -K true -swt low -MMmcs 30 -e102400 -- -root /opt/couchbase/lib/erlang -progname erl -- -home /opt/couchbase -- -smp enable -setcookie nocookie -kernel inet_dist_listen_min 21100 inet_dist_listen_max 21299 error_logger false -sasl sasl_error_logger false -nouser -run child_erlang child_start ns_bootstrap -- -smp enable -couch_ini /opt/couchbase/etc/couchdb/default.ini /opt/couchbase/etc/couchdb/default.d/capi.ini /opt/couchbase/etc/couchdb/default.d/geocouch.ini /opt/couchbase/etc/couchdb/local.ini
Looking at the crash dump, the reason is that beam was unable to allocate the required memory (~1.3GB):
Slogan: eheap_alloc: Cannot allocate 1318267840 bytes of memory (of type "heap").
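For reference, the Slogan line sits near the top of the dump and can be read straight out of the text file, or the whole dump can be loaded into the viewer that ships with OTP's observer application. The file name below is just the default erl_crash.dump; the actual path on the affected node is an assumption.

%% Hedged sketch: open the crash dump in crashdump_viewer (OTP observer app).
%% "erl_crash.dump" is the default dump file name; adjust to wherever
%% beam.smp actually wrote it on the affected node.
crashdump_viewer:start("erl_crash.dump").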
Looking through the Erlang processes in the dump, all but 2 are in the waiting state:
One is Scheduled - Current call: menelaus_stats:aggregate_stat_kv_pairs/3
One is Garbing - Spawned as: erlang:apply/2
I then looked at Erlang process memory usage; 3 processes use significantly more memory than the others (a live-node snippet for this kind of inspection is sketched after the list below). These are:
1. Current call: menelaus_stats:aggregate_stat_kv_pairs/3 - 1581922360 (1.5GB)
2. Name: menelaus_stats_gatherer - 1318276864 (1.3GB)
3. State: Garbing, Spawned as: erlang:apply/2 - 1201043352 (1.2GB)
Program counter: 0x00007f377f428d40 (gen_server:rec_nodes/7 + 224)
Message queue length: 138
Link list: [
{to, ,#Ref<0.0.11.189696>}},
{to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-30.eu-west-1.compute.amazonaws.com'},#Ref<0.0.11.189698>}},
{to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-29.eu-west-1.compute.amazonaws.com'},#Ref<0.0.11.189697>}},
{to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-32.eu-west-1.compute.amazonaws.com'},#Ref<0.0.11.189700>}},
{to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-42.eu-west-1.compute.amazonaws.com'},#Ref<0.0.11.189702>}},
{to, ,#Ref<0.0.11.189701>}},
{to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-31.eu-west-1.compute.amazonaws.com'},#Ref<0.0.11.189699>}},
{to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-45.eu-west-1.compute.amazonaws.com'},#Ref<0.0.11.189704>}},
{to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-48.eu-west-1.compute.amazonaws.com'},#Ref<0.0.11.189706>}},
{to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-47.eu-west-1.compute.amazonaws.com'},#Ref<0.0.11.189705>}},
{to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-52.eu-west-1.compute.amazonaws.com'},#Ref<0.0.11.189708>}},
{to, ,#Ref<0.0.11.189710>}},
{to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-55.eu-west-1.compute.amazonaws.com'},#Ref<0.0.11.189709>}},
{to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-49.eu-west-1.compute.amazonaws.com'},#Ref<0.0.11.189707>}},
{to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-43.eu-west-1.compute.amazonaws.com'},#Ref<0.0.11.189703>}},
... this continues for all the nodes in the cluster.
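The per-process figures above were read out of the crash dump; for reference, a roughly equivalent inspection on a live node could look like the shell snippet below (a minimal sketch, assuming a remote shell attached to the ns_server VM; Top3 is just a throwaway name).

%% Minimal sketch: list the three processes holding the most memory,
%% together with their registered name and current call.
Top3 = fun() ->
    Info = lists:filtermap(
             fun(P) ->
                 case erlang:process_info(P, [memory, registered_name, current_function]) of
                     undefined -> false;  %% process exited while we were iterating
                     L -> {true, {proplists:get_value(memory, L), P, L}}
                 end
             end, erlang:processes()),
    lists:sublist(lists:reverse(lists:sort(Info)), 3)
end,
Top3().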
The erlang:apply/2 process is spawned by <0.1577.0>, which was found to be the process named menelaus_stats_gatherer.
So my conclusion is that the stats gathering is causing the significant memory usage, and I think this causes beam.smp to crash by running out of memory.
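To illustrate why this scales so badly with 140 nodes, here is a minimal sketch of the fan-out/aggregate pattern the gatherer appears to be following. This is not the actual menelaus_stats code; the module name, the {latest, minute} request term and the aggregation step are hypothetical. The point is that one process asks the per-bucket stats reader on every node and holds every reply in its own heap until all nodes have answered, so its memory grows roughly with node count times per-node stats size.

-module(stats_gather_sketch).
-export([gather/2]).

%% Hypothetical sketch of the suspected pattern: fan one stats request out
%% to the registered 'stats_reader-<bucket>' process on every node, then
%% aggregate all the replies in this single process.
gather(Bucket, Nodes) ->
    Name = list_to_atom("stats_reader-" ++ Bucket),
    %% gen_server:multi_call/3 blocks collecting one reply per node
    %% (internally via gen_server:rec_nodes/..., which matches where the
    %% 1.2GB process above is sitting).
    {Replies, _BadNodes} = gen_server:multi_call(Nodes, Name, {latest, minute}),
    aggregate([Stats || {_Node, Stats} <- Replies]).

%% Naive aggregation of per-node {Key, Value} stat lists: with 140 nodes,
%% all of these lists are alive at once while being folded together.
aggregate(PerNodeStats) ->
    lists:foldl(fun(KvList, Acc) ->
                        lists:foldl(fun({K, V}, A) ->
                                            orddict:update_counter(K, V, A)
                                    end, Acc, KvList)
                end, orddict:new(), PerNodeStats).

With Nodes being all 140 cluster nodes, a gatherer like this holds ~140 full stat sets simultaneously, which would be consistent with the ~1.2-1.5GB heaps seen in the dump.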