Details
Type: Improvement
Resolution: Fixed
Priority: Major
Fix Version: 3.0.2
Security Level: Public
Description
Created a cluster of 140 nodes.
Loaded the 2 sample buckets.
Logging into a node causes the following beam process to crash (see below). This is repeatable when logging into any of the nodes:
/opt/couchbase/lib/erlang/erts-5.10.4/bin/beam.smp -A 16 -sbt u -P 327680 -K true -swt low -MMmcs 30 -e102400 -- -root /opt/couchbase/lib/erlang -progname erl -- -home /opt/couchbase -- -smp enable -setcookie nocookie -kernel inet_dist_listen_min 21100 inet_dist_listen_max 21299 error_logger false -sasl sasl_error_logger false -nouser -run child_erlang child_start ns_bootstrap -- -smp enable -couch_ini /opt/couchbase/etc/couchdb/default.ini /opt/couchbase/etc/couchdb/default.d/capi.ini /opt/couchbase/etc/couchdb/default.d/geocouch.ini /opt/couchbase/etc/couchdb/local.ini
Looking at the crash dump, the reason is that beam was unable to allocate the required memory (~1.3GB):
Slogan: eheap_alloc: Cannot allocate 1318267840 bytes of memory (of type "heap").
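For reference, the Slogan line sits near the top of the dump and can be read straight out of the text file, or the whole dump can be loaded into the viewer that ships with OTP's observer application. The file name below is just the default erl_crash.dump; the actual path on the affected node is an assumption.

%% Hedged sketch: open the crash dump in crashdump_viewer (OTP observer app).
%% "erl_crash.dump" is the default dump file name; adjust to wherever
%% beam.smp actually wrote it on the affected node.
crashdump_viewer:start("erl_crash.dump").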
Looking through the Erlang processes in the dump, all but 2 are in the waiting state:
One is Scheduled - Current call: menelaus_stats:aggregate_stat_kv_pairs/3
One is Garbing - Spawned as: erlang:apply/2
I then looked at Erlang process memory usage; 3 processes use significantly more memory than the others (a live-node snippet for this kind of inspection is sketched after the list below). These are:
1. Current call: menelaus_stats:aggregate_stat_kv_pairs/3 - 1581922360 (1.5GB)
2. Name: menelaus_stats_gatherer - 1318276864 (1.3GB)
3. State: Garbing, Spawned as: erlang:apply/2 - 1201043352 (1.2GB)
Program counter: 0x00007f377f428d40 (gen_server:rec_nodes/7 + 224)
Message queue length: 138
Link list: [
{to, ,#Ref<0.0.11.189696>}},
{to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-30.eu-west-1.compute.amazonaws.com'},#Ref<0.0.11.189698>}},
{to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-29.eu-west-1.compute.amazonaws.com'},#Ref<0.0.11.189697>}},
{to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-32.eu-west-1.compute.amazonaws.com'},#Ref<0.0.11.189700>}},
{to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-42.eu-west-1.compute.amazonaws.com'},#Ref<0.0.11.189702>}},
{to, ,#Ref<0.0.11.189701>}},
{to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-31.eu-west-1.compute.amazonaws.com'},#Ref<0.0.11.189699>}},
{to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-45.eu-west-1.compute.amazonaws.com'},#Ref<0.0.11.189704>}},
{to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-48.eu-west-1.compute.amazonaws.com'},#Ref<0.0.11.189706>}},
{to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-47.eu-west-1.compute.amazonaws.com'},#Ref<0.0.11.189705>}},
{to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-52.eu-west-1.compute.amazonaws.com'},#Ref<0.0.11.189708>}},
{to, ,#Ref<0.0.11.189710>}},
{to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-55.eu-west-1.compute.amazonaws.com'},#Ref<0.0.11.189709>}},
{to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-49.eu-west-1.compute.amazonaws.com'},#Ref<0.0.11.189707>}},
{to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-43.eu-west-1.compute.amazonaws.com'},#Ref<0.0.11.189703>}},
... this continues for all the nodes in the cluster.
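The per-process figures above were read out of the crash dump; for reference, a roughly equivalent inspection on a live node could look like the shell snippet below (a minimal sketch, assuming a remote shell attached to the ns_server VM; Top3 is just a throwaway name).

%% Minimal sketch: list the three processes holding the most memory,
%% together with their registered name and current call.
Top3 = fun() ->
    Info = lists:filtermap(
             fun(P) ->
                 case erlang:process_info(P, [memory, registered_name, current_function]) of
                     undefined -> false;  %% process exited while we were iterating
                     L -> {true, {proplists:get_value(memory, L), P, L}}
                 end
             end, erlang:processes()),
    lists:sublist(lists:reverse(lists:sort(Info)), 3)
end,
Top3().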
The erlang:apply/2 process is spawned by <0.1577.0>, which was found to be the process named menelaus_stats_gatherer.
So my conclusion is that the stats gathering is causing the significant memory usage, and I think this causes beam.smp to crash by running out of memory.
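To illustrate why this scales so badly with 140 nodes, here is a minimal sketch of the fan-out/aggregate pattern the gatherer appears to be following. This is not the actual menelaus_stats code; the module name, the {latest, minute} request term and the aggregation step are hypothetical. The point is that one process asks the per-bucket stats reader on every node and holds every reply in its own heap until all nodes have answered, so its memory grows roughly with node count times per-node stats size.

-module(stats_gather_sketch).
-export([gather/2]).

%% Hypothetical sketch of the suspected pattern: fan one stats request out
%% to the registered 'stats_reader-<bucket>' process on every node, then
%% aggregate all the replies in this single process.
gather(Bucket, Nodes) ->
    Name = list_to_atom("stats_reader-" ++ Bucket),
    %% gen_server:multi_call/3 blocks collecting one reply per node
    %% (internally via gen_server:rec_nodes/..., which matches where the
    %% 1.2GB process above is sitting).
    {Replies, _BadNodes} = gen_server:multi_call(Nodes, Name, {latest, minute}),
    aggregate([Stats || {_Node, Stats} <- Replies]).

%% Naive aggregation of per-node {Key, Value} stat lists: with 140 nodes,
%% all of these lists are alive at once while being folded together.
aggregate(PerNodeStats) ->
    lists:foldl(fun(KvList, Acc) ->
                        lists:foldl(fun({K, V}, A) ->
                                            orddict:update_counter(K, V, A)
                                    end, Acc, KvList)
                end, orddict:new(), PerNodeStats).

With Nodes being all 140 cluster nodes, a gatherer like this holds ~140 full stat sets simultaneously, which would be consistent with the ~1.2-1.5GB heaps seen in the dump.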