Couchbase Server / MB-13766

Beam process crashes when logging into the UI on a node (cluster size 140)

Details

    Description

      Created a cluster of 140 nodes.
      Loaded the 2 sample buckets.
      Logging into a node's UI causes the beam process below to crash. This repeats whenever I log into any of the nodes.

      /opt/couchbase/lib/erlang/erts-5.10.4/bin/beam.smp -A 16 -sbt u -P 327680 -K true -swt low -MMmcs 30 -e102400 -- -root /opt/couchbase/lib/erlang -progname erl -- -home /opt/couchbase -- -smp enable -setcookie nocookie -kernel inet_dist_listen_min 21100 inet_dist_listen_max 21299 error_logger false -sasl sasl_error_logger false -nouser -run child_erlang child_start ns_bootstrap -- -smp enable -couch_ini /opt/couchbase/etc/couchdb/default.ini /opt/couchbase/etc/couchdb/default.d/capi.ini /opt/couchbase/etc/couchdb/default.d/geocouch.ini /opt/couchbase/etc/couchdb/local.ini

      Looking at the crash dump, the cause is a failure to allocate the required memory (~1.3GB):
      Slogan: eheap_alloc: Cannot allocate 1318267840 bytes of memory (of type "heap").
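
      The slogan can be read straight out of the dump file; a minimal sketch in an Erlang shell, assuming the dump is at the path below (the path is a guess - use wherever erl_crash.dump actually landed):

      {ok, Bin} = file:read_file("/opt/couchbase/var/lib/couchbase/erl_crash.dump"),
      Lines = binary:split(Bin, <<"\n">>, [global]),
      %% keep only lines starting with "Slogan:"; the first one is the abort reason
      [Slogan | _] = [L || <<"Slogan:", _/binary>> = L <- Lines],
      io:format("~s~n", [Slogan]).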

      Looking through the Erlang processes, all but 2 are in a waiting state. The remaining two are:
      One Scheduled - Current call: menelaus_stats:aggregate_stat_kv_pairs/3
      One Garbing - Spawned as: erlang:apply/2
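
      (Not from the original triage, but for reference the same per-process state can be listed on a live node from an attached Erlang shell:)

      %% For every process: scheduling status, current call and registered name;
      %% an entry is 'undefined' if the process exits while being inspected.
      [{P, erlang:process_info(P, [status, current_function, registered_name])}
       || P <- erlang:processes()].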

      I then looked at Erlang process memory usage; 3 processes use significantly more memory than the others (a sketch for reproducing these figures on a live node follows the dump excerpt):

      1. Current call: menelaus_stats:aggregate_stat_kv_pairs/3 - 1581922360 bytes (1.5GB)
      2. Name: menelaus_stats_gatherer - 1318276864 bytes (1.3GB)
      3. State: Garbing, Spawned as: erlang:apply/2 - 1201043352 bytes (1.2GB)
         Program counter: 0x00007f377f428d40 (gen_server:rec_nodes/7 + 224)
         Message queue length: 138
      Link list: [{to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-26.eu-west-1.compute.amazonaws.com'}, #Ref<0.0.11.189696>},
                  {to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-30.eu-west-1.compute.amazonaws.com'}, #Ref<0.0.11.189698>},
                  {to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-29.eu-west-1.compute.amazonaws.com'}, #Ref<0.0.11.189697>},
                  {to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-32.eu-west-1.compute.amazonaws.com'}, #Ref<0.0.11.189700>},
                  {to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-42.eu-west-1.compute.amazonaws.com'}, #Ref<0.0.11.189702>},
                  {to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-36.eu-west-1.compute.amazonaws.com'}, #Ref<0.0.11.189701>},
                  {to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-31.eu-west-1.compute.amazonaws.com'}, #Ref<0.0.11.189699>},
                  {to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-45.eu-west-1.compute.amazonaws.com'}, #Ref<0.0.11.189704>},
                  {to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-48.eu-west-1.compute.amazonaws.com'}, #Ref<0.0.11.189706>},
                  {to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-47.eu-west-1.compute.amazonaws.com'}, #Ref<0.0.11.189705>},
                  {to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-52.eu-west-1.compute.amazonaws.com'}, #Ref<0.0.11.189708>},
                  {to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-56.eu-west-1.compute.amazonaws.com'}, #Ref<0.0.11.189710>},
                  {to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-55.eu-west-1.compute.amazonaws.com'}, #Ref<0.0.11.189709>},
                  {to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-49.eu-west-1.compute.amazonaws.com'}, #Ref<0.0.11.189707>},
                  {to, {'stats_reader-beer-sample','ns_1@ec2-52-16-163-43.eu-west-1.compute.amazonaws.com'}, #Ref<0.0.11.189703>},
                  ...] - this continues for all the nodes in the cluster.

      The erlang:apply/2 process is spawned by <0.1577.0>, which was found to be Name: menelaus_stats_gatherer.
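
      The per-process memory figures above can be reproduced on a live node before it falls over; a rough sketch from an attached Erlang shell (not part of the dump, names are illustrative):

      %% Top N processes by memory, with registered name, current call and status.
      TopByMemory = fun(N) ->
          Infos = [{erlang:process_info(P, memory), P,
                    erlang:process_info(P, [registered_name, current_function, status])}
                   || P <- erlang:processes()],
          %% drop processes that exited mid-scan (process_info returned undefined)
          Alive = [{M, P, I} || {{memory, M}, P, I} <- Infos],
          lists:sublist(lists:reverse(lists:keysort(1, Alive)), N)
      end,
      TopByMemory(3).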

      So my conclusion is that the stats gathering is causing the significant memory usage, and I think this causes beam.smp to crash by running out of memory.

      Attachments

        Activity

          People

            Assignee: Daniel Owen (owend)
            Reporter: Daniel Owen (owend)
            Votes: 0
            Watchers: 6
