  1. Couchbase Server
  2. MB-13312

Logging onto UI takes down beam

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Critical
    • feature-backlog
    • 3.0.2
    • ns_server
    • Security Level: Public
    • Triaged
    • Unknown

    Description

      I am running a 50-node 3.0.2 cluster in Google Cloud. The cluster was idle overnight, and when I logged into the UI, it immediately stopped responding and displayed cached information. I logged into the UI on a second node, which showed the initial server as down and then itself started displaying cached information. I logged onto a 3rd node, which showed the initial 2 nodes as down and then started displaying cached information itself.

      Watching the couchbase processes on a node prior to login I see this:
      UID PID PPID C STIME TTY TIME CMD
      999 3088 1 0 Feb02 ? 00:00:02 /opt/couchbase/lib/erlang/erts-5.10.4/bin/epmd -daemon
      999 3126 1 0 Feb02 ? 00:00:29 /opt/couchbase/lib/erlang/erts-5.10.4/bin/beam.smp -A 16 -- -root /opt/couchbase/lib/erlang -progname erl -- -home /opt/couchbase -- -smp enable -kernel inet_d
      999 3217 3126 17 Feb02 ? 10:46:00 /opt/couchbase/bin/memcached -C /opt/couchbase/var/lib/couchbase/config/memcached.json
      999 3486 3126 0 Feb02 ? 00:00:00 inet_gethost 4
      999 3487 3486 0 Feb02 ? 00:00:00 inet_gethost 4
      999 9756 3126 7 10:26 ? 00:00:30 /opt/couchbase/lib/erlang/erts-5.10.4/bin/beam.smp -A 16 -sbt u -P 327680 -K true -swt low -MMmcs 30 -e102400 -- -root /opt/couchbase/lib/erlang -progname erl
      999 9794 9756 0 10:26 ? 00:00:00 sh -s disksup
      999 9796 9756 0 10:26 ? 00:00:00 /opt/couchbase/lib/erlang/lib/os_mon-2.2.14/priv/bin/memsup
      999 9797 9756 0 10:26 ? 00:00:00 /opt/couchbase/lib/erlang/lib/os_mon-2.2.14/priv/bin/cpu_sup
      999 9808 9756 0 10:26 ? 00:00:00 inet_gethost 4
      999 9809 9808 0 10:26 ? 00:00:00 inet_gethost 4
      999 9813 9756 0 10:26 ? 00:00:00 sh -s ns_disksup
      999 9815 9756 0 10:26 ? 00:00:00 /opt/couchbase/lib/ns_server/erlang/lib/ns_server/priv/i386-linux-godu
      999 9823 3126 0 10:26 ? 00:00:01 /opt/couchbase/lib/erlang/erts-5.10.4/bin/beam.smp -P 327680 -K true -- -root /opt/couchbase/lib/erlang -progname erl -- -home /opt/couchbase -- -smp enable -k
      999 9851 9756 0 10:26 ? 00:00:01 portsigar for ns_1@cb-server-12.c.cb-googbench-101.internal
      999 9852 3126 0 10:26 ? 00:00:00 /opt/couchbase/bin/moxi -Z port_listen=11211,default_bucket_name=default,downstream_max=1024,downstream_conn_max=4,connect_max_errors=5,connect_retry_interval=

      Then, after logging on through the UI, the process list shrinks to this:
      999 3088 1 0 Feb02 ? 00:00:02 /opt/couchbase/lib/erlang/erts-5.10.4/bin/epmd -daemon
      999 3126 1 0 Feb02 ? 00:00:29 /opt/couchbase/lib/erlang/erts-5.10.4/bin/beam.smp -A 16 -- -root /opt/couchbase/lib/erlang -progname erl -- -home /opt/couchbase -- -smp enable -kernel inet_d
      999 3217 3126 17 Feb02 ? 10:46:00 /opt/couchbase/bin/memcached -C /opt/couchbase/var/lib/couchbase/config/memcached.json
      999 3486 3126 0 Feb02 ? 00:00:00 inet_gethost 4
      999 3487 3486 0 Feb02 ? 00:00:00 inet_gethost 4
      999 9756 3126 12 10:26 ? 00:00:54 /opt/couchbase/lib/erlang/erts-5.10.4/bin/beam.smp -A 16 -sbt u -P 327680 -K true -swt low -MMmcs 30 -e102400 -- -root /opt/couchbase/lib/erlang -progname erl
      999 9823 3126 0 10:26 ? 00:00:01 /opt/couchbase/lib/erlang/erts-5.10.4/bin/beam.smp -P 327680 -K true -- -root /opt/couchbase/lib/erlang -progname erl -- -home /opt/couchbase -- -smp enable -k
      999 9852 3126 0 10:26 ? 00:00:00 /opt/couchbase/bin/moxi -Z port_listen=11211,default_bucket_name=default,downstream_max=1024,downstream_conn_max=4,connect_max_errors=5,connect_retry_interval=

      It recovers after about 30 seconds. Other nodes in the cluster seem unaffected.
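
To make the difference between the two process listings explicit, here is a minimal Python sketch that diffs them by PID. The helper names `parse_ps` and `vanished` are hypothetical, the rows are copied from the listings in this report, and the long beam.smp, memcached and moxi command lines are shortened to the executable path. It assumes `ps -ef`-style columns (UID PID PPID C STIME TTY TIME CMD).

```python
# Rows copied from the two listings in this report. The beam.smp, memcached
# and moxi command lines are shortened to the executable path for brevity.
BEAM = "/opt/couchbase/lib/erlang/erts-5.10.4/bin/beam.smp"

BEFORE = f"""
UID PID PPID C STIME TTY TIME CMD
999 3088 1 0 Feb02 ? 00:00:02 /opt/couchbase/lib/erlang/erts-5.10.4/bin/epmd -daemon
999 3126 1 0 Feb02 ? 00:00:29 {BEAM}
999 3217 3126 17 Feb02 ? 10:46:00 /opt/couchbase/bin/memcached
999 3486 3126 0 Feb02 ? 00:00:00 inet_gethost 4
999 3487 3486 0 Feb02 ? 00:00:00 inet_gethost 4
999 9756 3126 7 10:26 ? 00:00:30 {BEAM}
999 9794 9756 0 10:26 ? 00:00:00 sh -s disksup
999 9796 9756 0 10:26 ? 00:00:00 /opt/couchbase/lib/erlang/lib/os_mon-2.2.14/priv/bin/memsup
999 9797 9756 0 10:26 ? 00:00:00 /opt/couchbase/lib/erlang/lib/os_mon-2.2.14/priv/bin/cpu_sup
999 9808 9756 0 10:26 ? 00:00:00 inet_gethost 4
999 9809 9808 0 10:26 ? 00:00:00 inet_gethost 4
999 9813 9756 0 10:26 ? 00:00:00 sh -s ns_disksup
999 9815 9756 0 10:26 ? 00:00:00 /opt/couchbase/lib/ns_server/erlang/lib/ns_server/priv/i386-linux-godu
999 9823 3126 0 10:26 ? 00:00:01 {BEAM}
999 9851 9756 0 10:26 ? 00:00:01 portsigar for ns_1@cb-server-12.c.cb-googbench-101.internal
999 9852 3126 0 10:26 ? 00:00:00 /opt/couchbase/bin/moxi
"""

AFTER = f"""
999 3088 1 0 Feb02 ? 00:00:02 /opt/couchbase/lib/erlang/erts-5.10.4/bin/epmd -daemon
999 3126 1 0 Feb02 ? 00:00:29 {BEAM}
999 3217 3126 17 Feb02 ? 10:46:00 /opt/couchbase/bin/memcached
999 3486 3126 0 Feb02 ? 00:00:00 inet_gethost 4
999 3487 3486 0 Feb02 ? 00:00:00 inet_gethost 4
999 9756 3126 12 10:26 ? 00:00:54 {BEAM}
999 9823 3126 0 10:26 ? 00:00:01 {BEAM}
999 9852 3126 0 10:26 ? 00:00:00 /opt/couchbase/bin/moxi
"""


def parse_ps(text):
    """Map PID -> (PPID, command) for each data row of a ps -ef style listing."""
    procs = {}
    for line in text.strip().splitlines():
        fields = line.split(None, 7)  # the 8th field keeps the whole command
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # skip the header row and anything malformed
        procs[int(fields[1])] = (int(fields[2]), fields[7])
    return procs


def vanished(before, after):
    """Return the processes present in `before` whose PID is absent in `after`."""
    return {pid: info for pid, info in before.items() if pid not in after}


gone = vanished(parse_ps(BEFORE), parse_ps(AFTER))
for pid in sorted(gone):
    ppid, cmd = gone[pid]
    print(f"{pid} (parent {ppid}): {cmd}")
```

On these two listings it reports eight missing PIDs: seven with parent 9756 and one (9809) with parent 9808. PID 9756 itself appears in both listings.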

      Logs for node with failure:
      https://s3.amazonaws.com/customers.couchbase.com/davidH/node12.zip
      Orchestrator:
      https://s3.amazonaws.com/customers.couchbase.com/davidH/node10-orchestrator.zip

      ns_server_error.log on the failed node has:
      [ns_server:error,2015-02-05T10:18:21.307,ns_1@cb-server-12.c.cb-googbench-101.internal:ns_log<0.277.0>:ns_log:handle_cast:210]unable to notify listeners because of badarg
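
For filtering ns_server_error.log by module or time window, a line like the one above can be split into fields. This is a rough Python sketch: the regular expression and the field names are assumptions inferred from this single line, not from a documented log format, so other entries may not match.

```python
import re
from datetime import datetime

# Pattern inferred from the one log line quoted in this report (an assumption):
# [component:level,timestamp,node:process<pid>:module:function:line]message
LOG_LINE = re.compile(
    r"^\[(?P<component>[^:]+):(?P<level>[^,]+),(?P<timestamp>[^,]+),"
    r"(?P<node>[^:]+):(?P<process>[^<]*)<(?P<pid>[^>]+)>:"
    r"(?P<module>[^:]+):(?P<function>[^:]+):(?P<line>\d+)\](?P<message>.*)$"
)


def parse_log_line(text):
    """Return a dict of fields for a matching line, or None if it does not match."""
    match = LOG_LINE.match(text.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["line"] = int(entry["line"])
    entry["timestamp"] = datetime.strptime(entry["timestamp"], "%Y-%m-%dT%H:%M:%S.%f")
    return entry


sample = (
    "[ns_server:error,2015-02-05T10:18:21.307,"
    "ns_1@cb-server-12.c.cb-googbench-101.internal:ns_log<0.277.0>:"
    "ns_log:handle_cast:210]unable to notify listeners because of badarg"
)
entry = parse_log_line(sample)
print(entry["level"], entry["module"], entry["function"], entry["line"])
print(entry["message"])
```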

          People

            poonam Poonam Dhavale
            dhaikney David Haikney (Inactive)