Details

Type: Bug
Resolution: Cannot Reproduce
Priority: Critical
Fix Version/s: feature-backlog
Affects Version/s: 3.0.2
Component/s: ns_server
Security Level: Public
Labels: beta

Triage: Triaged
Is this a Regression?: Unknown

Description

I am running a 50-node 3.0.2 cluster in Google Cloud. The cluster was idle overnight, and when I logged into the UI, it immediately stopped responding and displayed cached information. I logged into the UI on a second node, which showed the initial server as down and then itself started displaying cached information. I logged onto a third node, which showed the first two nodes as down and then started displaying cached information itself.

Watching the couchbase processes on a node prior to login I see this:
UID PID PPID C STIME TTY TIME CMD
999 3088 1 0 Feb02 ? 00:00:02 /opt/couchbase/lib/erlang/erts-5.10.4/bin/epmd -daemon
999 3126 1 0 Feb02 ? 00:00:29 /opt/couchbase/lib/erlang/erts-5.10.4/bin/beam.smp -A 16 -- -root /opt/couchbase/lib/erlang -progname erl -- -home /opt/couchbase -- -smp enable -kernel inet_d
999 3217 3126 17 Feb02 ? 10:46:00 /opt/couchbase/bin/memcached -C /opt/couchbase/var/lib/couchbase/config/memcached.json
999 3486 3126 0 Feb02 ? 00:00:00 inet_gethost 4
999 3487 3486 0 Feb02 ? 00:00:00 inet_gethost 4
999 9756 3126 7 10:26 ? 00:00:30 /opt/couchbase/lib/erlang/erts-5.10.4/bin/beam.smp -A 16 -sbt u -P 327680 -K true -swt low -MMmcs 30 -e102400 -- -root /opt/couchbase/lib/erlang -progname erl
999 9794 9756 0 10:26 ? 00:00:00 sh -s disksup
999 9796 9756 0 10:26 ? 00:00:00 /opt/couchbase/lib/erlang/lib/os_mon-2.2.14/priv/bin/memsup
999 9797 9756 0 10:26 ? 00:00:00 /opt/couchbase/lib/erlang/lib/os_mon-2.2.14/priv/bin/cpu_sup
999 9808 9756 0 10:26 ? 00:00:00 inet_gethost 4
999 9809 9808 0 10:26 ? 00:00:00 inet_gethost 4
999 9813 9756 0 10:26 ? 00:00:00 sh -s ns_disksup
999 9815 9756 0 10:26 ? 00:00:00 /opt/couchbase/lib/ns_server/erlang/lib/ns_server/priv/i386-linux-godu
999 9823 3126 0 10:26 ? 00:00:01 /opt/couchbase/lib/erlang/erts-5.10.4/bin/beam.smp -P 327680 -K true -- -root /opt/couchbase/lib/erlang -progname erl -- -home /opt/couchbase -- -smp enable -k
999 9851 9756 0 10:26 ? 00:00:01 portsigar for ns_1@cb-server-12.c.cb-googbench-101.internal
999 9852 3126 0 10:26 ? 00:00:00 /opt/couchbase/bin/moxi -Z port_listen=11211,default_bucket_name=default,downstream_max=1024,downstream_conn_max=4,connect_max_errors=5,connect_retry_interval=

Then, after logging on through the UI, the process list shrinks to this:
999 3088 1 0 Feb02 ? 00:00:02 /opt/couchbase/lib/erlang/erts-5.10.4/bin/epmd -daemon
999 3126 1 0 Feb02 ? 00:00:29 /opt/couchbase/lib/erlang/erts-5.10.4/bin/beam.smp -A 16 -- -root /opt/couchbase/lib/erlang -progname erl -- -home /opt/couchbase -- -smp enable -kernel inet_d
999 3217 3126 17 Feb02 ? 10:46:00 /opt/couchbase/bin/memcached -C /opt/couchbase/var/lib/couchbase/config/memcached.json
999 3486 3126 0 Feb02 ? 00:00:00 inet_gethost 4
999 3487 3486 0 Feb02 ? 00:00:00 inet_gethost 4
999 9756 3126 12 10:26 ? 00:00:54 /opt/couchbase/lib/erlang/erts-5.10.4/bin/beam.smp -A 16 -sbt u -P 327680 -K true -swt low -MMmcs 30 -e102400 -- -root /opt/couchbase/lib/erlang -progname erl
999 9823 3126 0 10:26 ? 00:00:01 /opt/couchbase/lib/erlang/erts-5.10.4/bin/beam.smp -P 327680 -K true -- -root /opt/couchbase/lib/erlang -progname erl -- -home /opt/couchbase -- -smp enable -k
999 9852 3126 0 10:26 ? 00:00:00 /opt/couchbase/bin/moxi -Z port_listen=11211,default_bucket_name=default,downstream_max=1024,downstream_conn_max=4,connect_max_errors=5,connect_retry_interval=

It recovers after about 30 seconds. Other nodes in the cluster seem unaffected.
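
The two process listings above can be compared mechanically rather than by eye. The following is a minimal sketch in Python, assuming each listing has been saved as `ps -ef`-style text with the eight columns shown in the header (UID PID PPID C STIME TTY TIME CMD). The function names `parse_ps` and `vanished` are hypothetical and are not part of any Couchbase tooling.

```python
def parse_ps(text):
    """Map PID -> (PPID, command) from `ps -ef`-style text."""
    procs = {}
    for line in text.splitlines():
        # Split into at most 8 fields so the command keeps its spaces.
        fields = line.split(None, 7)
        # Skip blank lines, short lines, and the header row.
        if len(fields) < 8 or not fields[1].isdigit():
            continue
        pid, ppid = int(fields[1]), int(fields[2])
        procs[pid] = (ppid, fields[7])
    return procs


def vanished(before, after):
    """Return (pid, ppid, command) for processes in `before` but not in `after`."""
    b, a = parse_ps(before), parse_ps(after)
    return [(pid, *b[pid]) for pid in sorted(b.keys() - a.keys())]
```

Applied to the listings in this report, the missing PIDs are 9794, 9796, 9797, 9808, 9809, 9813, 9815 and 9851. All of them are children of the beam.smp process with PID 9756, except 9809, which is a child of 9808. PID 9756 itself is still present in the second listing.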

Logs for node with failure:
https://s3.amazonaws.com/customers.couchbase.com/davidH/node12.zip
Orchestrator:
https://s3.amazonaws.com/customers.couchbase.com/davidH/node10-orchestrator.zip

ns_server_error.log on the failed node has:
[ns_server:error,2015-02-05T10:18:21.307,ns_1@cb-server-12.c.cb-googbench-101.internal:ns_log<0.277.0>:ns_log:handle_cast:210]unable to notify listeners because of badarg
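
When scanning the attached logs for related entries, it can help to split each entry into fields. The sketch below is in Python and assumes every entry follows the same layout as the single line quoted above: `[component:level,timestamp,node:process<pid>:module:function:line]message`. That layout is inferred from this one entry only, and the names `LOG_RE` and `parse_entry` are hypothetical.

```python
import re
from datetime import datetime

# Layout inferred from the one entry quoted in this report (an assumption).
LOG_RE = re.compile(
    r"\[(?P<component>[^:]+):(?P<level>[^,]+),"
    r"(?P<timestamp>[^,]+),"
    r"(?P<node>[^:]+):(?P<process>[^<]+)<(?P<pid>[^>]+)>:"
    r"(?P<module>[^:]+):(?P<function>[^:]+):(?P<line>\d+)\]"
    r"(?P<message>.*)"
)


def parse_entry(entry):
    """Return a dict of fields for one log entry, or None if it does not match."""
    m = LOG_RE.match(entry)
    if m is None:
        return None
    fields = m.groupdict()
    # Convert the timestamp and source line number to native types.
    fields["timestamp"] = datetime.strptime(
        fields["timestamp"], "%Y-%m-%dT%H:%M:%S.%f"
    )
    fields["line"] = int(fields["line"])
    return fields
```

Filtering parsed entries on `level == "error"` and a time window around the UI login would narrow the search in the attached archives.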

People

Assignee: Poonam Dhavale
Reporter: David Haikney (Inactive)
Votes: 0
Watchers: 6

Dates

Created: 05/Feb/15 3:45 AM
Updated: 10/Mar/17 2:06 AM
Resolved: 09/Mar/17 6:36 PM

Logging onto UI takes down beam
