Cluster manager continuously crashes on an undersized system
Description
Activity
Thuan Nguyen August 3, 2016 at 12:52 AM
Talked to Artem. This bug is not easy to verify. Just make sure the fix goes into the 4.1.2 branch.
Aliaksey Artamonau May 5, 2016 at 6:07 PM
Please create a new issue and attach full logs (you can obtain them under the Logs/Collect Information tab).
zijunyang May 5, 2016 at 8:33 AM
I am using Version: 4.1.0-5005 Enterprise Edition (build-5005) on Ubuntu 12.04.5 LTS (GNU/Linux 3.2.0-90-generic x86_64), and I have the same issue.
My Couchbase Server restarts automatically and frequently. The log in the admin console says:
Service 'goxdcr' exited with status 1. Restarting. Messages: MetadataService 2016-05-05T11:54:47.790+08:00 [ERROR] metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: connection refused, num_of_retry=3
MetadataService 2016-05-05T11:54:47.790+08:00 [ERROR] metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: connection refused, num_of_retry=4
RemoteClusterService 2016-05-05T11:54:47.791+08:00 [ERROR] Failed to get all entries, err=metakv failed for max number of retries = 5
Error starting remote cluster service. err=metakv failed for max number of retries = 5
[goport] 2016/05/05 11:54:47 /opt/couchbase/bin/goxdcr terminated: exit status 1
Artem Stemkovski April 26, 2016 at 10:08 PM
Here's the corresponding fix in supervisor.erl:
https://github.com/erlang/otp/commit/c59c3a6d57b857913ddfa13f96425ba0d95ccb2d
Details
Assignee: Eric Cooper
Reporter: Alfie
Priority: Critical

The server is crashing and failing to recover on a regular basis.
Anil recommended that we post the collect info output. The resulting collect.zip file is over 100MB. Could you please download it from: http://homeshome.co.uk/share/dev/collect.zip.
I look forward to getting this resolved with your help.