Couchbase Server / MB-16766

Cluster manager continuously crashes on an undersized system


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 4.0.0, 4.5.0
    • Fix Version/s: 4.1.2, 4.5.0
    • Component/s: ns_server
    • Security Level: Public
    • Labels:
      None
    • Environment:
      Ubuntu AWS Instance

      Description

      The server is crashing and failing to recover on a regular basis.

      Anil recommended that we post the output of collect info. The resulting collect.zip file is over 100 MB, so please download it from http://homeshome.co.uk/share/dev/collect.zip.

      I look forward to getting this resolved with your help.


          Activity

          Artem Stemkovski added a comment: Here's the corresponding fix in supervisor.erl: https://github.com/erlang/otp/commit/c59c3a6d57b857913ddfa13f96425ba0d95ccb2d
          Aliaksey Artamonau added a comment: http://review.couchbase.org/#/c/63396/
          atom.yang added a comment:

          I am using version 4.1.0-5005 Enterprise Edition (build-5005) on Ubuntu 12.04.5 LTS (GNU/Linux 3.2.0-90-generic x86_64), and I have the same issue.
          My Couchbase Server restarts automatically and frequently. The log in the admin console says:

          Service 'goxdcr' exited with status 1. Restarting. Messages: MetadataService 2016-05-05T11:54:47.790+08:00 [ERROR] metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: connection refused, num_of_retry=3
          MetadataService 2016-05-05T11:54:47.790+08:00 [ERROR] metakv.ListAllChildren failed. path=/remoteCluster/, err=Get http://127.0.0.1:8091/_metakv/remoteCluster/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: connection refused, num_of_retry=4
          RemoteClusterService 2016-05-05T11:54:47.791+08:00 [ERROR] Failed to get all entries, err=metakv failed for max number of retries = 5
          Error starting remote cluster service. err=metakv failed for max number of retries = 5
          [goport] 2016/05/05 11:54:47 /opt/couchbase/bin/goxdcr terminated: exit status 1

          Aliaksey Artamonau added a comment:

          atom.yang, please create a new issue and attach full logs (you can obtain them under Logs/Collect information tab).

          Thuan Nguyen added a comment:

          Talked to Artem. This bug is not easy to verify, so we just need to make sure the fix goes into the 4.1.2 branch.


            People

            • Assignee: Eric Cooper (Inactive)
            • Reporter: Alfie
            • Votes: 0
            • Watchers: 9


