Couchbase Server / MB-35155

Couchbase daemon failed to start, crashing Jepsen

Details

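    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: master
    • Fix Version/s: master
    • Component/s: Jepsen
    • Labels: None
    • Triage: Untriaged
    • Is this a Regression?: Unknown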

    Description

      The Couchbase daemon failed to start during setup. We should handle this case better: if the daemon fails to start, we should make sure we perform teardown so that the machine is left in a clean state. We should also log which node failed to start, to help with debugging.
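
A minimal sketch of the handling requested above, written in Python purely for illustration (the real harness is Clojure, and this is not its code). Every name here (`setup_cluster`, `DaemonStartError`, and the `setup_node` / `teardown_node` callables) is hypothetical. It assumes teardown is safe to run on a node that was never fully set up.

```python
import logging

logger = logging.getLogger("jepsen_sketch")  # hypothetical logger name


class DaemonStartError(Exception):
    """Raised when setup fails on a node; records which node it was."""

    def __init__(self, node):
        super().__init__(f"daemon failed to start on node {node}")
        self.node = node


def setup_cluster(nodes, setup_node, teardown_node):
    """Set up each node in turn; on failure, tear down every node and re-raise.

    setup_node and teardown_node are caller-supplied callables taking a node
    name (assumptions for this sketch, not functions from the real harness).
    """
    try:
        for node in nodes:
            try:
                setup_node(node)
            except Exception as exc:
                # Log the failing node so the crash report identifies it.
                logger.error("setup failed on node %s: %s", node, exc)
                raise DaemonStartError(node) from exc
    except DaemonStartError:
        # Leave every machine clean, including nodes that never started.
        for node in nodes:
            try:
                teardown_node(node)
            except Exception:
                # A teardown failure must not hide the original error.
                logger.exception("teardown failed on node %s", node)
        raise
```

The error is re-raised after teardown so the test run still fails visibly; only the cleanup and the node name in the log are new.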

      Stack trace of crash (from kv-engine-jepsen-post-commit-145):

      2019-07-17 23:51:37,372{GMT}	WARN	[main] jepsen.core: Test crashed!
      java.lang.Exception: daemon failed to start
      	at couchbase.util$wait_for_daemon$fn__2787.invoke(util.clj:355) ~[na:na]
      	at couchbase.util$wait_for_daemon.invokeStatic(util.clj:351) ~[na:na]
      	at couchbase.util$wait_for_daemon.invoke(util.clj:345) ~[na:na]
      	at couchbase.util$setup_node.invokeStatic(util.clj:399) ~[na:na]
      	at couchbase.util$setup_node.invoke(util.clj:374) ~[na:na]
      	at couchbase.core$couchbase$reify__4628.setup_BANG_(core.clj:21) ~[na:na]
      	at jepsen.db$fn__2954$G__2933__2958.invoke(db.clj:8) ~[jepsen-0.1.14.jar:na]
      	at jepsen.db$fn__2954$G__2932__2963.invoke(db.clj:8) ~[jepsen-0.1.14.jar:na]
      	at clojure.core$partial$fn__5839.invoke(core.clj:2625) ~[clojure-1.10.1.jar:na]
      	at jepsen.control$on_nodes$fn__2918.invoke(control.clj:391) ~[jepsen-0.1.14.jar:na]
      	at clojure.lang.AFn.applyToHelper(AFn.java:154) ~[clojure-1.10.1.jar:na]
      	at clojure.lang.AFn.applyTo(AFn.java:144) ~[clojure-1.10.1.jar:na]
      	at clojure.core$apply.invokeStatic(core.clj:665) ~[clojure-1.10.1.jar:na]
      	at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1973) ~[clojure-1.10.1.jar:na]
      	at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1973) ~[clojure-1.10.1.jar:na]
      	at clojure.lang.RestFn.applyTo(RestFn.java:142) ~[clojure-1.10.1.jar:na]
      	at clojure.core$apply.invokeStatic(core.clj:669) ~[clojure-1.10.1.jar:na]
      	at clojure.core$bound_fn_STAR_$fn__5749.doInvoke(core.clj:2003) ~[clojure-1.10.1.jar:na]
      	at clojure.lang.RestFn.invoke(RestFn.java:408) ~[clojure-1.10.1.jar:na]
      	at dom_top.core$real_pmap_helper$build_thread__214$fn__215.invoke(core.clj:146) ~[jepsen-0.1.14.jar:na]
      	at clojure.lang.AFn.applyToHelper(AFn.java:152) ~[clojure-1.10.1.jar:na]
      	at clojure.lang.AFn.applyTo(AFn.java:144) ~[clojure-1.10.1.jar:na]
      	at clojure.core$apply.invokeStatic(core.clj:665) ~[clojure-1.10.1.jar:na]
      	at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1973) ~[clojure-1.10.1.jar:na]
      	at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1973) ~[clojure-1.10.1.jar:na]
      	at clojure.lang.RestFn.invoke(RestFn.java:425) ~[clojure-1.10.1.jar:na]
      	at clojure.lang.AFn.applyToHelper(AFn.java:156) ~[clojure-1.10.1.jar:na]
      	at clojure.lang.RestFn.applyTo(RestFn.java:132) ~[clojure-1.10.1.jar:na]
      	at clojure.core$apply.invokeStatic(core.clj:669) ~[clojure-1.10.1.jar:na]
      	at clojure.core$bound_fn_STAR_$fn__5749.doInvoke(core.clj:2003) ~[clojure-1.10.1.jar:na]
      	at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.10.1.jar:na]
      	at clojure.lang.AFn.run(AFn.java:22) ~[clojure-1.10.1.jar:na]
      	at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_212]
      

        Activity

          Richard deMellow added a comment - The memcached.log files on all of the nodes are empty, implying that none of the memcached processes had come up at that point. Will continue to investigate.

          Richard deMellow added a comment - Looking at couchbase.log, it seems that we have installed the files on the nodes but have been unable to start Couchbase Server. Not sure whether it has crashed or whether the server start has failed.

          Richard deMellow added a comment - I believe this crash only occurs when using a locally built version of Couchbase Server and not when using a .deb. Testing to confirm this.

          People

            Assignee: Richard deMellow
            Reporter: Richard deMellow
            Votes: 0
            Watchers: 1
