Couchbase Server / MB-35636

Jepsen crash in post-commit validation

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • master
    • master
    • Jepsen
    • None
    • Untriaged
    • Unknown

    Description

      One of the tests in the Jepsen post-commit job http://cv.jenkins.couchbase.com/job/jepsen-post-commit/28/ crashed during setup. The crash was not caused by the new commit; it appears to be due to an intermittent issue in our Jepsen setup code.

      2019-08-21 07:52:06,462{GMT}	WARN	[main] jepsen.core: Test crashed!
      clojure.lang.ExceptionInfo: clj-http: status 400 {:cached nil, :request-time 12759, :repeatable? false, :protocol-version {:name "HTTP", :major 1, :minor 1}, :streaming? true, :http-client #object[org.apache.http.impl.client.InternalHttpClient 0x260b6a41 "org.apache.http.impl.client.InternalHttpClient@260b6a41"], :chunked? false, :reason-phrase "Bad Request", :headers {"X-Permitted-Cross-Domain-Policies" "none", "Server" "Couchbase Server", "Content-Type" "application/json", "X-Content-Type-Options" "nosniff", "Content-Length" "181", "X-Frame-Options" "DENY", "Pragma" "no-cache", "Expires" "Thu, 01 Jan 1970 00:00:00 GMT", "Date" "Wed, 21 Aug 2019 14:50:02 GMT", "X-XSS-Protection" "1; mode=block", "Cache-Control" "no-cache,no-store,must-revalidate"}, :orig-content-encoding nil, :status 400, :length 181, :body "[\"Join completion call failed. Got HTTP status 500 from REST call post to http://172.28.128.198:8091/completeJoin. Body was: \\\"[\\\\\\\"Unexpected server error, request logged.\\\\\\\"]\\\"\"]", :trace-redirects []}
      	at slingshot.support$stack_trace.invoke(support.clj:201) ~[knossos-0.3.4.jar:na]
      	at clj_http.client$exceptions_response.invokeStatic(client.clj:244) ~[na:na]
      	at clj_http.client$exceptions_response.invoke(client.clj:236) ~[na:na]
      	at clj_http.client$wrap_exceptions$fn__1978.invoke(client.clj:254) ~[na:na]
      	at clj_http.client$wrap_accept$fn__2224.invoke(client.clj:737) ~[na:na]
      	at clj_http.client$wrap_accept_encoding$fn__2231.invoke(client.clj:759) ~[na:na]
      	at clj_http.client$wrap_content_type$fn__2218.invoke(client.clj:720) ~[na:na]
      	at clj_http.client$wrap_form_params$fn__2327.invoke(client.clj:961) ~[na:na]
      	at clj_http.client$wrap_nested_params$fn__2348.invoke(client.clj:995) ~[na:na]
      	at clj_http.client$wrap_flatten_nested_params$fn__2357.invoke(client.clj:1019) ~[na:na]
      	at clj_http.client$wrap_method$fn__2285.invoke(client.clj:895) ~[na:na]
      	at clj_http.cookies$wrap_cookies$fn__413.invoke(cookies.clj:131) ~[na:na]
      	at clj_http.links$wrap_links$fn__899.invoke(links.clj:63) ~[na:na]
      	at clj_http.client$wrap_unknown_host$fn__2365.invoke(client.clj:1048) ~[na:na]
      	at clj_http.client$request_STAR_.invokeStatic(client.clj:1176) ~[na:na]
      	at clj_http.client$request_STAR_.invoke(client.clj:1169) ~[na:na]
      	at clj_http.client$post.invokeStatic(client.clj:1194) ~[na:na]
      	at clj_http.client$post.doInvoke(client.clj:1190) ~[na:na]
      	at clojure.lang.RestFn.invoke(RestFn.java:423) ~[clojure-1.10.1.jar:na]
      	at couchbase.util$rest_call.invokeStatic(util.clj:69) ~[na:na]
      	at couchbase.util$rest_call.invoke(util.clj:58) ~[na:na]
      	at couchbase.util$rest_call.invokeStatic(util.clj:60) ~[na:na]
      	at couchbase.util$rest_call.invoke(util.clj:58) ~[na:na]
      	at couchbase.util$add_nodes.invokeStatic(util.clj:129) ~[na:na]
      	at couchbase.util$add_nodes.invoke(util.clj:119) ~[na:na]
      	at couchbase.util$add_nodes.invokeStatic(util.clj:121) ~[na:na]
      	at couchbase.util$add_nodes.invoke(util.clj:119) ~[na:na]
      	at couchbase.util$setup_cluster.invokeStatic(util.clj:356) ~[na:na]
      	at couchbase.util$setup_cluster.invoke(util.clj:344) ~[na:na]
      	at couchbase.core$couchbase_remote$reify__4688.setup_primary_BANG_(core.clj:27) ~[na:na]
      	at jepsen.db$fn__2974$G__2970__2978.invoke(db.clj:12) ~[jepsen-0.1.14.jar:na]
      	at jepsen.db$fn__2974$G__2969__2983.invoke(db.clj:12) ~[jepsen-0.1.14.jar:na]
      	at clojure.core$partial$fn__5839.invoke(core.clj:2625) ~[clojure-1.10.1.jar:na]
      	at jepsen.control$on_nodes$fn__2918.invoke(control.clj:391) ~[jepsen-0.1.14.jar:na]
      	at clojure.lang.AFn.applyToHelper(AFn.java:154) ~[clojure-1.10.1.jar:na]
      	at clojure.lang.AFn.applyTo(AFn.java:144) ~[clojure-1.10.1.jar:na]
      	at clojure.core$apply.invokeStatic(core.clj:665) ~[clojure-1.10.1.jar:na]
      	at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1973) ~[clojure-1.10.1.jar:na]
      	at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1973) ~[clojure-1.10.1.jar:na]
      	at clojure.lang.RestFn.applyTo(RestFn.java:142) ~[clojure-1.10.1.jar:na]
      	at clojure.core$apply.invokeStatic(core.clj:669) ~[clojure-1.10.1.jar:na]
      	at clojure.core$bound_fn_STAR_$fn__5749.doInvoke(core.clj:2003) ~[clojure-1.10.1.jar:na]
      	at clojure.lang.RestFn.invoke(RestFn.java:408) ~[clojure-1.10.1.jar:na]
      	at dom_top.core$real_pmap_helper$build_thread__214$fn__215.invoke(core.clj:146) ~[jepsen-0.1.14.jar:na]
      	at clojure.lang.AFn.applyToHelper(AFn.java:152) ~[clojure-1.10.1.jar:na]
      	at clojure.lang.AFn.applyTo(AFn.java:144) ~[clojure-1.10.1.jar:na]
      	at clojure.core$apply.invokeStatic(core.clj:665) ~[clojure-1.10.1.jar:na]
      	at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1973) ~[clojure-1.10.1.jar:na]
      	at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1973) ~[clojure-1.10.1.jar:na]
      	at clojure.lang.RestFn.invoke(RestFn.java:425) ~[clojure-1.10.1.jar:na]
      	at clojure.lang.AFn.applyToHelper(AFn.java:156) ~[clojure-1.10.1.jar:na]
      	at clojure.lang.RestFn.applyTo(RestFn.java:132) ~[clojure-1.10.1.jar:na]
      	at clojure.core$apply.invokeStatic(core.clj:669) ~[clojure-1.10.1.jar:na]
      	at clojure.core$bound_fn_STAR_$fn__5749.doInvoke(core.clj:2003) ~[clojure-1.10.1.jar:na]
      	at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.10.1.jar:na]
      	at clojure.lang.AFn.run(AFn.java:22) ~[clojure-1.10.1.jar:na]
      	at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_222]
      

      This appears similar to MB-35440, but note that here the error was encountered on the /completeJoin endpoint rather than the /engageCluster2 endpoint, which returned the error in MB-35440.
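
The :body in the exception above is JSON nested inside JSON, which makes the underlying failure hard to read. As a minimal sketch (not part of the Jepsen setup code; the function name `unwrap_join_error` is hypothetical), the following unwraps the body from the trace to show the status code, endpoint, and inner message returned by the node being joined:

```python
import json
import re

# Response body copied from the trace above, with only the Clojure string
# escaping removed. Raw strings keep the remaining JSON escapes intact.
BODY = (
    r'["Join completion call failed. Got HTTP status 500 from REST call post to '
    r'http://172.28.128.198:8091/completeJoin. Body was: '
    r'\"[\\\"Unexpected server error, request logged.\\\"]\""]'
)


def unwrap_join_error(body: str) -> dict:
    """Pull the inner status, endpoint and message out of the 400 body.

    Assumes the message layout seen in this one trace; other errors may
    be formatted differently.
    """
    message = json.loads(body)[0]  # outer JSON list holding one string
    status = re.search(r"Got HTTP status (\d+)", message)
    endpoint = re.search(r"REST call post to (\S+)\. Body was: ", message)
    # Everything after "Body was: " is a JSON string that itself holds JSON.
    _, _, quoted = message.partition("Body was: ")
    inner = json.loads(json.loads(quoted))
    return {
        "status": int(status.group(1)),
        "endpoint": endpoint.group(1),
        "inner": inner,
    }


if __name__ == "__main__":
    print(unwrap_join_error(BODY))
```

Unwrapped this way, the 400 returned to the test is only a wrapper: the failure itself is the HTTP 500 from /completeJoin on 172.28.128.198.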

          Activity

            An initial look suggests this is not a duplicate of MB-35440: the babysitter errors observed there are not observed here.

            Instead, we see the following memcached warning repeated multiple times. However, this warning also appears to have occurred intermittently in several tests that executed successfully, so it still requires further investigation.

            2019-08-21T07:50:12.459794-07:00 WARNING 43: Invalid password specified for [<ud>@ns_server</ud>] UUID:[6d75e3d2-a847-4e58-5a11-20de9092c5b2]
            2019-08-21T07:50:13.462241-07:00 WARNING 43: Invalid password specified for [<ud>@ns_server</ud>] UUID:[50ccee0a-6bd5-4b72-506c-d98dfec2cfaa]
            2019-08-21T07:50:14.465618-07:00 WARNING 43: Invalid password specified for [<ud>@ns_server</ud>] UUID:[016c579d-96cc-4ec1-68d6-1c4015998bcf]
            2019-08-21T07:50:15.469100-07:00 WARNING 43: Invalid password specified for [<ud>@ns_server</ud>] UUID:[6fe38219-98d5-424c-7641-77596229a027]
            2019-08-21T07:50:16.472579-07:00 WARNING 43: Invalid password specified for [<ud>@ns_server</ud>] UUID:[9906e461-4118-4e53-1cc9-5ad4dfb37cd5]
            

            sven.signer Sven Signer (Inactive) added the comment above.
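
Since the same warning also shows up in passing runs, one way to continue the investigation is to compare how often and how regularly it occurs in each run. The sketch below is a hedged illustration only: the regular expression is inferred from the five lines quoted above, the function name `summarize_auth_warnings` is hypothetical, and the number after WARNING is captured without being interpreted.

```python
import re
from datetime import datetime

# Pattern inferred from the memcached log lines quoted in this ticket.
WARNING_RE = re.compile(
    r"^(?P<ts>\S+) WARNING (?P<ident>\d+): Invalid password specified for "
    r"\[(?P<user>.*?)\] UUID:\[(?P<uuid>[0-9a-f-]+)\]"
)

SAMPLE = """\
2019-08-21T07:50:12.459794-07:00 WARNING 43: Invalid password specified for [<ud>@ns_server</ud>] UUID:[6d75e3d2-a847-4e58-5a11-20de9092c5b2]
2019-08-21T07:50:13.462241-07:00 WARNING 43: Invalid password specified for [<ud>@ns_server</ud>] UUID:[50ccee0a-6bd5-4b72-506c-d98dfec2cfaa]
2019-08-21T07:50:14.465618-07:00 WARNING 43: Invalid password specified for [<ud>@ns_server</ud>] UUID:[016c579d-96cc-4ec1-68d6-1c4015998bcf]
2019-08-21T07:50:15.469100-07:00 WARNING 43: Invalid password specified for [<ud>@ns_server</ud>] UUID:[6fe38219-98d5-424c-7641-77596229a027]
2019-08-21T07:50:16.472579-07:00 WARNING 43: Invalid password specified for [<ud>@ns_server</ud>] UUID:[9906e461-4118-4e53-1cc9-5ad4dfb37cd5]
"""


def summarize_auth_warnings(lines):
    """Count the warnings and measure the gap between consecutive ones."""
    hits = []
    for line in lines:
        match = WARNING_RE.match(line.strip())
        if match:
            timestamp = datetime.fromisoformat(match["ts"])
            hits.append((timestamp, match["user"]))
    # Seconds between each warning and the one before it.
    gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(hits, hits[1:])]
    return {
        "count": len(hits),
        "users": sorted({user for _, user in hits}),
        "gaps_seconds": gaps,
    }


if __name__ == "__main__":
    print(summarize_auth_warnings(SAMPLE.splitlines()))
```

On the lines quoted here, the warnings arrive roughly one second apart for the same user. Running the same summary over the logs of a passing run would show whether the crashed run differs in count or duration.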

            Resolving this MB, as we have since moved to different Jepsen and Couchbase Server versions and are no longer seeing this crash.

            richard.demellow Richard deMellow added the comment above.

            People

              richard.demellow Richard deMellow
              sven.signer Sven Signer (Inactive)