Description
What's the problem?
Looks like we're hitting MB-45289 during cluster setup, in which the "/controller/addNode" endpoint occasionally returns an HTTP status code of 500 with the error message "Unexpected server error, request logged".
What's the fix?
Given that the comment in the MB suggests that we retry on this error, I think it's best to simply add a retry with exponential backoff around the "/controller/addNode" HTTP request.
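To illustrate the idea, here is a minimal, language-neutral sketch of retry with exponential backoff, written in Python. The actual change lives in jepsen.couchbase and is not reproduced here; every name below (TransientServerError, retry_with_backoff, request_fn) and the attempt and delay values are hypothetical, chosen only for illustration.

```python
import time


class TransientServerError(Exception):
    """Hypothetical error raised by request_fn when the server replies with HTTP 500."""


def retry_with_backoff(request_fn, max_attempts=5, base_delay=1.0, factor=2.0,
                       sleep=time.sleep):
    """Call request_fn, retrying on TransientServerError with growing delays.

    The delay starts at base_delay seconds and is multiplied by factor after
    each failed attempt. The last failure is re-raised to the caller.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return request_fn()
        except TransientServerError:
            if attempt == max_attempts:
                raise  # out of attempts, surface the error
            sleep(delay)
            delay *= factor
```

Only the error type that signals the 500 response is retried, so other failures (bad credentials, unreachable host) would still surface immediately. The sleep parameter is injectable so the behaviour can be checked without actually waiting.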
Side notes
Doesn't seem to affect CC tests.
Appendix
An extract from jepsen.log, which indicates that the HTTP request failed with a status of 500.
jepsen.log
2021-08-13 15:34:19,463{GMT} INFO [jepsen node 172.28.128.183] couchbase.util: Adding node 172.28.128.184 to cluster
2021-08-13 15:34:19,463{GMT} INFO [jepsen node 172.28.128.183] couchbase.util: Adding node 172.28.128.184 to cluster
2021-08-13 15:34:19,561{GMT} WARN [jepsen node 172.28.128.183] couchbase.util: Rest call to http://172.28.128.183:8091/controller/addNode with params {:hostname http://172.28.128.184, :user Administrator, :password abc123, :services kv} threw exception.
2021-08-13 15:34:19,566{GMT} INFO [jepsen node 172.28.128.183] couchbase.util: #error { :cause clj-http: status 500 {:cached nil, :request-time 86, :repeatable? false, :protocol-version {:name "HTTP", :major 1, :minor 1}, :streaming? true, :http-client #object[org.apache.http.impl.client.InternalHttpClient 0x400fe1ef "org.apache.http.impl.client.InternalHttpClient@400fe1ef"], :chunked? false, :reason-phrase "Internal Server Error", :headers {"X-Permitted-Cross-Domain-Policies" "none", "Server" "Couchbase Server", "Content-Type" "application/json", "X-Content-Type-Options" "nosniff", "Content-Length" "44", "X-Frame-Options" "DENY", "Connection" "close", "Pragma" "no-cache", "Expires" "Thu, 01 Jan 1970 00:00:00 GMT", "Date" "Fri, 13 Aug 2021 14:34:17 GMT", "X-XSS-Protection" "1; mode=block", "Cache-Control" "no-cache,no-store,must-revalidate"}, :orig-content-encoding nil, :status 500, :length 44, :body "[\"Unexpected server error, request logged.\"]", :trace-redirects []} :data {:cached nil, :request-time 86, :repeatable? false, :protocol-version {:name HTTP, :major 1, :minor 1}, :streaming? true, :http-client #object[org.apache.http.impl.client.InternalHttpClient 0x400fe1ef org.apache.http.impl.client.InternalHttpClient@400fe1ef], :chunked? false, :type :clj-http.client/unexceptional-status, :reason-phrase Internal Server Error, :headers {X-Permitted-Cross-Domain-Policies none, Server Couchbase Server, Content-Type application/json, X-Content-Type-Options nosniff, Content-Length 44, X-Frame-Options DENY, Connection close, Pragma no-cache, Expires Thu, 01 Jan 1970 00:00:00 GMT, Date Fri, 13 Aug 2021 14:34:17 GMT, X-XSS-Protection 1; mode=block, Cache-Control no-cache,no-store,must-revalidate}, :orig-content-encoding nil, :status 500, :length 44, :body ["Unexpected server error, request logged."], :trace-redirects []}
A stack trace from ns_server.error.log. Although it's not identical, it seems to be related to node renaming.
ns_server.error.log (node: 172.28.128.183)
[ns_server:error,2021-08-13T14:34:18.414Z,ns_1@172.28.128.183:<0.814.0>:menelaus_util:reply_server_error:205]Server error during processing: ["web request failed",
{path,"/controller/addNode"},
{method,'POST'},
{type,exit},
{what,
{{{{{badmatch,
{error,
{conflict,
{<<"6785d52df1cd6dd18a640f46c3233394">>,
8}}}},
[{chronicle_local,handle_rename,1,
[{file,"src/chronicle_local.erl"},
{line,152}]},
{chronicle_local,handle_call,3,
[{file,"src/chronicle_local.erl"},
{line,96}]},
{gen_server2,handle_call,3,
[{file,"src/gen_server2.erl"},
{line,214}]},
{gen_server,try_handle_call,4,
[{file,"gen_server.erl"},{line,661}]},
{gen_server,handle_msg,6,
[{file,"gen_server.erl"},{line,690}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,249}]}]},
{gen_server,call,
[chronicle_local,
{rename,'ns_1@cb.local'}]}},
{gen_server,call,
[dist_manager,
{adjust_my_address,"172.28.128.183",
false,#Fun<ns_cluster.7.111409773>},
infinity]}},
{gen_server,call,
[ns_cluster,
{add_node_to_group,http,
"172.28.128.184",8091,
{"Administrator","abc123"},
undefined,
[kv]},
240000]}}},
{trace,
[{gen_server,call,3,
[{file,"gen_server.erl"},{line,223}]},
{ns_cluster,add_node_to_group,6,
[{file,"src/ns_cluster.erl"},{line,80}]},
{menelaus_web_cluster,do_handle_add_node,
2,
[{file,"src/menelaus_web_cluster.erl"},
{line,645}]},
{request_throttler,do_request,3,
[{file,"src/request_throttler.erl"},
{line,58}]},
{menelaus_util,handle_request,2,
[{file,"src/menelaus_util.erl"},
{line,216}]},
{mochiweb_http,headers,6,
[{file,
"/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/mochiweb/mochiweb_http.erl"},
{line,150}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,249}]}]}]
Issue Links
- is caused by: MB-45289 [System Test] engageCluster2 POST returns status 500 - Avoid node rename (Closed)
For Gerrit Dashboard: MB-47937
# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
159380,5 | MB-47937: Add a retry around /controller/addNode | master | jepsen.couchbase | MERGED | +2 | +1 |