Details
- Type: Bug
- Status: Closed
- Priority: Critical
- Resolution: Fixed
- Affects Version: 7.1.0
- Fix Version: 7.1.0-2440
- Triage: Untriaged
- Operating System: Centos 64-bit
- 1
- Unknown
- Sprint: KV 2022-Feb, KV March-22
Description
Script to Repro
There is no particular test that reproduces this; cleanup can fail after any test, rendering one or more nodes in the cluster unusable.

Logs before the node became unusable:
2022-03-06 13:49:41,844 | test | INFO | MainThread | [basetestcase:log_setup_status:647] ========= BaseTestCase setup started for test #5 test_data_load_collections_with_graceful_failover_rebalance_out =========
2022-03-06 13:50:22,311 | test | INFO | MainThread | [rest_client:monitorRebalance:1610] Rebalance done. Taken 11.0540001392 seconds to complete
2022-03-06 13:50:22,319 | test | INFO | MainThread | [common_lib:sleep:23] Sleep 5 seconds. Reason: Wait after rebalance complete
2022-03-06 13:50:27,359 | test | ERROR | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body: headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
2022-03-06 13:50:27,362 | test | ERROR | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
2022-03-06 13:50:32,378 | test | ERROR | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body: headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
2022-03-06 13:50:32,380 | test | ERROR | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
2022-03-06 13:50:37,394 | test | ERROR | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body: headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
2022-03-06 13:50:37,395 | test | ERROR | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
2022-03-06 13:50:42,404 | test | ERROR | MainThread | [rest_client:__init__:317] Node 172.23.100.35:8091 is in a broken state!
2022-03-06 13:50:42,404 | test | ERROR | MainThread | [cluster_ready_functions:cleanup_cluster:232] Can't create rest connection after rebalance out for ejected nodes, will retry after 10 seconds according to MB-8430: Unable to reach the host @ 172.23.100.35
2022-03-06 13:50:42,411 | test | INFO | MainThread | [common_lib:sleep:23] Sleep 10 seconds. Reason: MB-8430
2022-03-06 13:50:52,420 | test | ERROR | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body: headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
2022-03-06 13:50:52,421 | test | ERROR | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
2022-03-06 13:50:57,436 | test | ERROR | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body: headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
2022-03-06 13:50:57,437 | test | ERROR | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
2022-03-06 13:51:02,453 | test | ERROR | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body: headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
2022-03-06 13:51:02,456 | test | ERROR | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
2022-03-06 13:51:07,463 | test | ERROR | MainThread | [rest_client:__init__:317] Node 172.23.100.35:8091 is in a broken state!
Traceback (most recent call last):
  File "pytests/basetestcase.py", line 363, in setUp
    self.cluster_util.cluster_cleanup(cluster,
  File "pytests/basetestcase.py", line 363, in setUp
    self.cluster_util.cluster_cleanup(cluster,
  File "couchbase_utils/cluster_utils/cluster_ready_functions.py", line 169, in cluster_cleanup
    self.cleanup_cluster(cluster, master=cluster.master)
  File "couchbase_utils/cluster_utils/cluster_ready_functions.py", line 237, in cleanup_cluster
    rest = RestConnection(removed)
  File "lib/membase/api/rest_client.py", line 319, in __init__
    raise ServerUnavailableException(self.ip)
ServerUnavailableException: Unable to reach the host @ 172.23.100.35
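For context, the retry pattern in the log above (three GETs to /nodes/self five seconds apart, the node declared broken, a 10-second wait per MB-8430, another round of retries, then ServerUnavailableException) roughly corresponds to the sketch below. This is an illustration only, not the actual rest_client/cluster_ready_functions code; the endpoint and credentials are taken from the log, while the function names and constants are assumptions.

```python
# Illustrative sketch of the retry behaviour seen in the log above; not the real
# test-framework code. The endpoint and credentials come from the log, everything
# else (names, constants) is assumed for illustration.
import time
import requests


class ServerUnavailableException(Exception):
    pass


def wait_for_node_self(ip, port=8091, retries=3, delay=5,
                       auth=("Administrator", "password")):
    """Poll /nodes/self; an ejected node answers 404 'Node is unknown to this cluster.'"""
    url = "http://%s:%s/nodes/self" % (ip, port)
    for _ in range(retries):
        resp = requests.get(url, auth=auth)
        if resp.status_code == 200:
            return resp.json()
        # 404 "Node is unknown to this cluster." -> sleep and retry
        time.sleep(delay)
    raise ServerUnavailableException("Unable to reach the host @ %s" % ip)


def connect_to_ejected_node(ip):
    # cleanup_cluster-style behaviour: one extra 10 s wait (MB-8430),
    # then a final attempt before the exception propagates to setUp.
    try:
        return wait_for_node_self(ip)
    except ServerUnavailableException:
        time.sleep(10)  # "Sleep 10 seconds. Reason: MB-8430"
        return wait_for_node_self(ip)
```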
172.23.100.35
[ns_server:debug,2022-03-06T21:12:03.979-08:00,ns_1@172.23.100.35:<0.17157.46>:ns_memcached:ensure_bucket_inner:1318]Bucket "default" not found during ensure_bucket
[ns_server:error,2022-03-06T21:12:04.012-08:00,ns_1@172.23.100.35:<0.17160.46>:ns_server_stats:report_prom_stats:172]ns_server stats reporting exception: error:badarg
[{ets,lookup,
  [ns_server_stats,{c,{<<"rest_request_enters">>,[]}}],
  [{error_info,#{cause => id,module => erl_stdlib_errors}}]},
 {ns_server_stats,'-report_ns_server_lc_stats/1-fun-0-',2,
  [{file,"src/ns_server_stats.erl"},{line,257}]},
 {lists,foreach,2,[{file,"lists.erl"},{line,1342}]},
 {ns_server_stats,'-report_prom_stats/2-fun-0-',2,
  [{file,"src/ns_server_stats.erl"},{line,170}]},
 {ns_server_stats,report_prom_stats,2,
  [{file,"src/ns_server_stats.erl"},{line,180}]},
 {async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,191}]}]
[ns_server:error,2022-03-06T21:12:04.012-08:00,ns_1@172.23.100.35:<0.17160.46>:ns_server_stats:report_prom_stats:172]system stats reporting exception: exit:{noproc,
  {gen_server,call,
   [ns_server_stats,get_stats]}}
[{gen_server,call,2,[{file,"gen_server.erl"},{line,239}]},
 {ns_server_stats,report_system_stats,1,
  [{file,"src/ns_server_stats.erl"},{line,188}]},
 {ns_server_stats,'-report_prom_stats/2-fun-0-',2,
  [{file,"src/ns_server_stats.erl"},{line,170}]},
 {ns_server_stats,report_prom_stats,2,
  [{file,"src/ns_server_stats.erl"},{line,182}]},
 {async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,191}]}]
[ns_server:debug,2022-03-06T21:12:04.110-08:00,ns_1@172.23.100.35:<0.17158.46>:ns_memcached:ensure_bucket_inner:1318]Bucket "default" not found during ensure_bucket
[ns_server:error,2022-03-06T21:12:04.324-08:00,ns_1@172.23.100.35:<0.17166.46>:ns_server_stats:report_prom_stats:172]ns_server stats reporting exception: error:badarg
[{ets,safe_fixtable,
  [ns_server_stats,true],
  [{error_info,#{cause => id,module => erl_stdlib_errors}}]},
 {ets,foldl,3,[{file,"ets.erl"},{line,625}]},
 {ns_server_stats,report_ns_server_hc_stats,1,
  [{file,"src/ns_server_stats.erl"},{line,264}]},
 {ns_server_stats,'-report_prom_stats/2-fun-0-',2,
  [{file,"src/ns_server_stats.erl"},{line,170}]},
 {ns_server_stats,report_prom_stats,2,
  [{file,"src/ns_server_stats.erl"},{line,178}]},
 {async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,191}]}]
[ns_server:error,2022-03-06T21:12:04.651-08:00,ns_1@172.23.100.35:<0.17058.46>:menelaus_util:reply_server_error_before_close:210]Server error during processing: ["web request failed",
 {path,"/pools/default"},
 {method,'GET'},
 {type,exit},
 {what,
  {noproc,
   {gen_server,call,
    ['service_status_keeper-index',
     get_version]}}},
 {trace,
  [{gen_server,call,2,
    [{file,"gen_server.erl"},{line,239}]},
   {menelaus_web_pools,do_build_pool_info,4,
    [{file,"src/menelaus_web_pools.erl"},
     {line,211}]},
   {menelaus_web_pools,pool_info,6,
    [{file,"src/menelaus_web_pools.erl"},
     {line,106}]},
   {menelaus_web_pools,handle_pool_info_wait,
    5,
    [{file,"src/menelaus_web_pools.erl"},
     {line,118}]},
   {request_tracker,request,2,
    [{file,"src/request_tracker.erl"},
     {line,40}]},
   {menelaus_util,handle_request,2,
    [{file,"src/menelaus_util.erl"},
     {line,221}]},
   {mochiweb_http,headers,6,
    [{file,
      "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/mochiweb/mochiweb_http.erl"},
     {line,153}]},
   {proc_lib,init_p_do_apply,3,
    [{file,"proc_lib.erl"},{line,226}]}]}]
This may be another side effect of MB-49512, which is also hit frequently during cleanups when the bucket is not dropped completely.
cbcollect_info attached.
Attachments
Activity
Field | Original Value | New Value |
---|---|---|
Summary | Certain nodes are not accessible post cleanup of the cluster | Some nodes in the cluster are rendered unusable post cleanup |
Attachment | consoleText_MB-51319.txt [ 179076 ] |
Labels | affects-neo-testing functional-test | affects-neo-testing functional-test magma |
Assignee | Balakumaran Gopal [ balakumaran.gopal ] | Daniel Owen [ owend ] |
Status | Open [ 1 ] | In Progress [ 3 ] |
Component/s | couchbase-bucket [ 10173 ] | |
Component/s | ns_server [ 10019 ] |
Status | In Progress [ 3 ] | Open [ 1 ] |
Assignee | Daniel Owen [ owend ] | Dave Finlay [ dfinlay ] |
Assignee | Dave Finlay [ dfinlay ] | Daniel Owen [ owend ] |
Component/s | couchbase-bucket [ 10173 ] |
Attachment | latest-diag.log [ 179095 ] | |
Attachment | debug.log.1 [ 179094 ] | |
Attachment | memcached.log.000021.txt [ 179093 ] |
Component/s | ns_server [ 10019 ] | |
Component/s | storage-engine [ 10175 ] |
Assignee | Daniel Owen [ owend ] | Sarath Lakshman [ sarath ] |
Assignee | Sarath Lakshman [ sarath ] | Paolo Cocchi [ paolo.cocchi ] |
Attachment | node35.gdb.log [ 179104 ] |
Assignee | Paolo Cocchi [ paolo.cocchi ] | Balakumaran Gopal [ balakumaran.gopal ] |
Assignee | Balakumaran Gopal [ balakumaran.gopal ] | Srinath Duvuru [ srinath.duvuru ] |
Assignee | Srinath Duvuru [ srinath.duvuru ] | Sarath Lakshman [ sarath ] |
Sprint | KV Post-Neo [ 2016 ] |
Sprint | KV Post-Neo [ 2016 ] | KV 2022-Feb [ 2002 ] |
Rank | Ranked higher |
Sprint | KV 2022-Feb [ 2002 ] | KV 2022-Feb, KV Post-Neo 2 [ 2002, 2050 ] |
Component/s | couchbase-bucket [ 10173 ] |
Due Date | 10/Mar/22 |
Resolution | Fixed [ 1 ] | |
Status | Open [ 1 ] | Resolved [ 5 ] |
Assignee | Sarath Lakshman [ sarath ] | Balakumaran Gopal [ balakumaran.gopal ] |
Status | Resolved [ 5 ] | Closed [ 6 ] |
Hi Balakumaran Gopal,
On .35, from memcached.log, we see the last delete of the "default" bucket:
2022-03-06T11:54:09.830406-08:00 INFO <none>: Delete bucket [default]. Notifying engine
2022-03-06T11:54:09.837324-08:00 INFO <none>: Delete bucket [default]. Engine ready for shutdown
2022-03-06T11:54:09.837327-08:00 INFO <none>: Delete bucket [default]. Wait for 6 clients to disconnect
2022-03-06T11:54:09.838533-08:00 INFO <none>: Delete bucket [default]. Shut down the bucket
2022-03-06T11:54:14.502380-08:00 INFO (default) Deleted KvBucket.
2022-03-06T11:54:16.056152-08:00 INFO <none>: Delete bucket [default]. Clean up allocated resources
2022-03-06T11:54:16.056200-08:00 INFO <none>: Delete bucket [default] complete
Looking at the 2022-03-07/172.23.100.35.zip logs, the last create of the "default" bucket is:
2022-03-06T11:55:30.125015-08:00 INFO 96: Create Couchbase bucket [default]
It appears warmup completed successfully:
2022-03-06T11:55:30.147945-08:00 INFO (default) Warmup::processCreateVBucketsComplete unblocking 1 cookie(s)
2022-03-06T11:55:30.148367-08:00 INFO (default) Warmup completed: 0 keys and 0 values loaded in 4231 us (0 keys/s), mem_used now at 4.588072 MB (1084.349639130889 MB/s)
However, we can see from ns_server.debug.log that ns_server appears stuck:
bash-5.1$ grep "Bucket \"default\" not found during ensure_bucket" ns_server.debug.log | grep "2022-03-06T12" | wc -l
0
bash-5.1$ grep "Bucket \"default\" not found during ensure_bucket" ns_server.debug.log | grep "2022-03-06T13" | wc -l
3710
bash-5.1$ grep "Bucket \"default\" not found during ensure_bucket" ns_server.debug.log | grep "2022-03-06T14" | wc -l
15766
bash-5.1$ grep "Bucket \"default\" not found during ensure_bucket" ns_server.debug.log | grep "2022-03-06T15" | wc -l
14758
bash-5.1$ grep "Bucket \"default\" not found during ensure_bucket" ns_server.debug.log | grep "2022-03-06T16" | wc -l
14758
bash-5.1$ grep "Bucket \"default\" not found during ensure_bucket" ns_server.debug.log | grep "2022-03-06T17" | wc -l
14632
bash-5.1$ grep "Bucket \"default\" not found during ensure_bucket" ns_server.debug.log | grep "2022-03-06T18" | wc -l
13969
bash-5.1$ grep "Bucket \"default\" not found during ensure_bucket" ns_server.debug.log | grep "2022-03-06T19" | wc -l
13680
bash-5.1$ grep "Bucket \"default\" not found during ensure_bucket" ns_server.debug.log | grep "2022-03-06T20" | wc -l
13425
bash-5.1$ grep "Bucket \"default\" not found during ensure_bucket" ns_server.debug.log | grep "2022-03-06T21" | wc -l
5716
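The per-hour counts above can also be gathered in one pass; a minimal sketch (assumes the same ns_server.debug.log file and the timestamp prefix shown in the log lines, not part of the original analysis):

```python
# One-pass tally of the 'Bucket "default" not found during ensure_bucket'
# messages per hour in ns_server.debug.log (sketch; assumes the log file and
# timestamp format shown above).
import re
from collections import Counter

needle = 'Bucket "default" not found during ensure_bucket'
hour_re = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}")  # e.g. 2022-03-06T13

counts = Counter()
with open("ns_server.debug.log", errors="replace") as log:
    for line in log:
        if needle in line:
            match = hour_re.search(line)
            if match:
                counts[match.group(0)] += 1

for hour in sorted(counts):
    print(hour, counts[hour])
```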
From the KV side, it appears traffic was enabled:
2022-03-06T11:55:31.106216-08:00 INFO (default) EventuallyPersistentEngine::enableTraffic() result true
2022-03-06T11:55:31.106224-08:00 INFO (default) EventuallyPersistentEngine::handleTrafficControlCmd() Data traffic to persistence engine is enabled