Details
Bug
Resolution: Fixed
Critical
7.1.0
7.1.0-2440
Untriaged
Centos 64-bit
1
Unknown
KV 2022-Feb, KV March-22
Description
Script to Repro
No particular test reproduces this. Cluster cleanup can fail during any test, leaving one or more nodes in the cluster unusable.
Logs from before the node became unusable:
2022-03-06 13:49:41,844 | test | INFO | MainThread | [basetestcase:log_setup_status:647] ========= BaseTestCase setup started for test #5 test_data_load_collections_with_graceful_failover_rebalance_out =========
2022-03-06 13:50:22,311 | test | INFO | MainThread | [rest_client:monitorRebalance:1610] Rebalance done. Taken 11.0540001392 seconds to complete
2022-03-06 13:50:22,319 | test | INFO | MainThread | [common_lib:sleep:23] Sleep 5 seconds. Reason: Wait after rebalance complete
2022-03-06 13:50:27,359 | test | ERROR | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body: headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
2022-03-06 13:50:27,362 | test | ERROR | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
2022-03-06 13:50:32,378 | test | ERROR | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body: headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
2022-03-06 13:50:32,380 | test | ERROR | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
2022-03-06 13:50:37,394 | test | ERROR | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body: headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
2022-03-06 13:50:37,395 | test | ERROR | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
2022-03-06 13:50:42,404 | test | ERROR | MainThread | [rest_client:__init__:317] Node 172.23.100.35:8091 is in a broken state!
2022-03-06 13:50:42,404 | test | ERROR | MainThread | [cluster_ready_functions:cleanup_cluster:232] Can't create rest connection after rebalance out for ejected nodes, will retry after 10 seconds according to MB-8430: Unable to reach the host @ 172.23.100.35
2022-03-06 13:50:42,411 | test | INFO | MainThread | [common_lib:sleep:23] Sleep 10 seconds. Reason: MB-8430
2022-03-06 13:50:52,420 | test | ERROR | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body: headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
2022-03-06 13:50:52,421 | test | ERROR | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
2022-03-06 13:50:57,436 | test | ERROR | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body: headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
2022-03-06 13:50:57,437 | test | ERROR | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
2022-03-06 13:51:02,453 | test | ERROR | MainThread | [rest_client:_http_request:834] GET http://172.23.100.35:8091/nodes/self body: headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Content-Type': 'application/json'} error: 404 reason: unknown "Node is unknown to this cluster." auth: Administrator:password
http://172.23.100.35:8091/nodes/self with status 0: Node is unknown to this cluster.
2022-03-06 13:51:02,456 | test | ERROR | MainThread | [rest_client:__init__:312] Error Node is unknown to this cluster. was gotten, 5 seconds sleep before retry
2022-03-06 13:51:07,463 | test | ERROR | MainThread | [rest_client:__init__:317] Node 172.23.100.35:8091 is in a broken state!
Traceback (most recent call last):
File "pytests/basetestcase.py", line 363, in setUp
self.cluster_util.cluster_cleanup(cluster,
File "pytests/basetestcase.py", line 363, in setUp
self.cluster_util.cluster_cleanup(cluster,
File "couchbase_utils/cluster_utils/cluster_ready_functions.py", line 169, in cluster_cleanup
self.cleanup_cluster(cluster, master=cluster.master)
File "couchbase_utils/cluster_utils/cluster_ready_functions.py", line 237, in cleanup_cluster
rest = RestConnection(removed)
File "lib/membase/api/rest_client.py", line 319, in __init__
raise ServerUnavailableException(self.ip)
ServerUnavailableException: Unable to reach the host @ 172.23.100.35
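The retry behaviour visible in the test log above (three GET /nodes/self attempts, each followed by a 5 second sleep, then "broken state" and ServerUnavailableException) can be sketched roughly as follows. This is an illustrative reconstruction from the log messages only, not the actual code in lib/membase/api/rest_client.py; wait_for_node and the injected fetch_status callable are hypothetical names.

```python
import time


class ServerUnavailableException(Exception):
    """Raised when a node never answers /nodes/self successfully."""


def wait_for_node(fetch_status, ip, attempts=3, delay=5, sleep=time.sleep):
    """Poll a node until it responds, mirroring the retries seen in the log.

    fetch_status is a hypothetical callable returning (ok, message) for
    GET http://<ip>:8091/nodes/self. It is injected so that this sketch
    does not depend on any particular HTTP client.
    """
    for _ in range(attempts):
        ok, message = fetch_status(ip)
        if ok:
            return True
        # Same wording as the rest_client:__init__:312 log line.
        print("Error %s was gotten, %d seconds sleep before retry"
              % (message, delay))
        sleep(delay)
    # Same wording as the rest_client:__init__:317 log line.
    print("Node %s:8091 is in a broken state!" % ip)
    raise ServerUnavailableException("Unable to reach the host @ %s" % ip)
```

With a node that keeps answering 404 "Node is unknown to this cluster.", this gives up after roughly 15 seconds, which matches the 13:50:27 to 13:50:42 window in the log.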
ns_server logs on 172.23.100.35:
[ns_server:debug,2022-03-06T21:12:03.979-08:00,ns_1@172.23.100.35:<0.17157.46>:ns_memcached:ensure_bucket_inner:1318]Bucket "default" not found during ensure_bucket
[ns_server:error,2022-03-06T21:12:04.012-08:00,ns_1@172.23.100.35:<0.17160.46>:ns_server_stats:report_prom_stats:172]ns_server stats reporting exception: error:badarg
[{ets,lookup,
[ns_server_stats,{c,{<<"rest_request_enters">>,[]}}],
[{error_info,#{cause => id,module => erl_stdlib_errors}}]},
{ns_server_stats,'-report_ns_server_lc_stats/1-fun-0-',2,
[{file,"src/ns_server_stats.erl"},{line,257}]},
{lists,foreach,2,[{file,"lists.erl"},{line,1342}]},
{ns_server_stats,'-report_prom_stats/2-fun-0-',2,
[{file,"src/ns_server_stats.erl"},{line,170}]},
{ns_server_stats,report_prom_stats,2,
[{file,"src/ns_server_stats.erl"},{line,180}]},
{async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,191}]}]
[ns_server:error,2022-03-06T21:12:04.012-08:00,ns_1@172.23.100.35:<0.17160.46>:ns_server_stats:report_prom_stats:172]system stats reporting exception: exit:{noproc,
{gen_server,call,
[ns_server_stats,get_stats]}}
[{gen_server,call,2,[{file,"gen_server.erl"},{line,239}]},
{ns_server_stats,report_system_stats,1,
[{file,"src/ns_server_stats.erl"},{line,188}]},
{ns_server_stats,'-report_prom_stats/2-fun-0-',2,
[{file,"src/ns_server_stats.erl"},{line,170}]},
{ns_server_stats,report_prom_stats,2,
[{file,"src/ns_server_stats.erl"},{line,182}]},
{async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,191}]}]
[ns_server:debug,2022-03-06T21:12:04.110-08:00,ns_1@172.23.100.35:<0.17158.46>:ns_memcached:ensure_bucket_inner:1318]Bucket "default" not found during ensure_bucket
[ns_server:error,2022-03-06T21:12:04.324-08:00,ns_1@172.23.100.35:<0.17166.46>:ns_server_stats:report_prom_stats:172]ns_server stats reporting exception: error:badarg
[{ets,safe_fixtable,
[ns_server_stats,true],
[{error_info,#{cause => id,module => erl_stdlib_errors}}]},
{ets,foldl,3,[{file,"ets.erl"},{line,625}]},
{ns_server_stats,report_ns_server_hc_stats,1,
[{file,"src/ns_server_stats.erl"},{line,264}]},
{ns_server_stats,'-report_prom_stats/2-fun-0-',2,
[{file,"src/ns_server_stats.erl"},{line,170}]},
{ns_server_stats,report_prom_stats,2,
[{file,"src/ns_server_stats.erl"},{line,178}]},
{async,'-async_init/4-fun-1-',3,[{file,"src/async.erl"},{line,191}]}]
[ns_server:error,2022-03-06T21:12:04.651-08:00,ns_1@172.23.100.35:<0.17058.46>:menelaus_util:reply_server_error_before_close:210]Server error during processing: ["web request failed",
{path,"/pools/default"},
{method,'GET'},
{type,exit},
{what,
{noproc,
{gen_server,call,
['service_status_keeper-index',
get_version]}}},
{trace,
[{gen_server,call,2,
[{file,"gen_server.erl"},{line,239}]},
{menelaus_web_pools,do_build_pool_info,4,
[{file,"src/menelaus_web_pools.erl"},
{line,211}]},
{menelaus_web_pools,pool_info,6,
[{file,"src/menelaus_web_pools.erl"},
{line,106}]},
{menelaus_web_pools,handle_pool_info_wait,
5,
[{file,"src/menelaus_web_pools.erl"},
{line,118}]},
{request_tracker,request,2,
[{file,"src/request_tracker.erl"},
{line,40}]},
{menelaus_util,handle_request,2,
[{file,"src/menelaus_util.erl"},
{line,221}]},
{mochiweb_http,headers,6,
[{file,
"/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/mochiweb/mochiweb_http.erl"},
{line,153}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,226}]}]}]
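For triage, the ns_server entries above can be grouped by the module and function that logged them. The following is a minimal sketch that assumes the header layout seen in this excerpt, [component:level,timestamp,node:<pid>:module:function:line]message; the regular expression and the summarize_errors helper are hypothetical and not part of any Couchbase tooling.

```python
import re
from collections import Counter

# Header layout inferred from the log lines in this ticket (an assumption):
# [component:level,timestamp,node:<pid>:module:function:line]message
NS_LOG_HEADER = re.compile(
    r"^\[(?P<component>\w+):(?P<level>\w+),"
    r"(?P<timestamp>[^,]+),"
    r"(?P<node>[^:]+):"
    r"(?P<pid><[\d.]+>):"
    r"(?P<module>\w+):(?P<function>[^:]+):(?P<line>\d+)\]"
    r"(?P<message>.*)$"
)


def summarize_errors(lines):
    """Count error-level entries per (module, function).

    Lines that do not match the header pattern, such as the stack trace
    continuation lines, are skipped.
    """
    counts = Counter()
    for text in lines:
        match = NS_LOG_HEADER.match(text.strip())
        if match and match.group("level") == "error":
            counts[(match.group("module"), match.group("function"))] += 1
    return counts
```

Applied to the excerpt above, this would separate the repeated ns_server_stats:report_prom_stats failures from the single menelaus_util:reply_server_error_before_close entry.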
This may be another side effect of MB-49512, which is also hit frequently during cleanups and relates to a bucket not being dropped completely.
cbcollect_info attached.