Details
Description
After a number of rebalance-ins and rebalance-outs on a cluster_run cluster of n_1 & n_2, attempts to contact the webserver on n_2 (:9002) get connection refused, and this condition persists at least until our test framework gives up (~90s after the last n_2 rebalance out).
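For illustration, the give-up behavior described above can be sketched as a poll loop that retries a TCP connect to the node's REST port until a deadline passes. This is a hypothetical helper, not the actual ClusterExecutionITBase code; the class and method names are made up for this sketch.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.time.Duration;
import java.time.Instant;

public class PortProbe {

    // Hypothetical helper (not the real test-framework code): retries a TCP
    // connect to host:port until it succeeds or the deadline passes. The test
    // framework does something similar and gives up after ~90s, reporting the
    // last failure (here, the HttpHostConnectException / connection refused).
    static boolean waitForPort(String host, int port, Duration timeout) {
        Instant deadline = Instant.now().plus(timeout);
        while (Instant.now().isBefore(deadline)) {
            try (Socket s = new Socket()) {
                s.connect(new InetSocketAddress(host, port), 1000);
                return true; // webserver accepted the connection
            } catch (IOException e) {
                // Connection refused (the state n_2 is stuck in): back off and retry.
                try {
                    Thread.sleep(250);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false; // deadline passed without a successful connect
    }
}
```

Against the cluster above this would be invoked as roughly `waitForPort("172.18.0.3", 9002, Duration.ofSeconds(90))`, which returns false for the failing runs.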
2021-04-27T07:14:30.469-07:00 INFO ClusterExecutionITBase [main] Running cli: rebalance -c 172.18.0.3:9001 -u couchbase -p couchbase --server-remove 172.18.0.3:9002
2021-04-27T07:16:05.480-07:00 INFO ClusterExecutionITBase [main+] >> Unable to display progress bar on this os
2021-04-27T07:16:05.480-07:00 INFO ClusterExecutionITBase [main+] >> SUCCESS: Rebalance complete
...
2021-04-27T07:17:35.601-07:00 ERRO TestExecutor [main] testFile src/test/resources/runtimets/queries/remote/cb/lifecycle/alternate-address-rebalance-out/test.15.cb.cmd raised an unexpected exception
java.util.concurrent.ExecutionException: java.lang.IllegalStateException: timed out before desired response received (last result: org.apache.http.conn.HttpHostConnectException: Connect to 172.18.0.3:9002 [/172.18.0.3] failed: Connection refused (Connection refused))
|
This does not seem to be intermittent; it fails reliably on every test run, both locally on a MacBook and in the Jenkins Ubuntu environment.
Note: the cbcollect_infos failed with a dump-guts error ({error,badarg}, per the trace below), so while I've attached the cbcollect_infos they seem to be useless; I have also attached the raw logs for these nodes.
2021-04-27T07:18:06.716-07:00 INFO ClusterExecutionITBase [main+] >> Found dump-guts: /home/couchbase/jenkins/workspace/cbas-cbcluster-test2/install/bin/dump-guts
2021-04-27T07:18:06.721-07:00 INFO ClusterExecutionITBase [ForkJoinPool.commonPool-worker-5+] >> Raw PID 1 control groups /proc/1/cgroup (cat /proc/1/cgroup) - OK
2021-04-27T07:18:06.721-07:00 INFO ClusterExecutionITBase [ForkJoinPool.commonPool-worker-5+] >> Found dump-guts: /home/couchbase/jenkins/workspace/cbas-cbcluster-test2/install/bin/dump-guts
2021-04-27T07:18:07.652-07:00 INFO ClusterExecutionITBase [main+] >> Error occurred getting server guts: Got exception: {error,badarg}
2021-04-27T07:18:07.652-07:00 INFO ClusterExecutionITBase [main+] >> [{lists,keyfind,[port_meta,1,'_deleted'],[]},
2021-04-27T07:18:07.652-07:00 INFO ClusterExecutionITBase [main+] >> {'dump-guts__escript__1619__533086__994357__5',extract_rest_port,2,
2021-04-27T07:18:07.652-07:00 INFO ClusterExecutionITBase [main+] >> [{file,
2021-04-27T07:18:07.652-07:00 INFO ClusterExecutionITBase [main+] >> "/home/couchbase/jenkins/workspace/cbas-cbcluster-test2/install/bin/dump-guts"},
2021-04-27T07:18:07.652-07:00 INFO ClusterExecutionITBase [main+] >> {line,458}]},
2021-04-27T07:18:07.652-07:00 INFO ClusterExecutionITBase [main+] >> {'dump-guts__escript__1619__533086__994357__5',main_with_everything,4,
2021-04-27T07:18:07.652-07:00 INFO ClusterExecutionITBase [main+] >> [{file,
2021-04-27T07:18:07.652-07:00 INFO ClusterExecutionITBase [main+] >> "/home/couchbase/jenkins/workspace/cbas-cbcluster-test2/install/bin/dump-guts"},
2021-04-27T07:18:07.652-07:00 INFO ClusterExecutionITBase [main+] >> {line,553}]},
2021-04-27T07:18:07.652-07:00 INFO ClusterExecutionITBase [main+] >> {'dump-guts__escript__1619__533086__994357__5',main,1,
2021-04-27T07:18:07.652-07:00 INFO ClusterExecutionITBase [main+] >> [{file,
2021-04-27T07:18:07.652-07:00 INFO ClusterExecutionITBase [main+] >> "/home/couchbase/jenkins/workspace/cbas-cbcluster-test2/install/bin/dump-guts"},
2021-04-27T07:18:07.652-07:00 INFO ClusterExecutionITBase [main+] >> {line,136}]},
2021-04-27T07:18:07.652-07:00 INFO ClusterExecutionITBase [main+] >> {escript,run,2,[{file,"escript.erl"},{line,758}]},
2021-04-27T07:18:07.652-07:00 INFO ClusterExecutionITBase [main+] >> {escript,start,1,[{file,"escript.erl"},{line,277}]},
2021-04-27T07:18:07.653-07:00 INFO ClusterExecutionITBase [main+] >> {init,start_em,1,[]},
2021-04-27T07:18:07.653-07:00 INFO ClusterExecutionITBase [main+] >> {init,do_boot,3,[]}]
The n_2 node seems to be stuck in a state where it keeps logging the following repeatedly (failing to bring up its couchdb_n_2 child node):
{net_kernel,{auto_connect,'couchdb_n_2@cb.local',
{1132,#Ref<0.2357520616.3444178948.168890>}}}
[ns_server:debug,2021-04-27T07:19:59.374-07:00,n_2@172.18.0.3:net_kernel<0.1669.0>:cb_dist:info_msg:778]cb_dist: Setting up new connection to 'couchdb_n_2@cb.local' using inet_tcp_dist
[ns_server:debug,2021-04-27T07:19:59.374-07:00,n_2@172.18.0.3:cb_dist<0.1666.0>:cb_dist:info_msg:778]cb_dist: Added connection {con,#Ref<0.2357520616.3444310017.167261>,
inet_tcp_dist,undefined,undefined}
[ns_server:debug,2021-04-27T07:19:59.374-07:00,n_2@172.18.0.3:cb_dist<0.1666.0>:cb_dist:info_msg:778]cb_dist: Updated connection: {con,#Ref<0.2357520616.3444310017.167261>,
inet_tcp_dist,<0.14481.4>,
#Ref<0.2357520616.3444310017.167264>}
[error_logger:info,2021-04-27T07:19:59.386-07:00,n_2@172.18.0.3:net_kernel<0.1669.0>:ale_error_logger_handler:do_log:101]
=========================NOTICE REPORT=========================
{net_kernel,{'EXIT',<0.14481.4>,{recv_challenge_ack_failed,{error,closed}}}}
[ns_server:debug,2021-04-27T07:19:59.386-07:00,n_2@172.18.0.3:cb_dist<0.1666.0>:cb_dist:info_msg:778]cb_dist: Connection down: {con,#Ref<0.2357520616.3444310017.167261>,
inet_tcp_dist,<0.14481.4>,
#Ref<0.2357520616.3444310017.167264>}
[error_logger:info,2021-04-27T07:19:59.386-07:00,n_2@172.18.0.3:net_kernel<0.1669.0>:ale_error_logger_handler:do_log:101]
=========================NOTICE REPORT=========================
{net_kernel,{net_kernel,1054,nodedown,'couchdb_n_2@cb.local'}}
[ns_server:debug,2021-04-27T07:19:59.387-07:00,n_2@172.18.0.3:<0.14365.4>:ns_server_nodes_sup:do_wait_link_to_couchdb_node:161]ns_couchdb is not ready: {badrpc,nodedown}
[error_logger:info,2021-04-27T07:19:59.588-07:00,n_2@172.18.0.3:net_kernel<0.1669.0>:ale_error_logger_handler:do_log:101]
Attachments
Issue Links
- relates to MB-45263 [CX] ClusterCbRemoteLinksLifecycleIT 1: lifecycle: alternate-address-rebalance-out (Resolved)