Couchbase Server / MB-45929

Couchbase web server yields connection refused after several rebalance ins & outs


Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • Affects Version/s: 7.0.0
    • Fix Version/s: Cheshire-Cat
    • Component/s: ns_server
    • test[ClusterCbRemoteLinksLifecycleIT 2: lifecycle: alternate-address-rebalance-out]
    • 1
    • Yes

    Description

      After a number of rebalance ins and outs on a cluster_run cluster of n_1 & n_2, attempts to contact the web server on n_2 (:9002) get connection refused. This condition persists at least until our test framework gives up (~90s after the last n_2 rebalance out).

      2021-04-27T07:14:30.469-07:00 INFO ClusterExecutionITBase [main] Running cli: rebalance -c 172.18.0.3:9001 -u couchbase -p couchbase --server-remove 172.18.0.3:9002
      2021-04-27T07:16:05.480-07:00 INFO ClusterExecutionITBase [main+] >> Unable to display progress bar on this os
      2021-04-27T07:16:05.480-07:00 INFO ClusterExecutionITBase [main+] >> SUCCESS: Rebalance complete
      ...
      2021-04-27T07:17:35.601-07:00 ERRO TestExecutor [main] testFile src/test/resources/runtimets/queries/remote/cb/lifecycle/alternate-address-rebalance-out/test.15.cb.cmd raised an unexpected exception
      java.util.concurrent.ExecutionException: java.lang.IllegalStateException: timed out before desired response received (last result: org.apache.http.conn.HttpHostConnectException: Connect to 172.18.0.3:9002 [/172.18.0.3] failed: Connection refused (Connection refused))
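
      For reference, below is a minimal sketch (not the actual ClusterExecutionITBase code) of the kind of poll-until-timeout loop the test runs against n_2's REST port; the class name, endpoint path, and intervals are illustrative assumptions only:

      import java.io.IOException;
      import java.net.URI;
      import java.net.http.HttpClient;
      import java.net.http.HttpRequest;
      import java.net.http.HttpResponse;
      import java.time.Duration;

      // Illustrative sketch: keep polling the node's REST port until it answers
      // or ~90s elapse, roughly the behaviour described above.
      public class PollNodeUntilUp {
          public static void main(String[] args) throws InterruptedException {
              HttpClient client = HttpClient.newBuilder()
                      .connectTimeout(Duration.ofSeconds(5))
                      .build();
              HttpRequest request = HttpRequest.newBuilder()
                      .uri(URI.create("http://172.18.0.3:9002/pools")) // placeholder endpoint
                      .timeout(Duration.ofSeconds(5))
                      .GET()
                      .build();

              long deadline = System.nanoTime() + Duration.ofSeconds(90).toNanos();
              Exception last = null;
              while (System.nanoTime() < deadline) {
                  try {
                      HttpResponse<String> resp =
                              client.send(request, HttpResponse.BodyHandlers.ofString());
                      System.out.println("node answered with HTTP " + resp.statusCode());
                      return;
                  } catch (IOException e) {
                      // e.g. "Connection refused" while the n_2 web server is down
                      last = e;
                      Thread.sleep(1000);
                  }
              }
              throw new IllegalStateException(
                      "timed out before desired response received (last: " + last + ")");
          }
      }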
      

      This does not seem to be intermittent; it fails reliably on every test run, both locally on a MacBook and in the Jenkins Ubuntu environment.

      Note: the cbcollect_infos failed with a dump-guts failure, so while I've attached them they seem to be useless. I have also attached the raw logs for these nodes.

      2021-04-27T07:18:06.716-07:00 INFO ClusterExecutionITBase [main+] >> Found dump-guts: /home/couchbase/jenkins/workspace/cbas-cbcluster-test2/install/bin/dump-guts
      2021-04-27T07:18:06.721-07:00 INFO ClusterExecutionITBase [ForkJoinPool.commonPool-worker-5+] >> Raw PID 1 control groups /proc/1/cgroup (cat /proc/1/cgroup) - OK
      2021-04-27T07:18:06.721-07:00 INFO ClusterExecutionITBase [ForkJoinPool.commonPool-worker-5+] >> Found dump-guts: /home/couchbase/jenkins/workspace/cbas-cbcluster-test2/install/bin/dump-guts
      2021-04-27T07:18:07.652-07:00 INFO ClusterExecutionITBase [main+] >> Error occurred getting server guts: Got exception: {error,badarg}
      2021-04-27T07:18:07.652-07:00 INFO ClusterExecutionITBase [main+] >> [{lists,keyfind,[port_meta,1,'_deleted'],[]},
      2021-04-27T07:18:07.652-07:00 INFO ClusterExecutionITBase [main+] >>  {'dump-guts__escript__1619__533086__994357__5',extract_rest_port,2,
      2021-04-27T07:18:07.652-07:00 INFO ClusterExecutionITBase [main+] >>      [{file,
      2021-04-27T07:18:07.652-07:00 INFO ClusterExecutionITBase [main+] >>           "/home/couchbase/jenkins/workspace/cbas-cbcluster-test2/install/bin/dump-guts"},
      2021-04-27T07:18:07.652-07:00 INFO ClusterExecutionITBase [main+] >>       {line,458}]},
      2021-04-27T07:18:07.652-07:00 INFO ClusterExecutionITBase [main+] >>  {'dump-guts__escript__1619__533086__994357__5',main_with_everything,4,
      2021-04-27T07:18:07.652-07:00 INFO ClusterExecutionITBase [main+] >>      [{file,
      2021-04-27T07:18:07.652-07:00 INFO ClusterExecutionITBase [main+] >>           "/home/couchbase/jenkins/workspace/cbas-cbcluster-test2/install/bin/dump-guts"},
      2021-04-27T07:18:07.652-07:00 INFO ClusterExecutionITBase [main+] >>       {line,553}]},
      2021-04-27T07:18:07.652-07:00 INFO ClusterExecutionITBase [main+] >>  {'dump-guts__escript__1619__533086__994357__5',main,1,
      2021-04-27T07:18:07.652-07:00 INFO ClusterExecutionITBase [main+] >>      [{file,
      2021-04-27T07:18:07.652-07:00 INFO ClusterExecutionITBase [main+] >>           "/home/couchbase/jenkins/workspace/cbas-cbcluster-test2/install/bin/dump-guts"},
      2021-04-27T07:18:07.652-07:00 INFO ClusterExecutionITBase [main+] >>       {line,136}]},
      2021-04-27T07:18:07.652-07:00 INFO ClusterExecutionITBase [main+] >>  {escript,run,2,[{file,"escript.erl"},{line,758}]},
      2021-04-27T07:18:07.652-07:00 INFO ClusterExecutionITBase [main+] >>  {escript,start,1,[{file,"escript.erl"},{line,277}]},
      2021-04-27T07:18:07.653-07:00 INFO ClusterExecutionITBase [main+] >>  {init,start_em,1,[]},
      2021-04-27T07:18:07.653-07:00 INFO ClusterExecutionITBase [main+] >>  {init,do_boot,3,[]}]
      
      

      The n_2 node seems to be stuck in a state where it logs this repeatedly:

      {net_kernel,{auto_connect,'couchdb_n_2@cb.local',
                                {1132,#Ref<0.2357520616.3444178948.168890>}}}
      [ns_server:debug,2021-04-27T07:19:59.374-07:00,n_2@172.18.0.3:net_kernel<0.1669.0>:cb_dist:info_msg:778]cb_dist: Setting up new connection to 'couchdb_n_2@cb.local' using inet_tcp_dist
      [ns_server:debug,2021-04-27T07:19:59.374-07:00,n_2@172.18.0.3:cb_dist<0.1666.0>:cb_dist:info_msg:778]cb_dist: Added connection {con,#Ref<0.2357520616.3444310017.167261>,
                                     inet_tcp_dist,undefined,undefined}
      [ns_server:debug,2021-04-27T07:19:59.374-07:00,n_2@172.18.0.3:cb_dist<0.1666.0>:cb_dist:info_msg:778]cb_dist: Updated connection: {con,#Ref<0.2357520616.3444310017.167261>,
                                        inet_tcp_dist,<0.14481.4>,
                                        #Ref<0.2357520616.3444310017.167264>}
      [error_logger:info,2021-04-27T07:19:59.386-07:00,n_2@172.18.0.3:net_kernel<0.1669.0>:ale_error_logger_handler:do_log:101]
      =========================NOTICE REPORT=========================
      {net_kernel,{'EXIT',<0.14481.4>,{recv_challenge_ack_failed,{error,closed}}}}
      [ns_server:debug,2021-04-27T07:19:59.386-07:00,n_2@172.18.0.3:cb_dist<0.1666.0>:cb_dist:info_msg:778]cb_dist: Connection down: {con,#Ref<0.2357520616.3444310017.167261>,
                                     inet_tcp_dist,<0.14481.4>,
                                     #Ref<0.2357520616.3444310017.167264>}
      [error_logger:info,2021-04-27T07:19:59.386-07:00,n_2@172.18.0.3:net_kernel<0.1669.0>:ale_error_logger_handler:do_log:101]
      =========================NOTICE REPORT=========================
      {net_kernel,{net_kernel,1054,nodedown,'couchdb_n_2@cb.local'}}
      [ns_server:debug,2021-04-27T07:19:59.387-07:00,n_2@172.18.0.3:<0.14365.4>:ns_server_nodes_sup:do_wait_link_to_couchdb_node:161]ns_couchdb is not ready: {badrpc,nodedown}
      [error_logger:info,2021-04-27T07:19:59.588-07:00,n_2@172.18.0.3:net_kernel<0.1669.0>:ale_error_logger_handler:do_log:101]
      

      Attachments

        1. test-1.log
          6 kB
        2. n_2_logs.zip
          8.24 MB
        3. n_1_logs.zip
          7.94 MB
        4. cbcollect_info_n_2.zip
          14.30 MB
        5. cbcollect_info_n_1.zip
          14.30 MB

        Issue Links


          Activity

            People

              Assignee: dfinlay Dave Finlay
              Reporter: michael.blow Michael Blow
              Votes: 0
              Watchers: 3


                Gerrit Reviews

                  There are no open Gerrit changes
