Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Fixed
-
6.5.0
-
Untriaged
-
Yes
Description
Build 6.5.0-4837
Observed that while enabling node to node encryption following error is thrown-
Turned on encryption for node: http://172.23.96.100:8091
|
ERROR: _ - Reconnect to 'ns_1@172.23.96.106' retries exceeded
|
Note that out of 10 runs we saw this error 8 times, with reconnect to different nodes failing.
Command used to enable node to node encryption-
/opt/couchbase/bin/couchbase-cli node-to-node-encryption --enable --cluster 172.23.96.100:8091 -u Administrator -p password
|
Here are details of one of run where we see this error-
Job- http://perf.jenkins.couchbase.com/job/titan-reb-dev/90/
Logs-
https://cb-jira.s3.us-east-2.amazonaws.com/logs/reconnect_error/collectinfo-2019-11-20T060144-ns_1%40172.23.96.100.zip
https://cb-jira.s3.us-east-2.amazonaws.com/logs/reconnect_error/collectinfo-2019-11-20T060144-ns_1%40172.23.96.101.zip
https://cb-jira.s3.us-east-2.amazonaws.com/logs/reconnect_error/collectinfo-2019-11-20T060144-ns_1%40172.23.96.102.zip
https://cb-jira.s3.us-east-2.amazonaws.com/logs/reconnect_error/collectinfo-2019-11-20T060144-ns_1%40172.23.96.103.zip
https://cb-jira.s3.us-east-2.amazonaws.com/logs/reconnect_error/collectinfo-2019-11-20T060144-ns_1%40172.23.96.104.zip
https://cb-jira.s3.us-east-2.amazonaws.com/logs/reconnect_error/collectinfo-2019-11-20T060144-ns_1%40172.23.96.105.zip
https://cb-jira.s3.us-east-2.amazonaws.com/logs/reconnect_error/collectinfo-2019-11-20T060144-ns_1%40172.23.96.106.zip
https://cb-jira.s3.us-east-2.amazonaws.com/logs/reconnect_error/collectinfo-2019-11-20T060144-ns_1%40172.23.96.107.zip
https://cb-jira.s3.us-east-2.amazonaws.com/logs/reconnect_error/collectinfo-2019-11-20T060144-ns_1%40172.23.96.108.zip
In older builds like 6.5.0-4647, we don't used to see this often.
I'm not quite sure what's going on. I see the setupNetConfig REST request get returned with a 200:
172.23.96.100 - Administrator [19/Nov/2019:21:57:08 -0800] "POST /node/controller/enableExternalListener HTTP/1.1" 200 0 - couchbase-cli 6.5.0-4837
172.23.96.100 - Administrator [19/Nov/2019:21:57:09 -0800] "POST /node/controller/setupNetConfig HTTP/1.1" 200 0 - couchbase-cli 6.5.0-4837
Inside ns_server we see:
[ns_server:info,2019-11-19T21:57:09.081-08:00,ns_1@172.23.96.100:netconfig_updater<0.202.0>:netconfig_updater:apply_config_unprotected:158]Node is going to apply the following settings: [{nodeEncryption,true}]
[ns_server:debug,2019-11-19T21:57:09.093-08:00,ns_1@172.23.96.100:cb_dist<0.2356.0>:cb_dist:info_msg:752]cb_dist: Updated cb_dist config "/opt/couchbase/var/lib/couchbase/config/dist_cfg": [{external_listeners,
[inet_tcp_dist,
inet_tls_dist,
inet6_tcp_dist]},
{preferred_external_proto,
inet_tls_dist},
{preferred_local_proto,
inet_tcp_dist}]
.100 plans to reconnect to all the other nodes in the cluster
ns_server:info,2019-11-19T21:57:09.094-08:00,ns_1@172.23.96.100:netconfig_updater<0.202.0>:netconfig_updater:change_ext_dist_proto:241]Reconnecting to all known erl nodes since dist protocol settings changed, expected afamily is inet, expected encryption is true, nodes: ['ns_1@172.23.96.101', 'ns_1@172.23.96.102', 'ns_1@172.23.96.103', 'ns_1@172.23.96.104', 'ns_1@172.23.96.105', 'ns_1@172.23.96.106', 'ns_1@172.23.96.107', 'ns_1@172.23.96.108']
...
[ns_server:debug,2019-11-19T21:57:09.404-08:00,ns_1@172.23.96.100:netconfig_updater<0.202.0>:netconfig_updater:ensure_connection_proto:254]Reconnected to 'ns_1@172.23.96.106', checking connection type...
And the change is successfully applied:
[ns_server:info,2019-11-19T21:57:09.526-08:00,ns_1@172.23.96.100:netconfig_updater<0.202.0>:netconfig_updater:apply_config_unprotected:187]Node network settings ([{nodeEncryption,true}]) successfully applied
Mahesh Mandhare: I'm guessing you're sure these are the correct logs, right?
Assigning to Timofey Barmin for a closer look.