Details
- Bug
- Resolution: Won't Fix
- Major
- 1.8.0
- Security Level: Public
- Untriaged
Release Note
Description
Logs attached. The reported problem was that after a power failure, one node of a 2-node Couchbase cluster returned with a reset configuration.
In the Couchbase logs, right after the restart, the node is unable to listen on its configured IP address:
ERROR REPORT <0.57.0> 2012-06-18 09:12:01
===============================================================================
Got error:eaddrnotavail. Cannot listen on configured address:192.168.1.8
/var/log/messages shows that the DHCP client bound that address at 09:12:03, about 2 seconds after the error above was logged:
Jun 18 09:12:03 cheetah dhclient: bound to 192.168.1.8 -- renewal in 807476524 seconds.
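Because the node cannot come up under its configured address, it runs as ns_1@127.0.0.1, does not find itself in nodes_wanted, and leaves the cluster: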
INFO REPORT <6044.171.0> 2012-06-18 09:12:02
===============================================================================
ns_1@127.0.0.1:<6044.171.0>:ns_node_disco:189: We've been shunned (nodes_wanted = ['ns_1@192.168.1.71',
'ns_1@192.168.1.8']). Leaving cluster.
INFO REPORT <6044.66.0> 2012-06-18 09:12:02
===============================================================================
ns_log: logging ns_cluster:1:Node 'ns_1@127.0.0.1' is leaving cluster.
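The "shunned" decision in the log above can be read as a simple membership test: the node compares its own name against nodes_wanted and leaves if it is absent. The following is only an illustrative sketch of that reading, not ns_server's actual code; `is_shunned` is a hypothetical helper, and the names are taken from the log excerpt.

```python
def is_shunned(own_name, nodes_wanted):
    """Hypothetical check: a node that is not in nodes_wanted treats itself as shunned."""
    return own_name not in nodes_wanted


# Values copied from the log excerpt above.
nodes_wanted = ["ns_1@192.168.1.71", "ns_1@192.168.1.8"]

# Started under the fallback name, the node no longer matches its cluster entry.
fallback_result = is_shunned("ns_1@127.0.0.1", nodes_wanted)

# Started under the configured name, it would still be a member.
configured_result = is_shunned("ns_1@192.168.1.8", nodes_wanted)
```

Under this reading, the node was never removed by its peer; it dropped out only because it started under the wrong name.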
The node then loops for a few minutes, apparently resetting the configuration several times (the reason is unclear; it mostly floods the logs). It settles into a single-node cluster, and then the server restarts for no apparent reason:
INFO REPORT <0.54.0> 2012-06-18 09:19:27
===============================================================================
nonode@nohost:<0.54.0>:log_os_info:25: OS type: {unix,linux} Version: {2,6,32} Runtime info: [{otp_release,"R14B03"},
This time it tries to listen on the correct address again:
INFO REPORT <0.57.0> 2012-06-18 09:19:27
===============================================================================
nonode@nohost:<0.57.0>:dist_manager:105: Attempting to bring up net_kernel with name 'ns_1@192.168.1.8'
But it is too late: the node has already left the cluster and reset its config.
--------------------------------------------------------------------------------------------------------------------------------
Adding the ns_server component, since I think it could handle this case much better: it should probably retry listening on the configured IP address several times (not just once) before wiping the config.
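As a rough illustration of the retry idea, here is a minimal Python sketch (ns_server itself is Erlang, and this is not its implementation). It retries binding to the configured address a bounded number of times and only reports failure once every attempt has hit EADDRNOTAVAIL. The function names, attempt count, and delay are assumptions chosen for the example.

```python
import errno
import socket
import time


def _bind_once(ip):
    # Port 0 lets the OS pick any free port; we only care whether the
    # address is assigned to a local interface yet.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((ip, 0))


def wait_for_address(ip, attempts=10, delay=1.0, try_bind=_bind_once, sleep=time.sleep):
    """Return True once `ip` can be bound, False after `attempts` EADDRNOTAVAIL failures."""
    for attempt in range(1, attempts + 1):
        try:
            try_bind(ip)
            return True
        except OSError as exc:
            # Only "address not available" is worth waiting out (e.g. DHCP
            # has not finished); anything else is a real error.
            if exc.errno != errno.EADDRNOTAVAIL:
                raise
            if attempt < attempts:
                sleep(delay)
    return False
```

With the timings in this report, the address became available about 2 seconds after the first failure, so even a short retry window would have avoided the reset. A caller would keep the existing config and refuse to start if `wait_for_address` returns False, rather than falling back to 127.0.0.1.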
Adding the linux_installer component since it would probably be a best practice to configure Couchbase as one of the very last services that starts up to ensure the rest of the system is ready when we come up.
Attachments
For Gerrit Dashboard: MB-5598

| # | Subject | Branch | Project | Status | CR | V |
|---|---|---|---|---|---|---|
| 17402,2 | MB-5598 Don't start the server if configured ip address is wrong. | master | ns_server | MERGED | +2 | +1 |