Couchbase Server / MB-5598

Couchbase Server started too early in boot sequence...IP address wasn't yet ready


Details

    • Bug
    • Resolution: Won't Fix
    • Major
    • bug-backlog
    • 1.8.0
    • installer
    • Security Level: Public
    • Untriaged
    • Release Note

    Description

      Logs attached. The reported problem was that after a power failure, one node of a 2-node Couchbase cluster returned with a reset configuration.

      In the Couchbase logs, right after the restart, we are unable to listen on the IP address we think we should be listening on:
      ERROR REPORT <0.57.0> 2012-06-18 09:12:01
      ===============================================================================

      Got error:eaddrnotavail. Cannot listen on configured address:192.168.1.8

      I see in /var/log/messages that the DHCP client bound the address 2 seconds after we failed to listen on it:
      Jun 18 09:12:03 cheetah dhclient: bound to 192.168.1.8 -- renewal in 807476524 seconds.

      And then, because the node doesn't know its own address, it comes up as ns_1@127.0.0.1 and leaves the cluster:

      INFO REPORT <6044.171.0> 2012-06-18 09:12:02
      ===============================================================================

      ns_1@127.0.0.1:<6044.171.0>:ns_node_disco:189: We've been shunned (nodes_wanted = ['ns_1@192.168.1.71',
      'ns_1@192.168.1.8']). Leaving cluster.

      INFO REPORT <6044.66.0> 2012-06-18 09:12:02
      ===============================================================================

      ns_log: logging ns_cluster:1:Node 'ns_1@127.0.0.1' is leaving cluster.

      Then we loop for a few minutes, apparently resetting the configuration several times (I'm not sure why; it mostly spams the logs). We settle into a single-node cluster, then restart for no apparent reason:

      INFO REPORT <0.54.0> 2012-06-18 09:19:27
      ===============================================================================

      nonode@nohost:<0.54.0>:log_os_info:25: OS type: {unix,linux} Version: {2,6,32} Runtime info: [{otp_release,"R14B03"},

      We then try to listen on the correct address again:

      INFO REPORT <0.57.0> 2012-06-18 09:19:27
      ===============================================================================

      nonode@nohost:<0.57.0>:dist_manager:105: Attempting to bring up net_kernel with name 'ns_1@192.168.1.8'

      But it's too late: we've already been kicked out of the cluster and reset the config.
      --------------------------------------------------------------------------------------------------------------------------------

      Adding the ns_server component, since I think it could handle this case much better: it should probably retry listening on the configured IP address several times before wiping the config.
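
      A minimal sketch of the retry idea, written in Python purely for illustration (ns_server itself is Erlang, and this is not its code). The names can_bind and wait_for_address, the attempt count, and the delay are all assumptions. The sketch treats EADDRNOTAVAIL as "address not assigned yet" and polls a few times before giving up:

```python
import errno
import socket
import time


def can_bind(ip):
    """Return True if a TCP socket can be bound to `ip` right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((ip, 0))  # port 0: let the OS pick any free port
            return True
        except OSError as exc:
            if exc.errno == errno.EADDRNOTAVAIL:
                return False  # address not (yet) assigned to an interface
            raise  # any other error is a different problem; surface it


def wait_for_address(ip, attempts=5, delay=1.0, try_bind=can_bind, sleep=time.sleep):
    """Poll until `ip` is bindable; return False after `attempts` failed tries.

    `try_bind` and `sleep` are injectable so the loop can be tested
    without touching the network or actually waiting.
    """
    for attempt in range(1, attempts + 1):
        if try_bind(ip):
            return True
        if attempt < attempts:
            sleep(delay)  # no need to sleep after the final attempt
    return False
```

      With the timestamps in the logs above, a handful of attempts spaced a second or two apart would have covered the gap before the DHCP lease arrived. Only when wait_for_address returns False would the caller fall back to any destructive action.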

      Adding the linux_installer component since it would probably be a best practice to configure Couchbase as one of the very last services that starts up to ensure the rest of the system is ready when we come up.
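
      One way to check the start-order suggestion is to look at the LSB header of the service's init script and confirm that it declares a dependency on the $network facility. The sketch below is an assumption-laden illustration: the header in EXAMPLE is hypothetical (it is not the script Couchbase ships), and the helper names are made up. Note that $network only orders the service after the networking service; whether that also waits for a DHCP lease depends on the distribution.

```python
import re


def required_start_facilities(init_script_text):
    """Return the facilities listed in the LSB 'Required-Start' header, if any."""
    in_block = False
    for line in init_script_text.splitlines():
        if line.startswith("### BEGIN INIT INFO"):
            in_block = True
        elif line.startswith("### END INIT INFO"):
            break
        elif in_block:
            match = re.match(r"#\s*Required-Start:\s*(.*)$", line)
            if match:
                return match.group(1).split()
    return []


def starts_after_network(init_script_text):
    """True if the script declares that it must start after $network."""
    return "$network" in required_start_facilities(init_script_text)


# Hypothetical init script header, for illustration only.
EXAMPLE = """#!/bin/sh
### BEGIN INIT INFO
# Provides:          example-server
# Required-Start:    $network $remote_fs
# Required-Stop:     $network $remote_fs
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
### END INIT INFO
"""

if __name__ == "__main__":
    print(starts_after_network(EXAMPLE))
```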

          People

            steve Steve Yen
            perry Perry Krug