Couchbase Server / MB-5598

Couchbase Server started too early in boot sequence...IP address wasn't yet ready

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.8.0
    • Fix Version/s: bug-backlog
    • Component/s: installer
    • Security Level: Public
    • Labels:
    • Triage: Untriaged
    • Flagged: Release Note

      Description

      Logs attached. The reported problem was that after a power failure, one node of a 2-node Couchbase cluster returned with a reset configuration.

      In the Couchbase logs, right after the restart, we are unable to listen on the IP address we think we should be listening on:
      ERROR REPORT <0.57.0> 2012-06-18 09:12:01
      ===============================================================================

      Got error:eaddrnotavail. Cannot listen on configured address:192.168.1.8

      I see in /var/log/messages that the DHCP client got the address about 2 seconds after we tried to listen on it:
      Jun 18 09:12:03 cheetah dhclient: bound to 192.168.1.8 -- renewal in 807476524 seconds.

      And then, because we don't know who we are (the node comes up as ns_1@127.0.0.1), we get shunned by the cluster:

      INFO REPORT <6044.171.0> 2012-06-18 09:12:02
      ===============================================================================

      ns_1@127.0.0.1:<6044.171.0>:ns_node_disco:189: We've been shunned (nodes_wanted = ['ns_1@192.168.1.71',
      'ns_1@192.168.1.8']). Leaving cluster.

      INFO REPORT <6044.66.0> 2012-06-18 09:12:02
      ===============================================================================

      ns_log: logging ns_cluster:1:Node 'ns_1@127.0.0.1' is leaving cluster.

      Then we loop for a while, seemingly resetting the configuration a number of times (not sure what that's all about; it seems to spam the logs for a few minutes). We settle into a single-node cluster, then restart for no apparent reason:

      INFO REPORT <0.54.0> 2012-06-18 09:19:27
      ===============================================================================

      nonode@nohost:<0.54.0>:log_os_info:25: OS type: {unix,linux} Version: {2,6,32} Runtime info: [{otp_release,"R14B03"},

      Then we try to listen on the correct address again:

      INFO REPORT <0.57.0> 2012-06-18 09:19:27
      ===============================================================================

      nonode@nohost:<0.57.0>:dist_manager:105: Attempting to bring up net_kernel with name 'ns_1@192.168.1.8'

      But it's too late: we've already been kicked out of the cluster and reset the config.
      --------------------------------------------------------------------------------------------------------------------------------

      Adding the ns_server component since I think it could handle this case much better, probably by retrying to listen on the correct IP address a few times (more than once) before wiping the config.
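
      The retry idea above can be sketched as follows. This is only an illustration in Python, not ns_server's actual (Erlang) code; the function name wait_for_address and its attempts/delay parameters are hypothetical. It tries to bind a socket to the configured address and retries only on the "address not available" error (the eaddrnotavail seen in the log), so a caller could hold off on resetting the config until the retries are exhausted.

```python
import errno
import socket
import time


def wait_for_address(host, attempts=10, delay=1.0, sleep=time.sleep):
    """Return True once `host` can be bound locally, False if every attempt fails.

    Hypothetical helper: the name and defaults are assumptions for illustration.
    """
    for attempt in range(attempts):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            # Port 0 lets the OS pick any free port; only the address matters here.
            sock.bind((host, 0))
            return True
        except OSError as exc:
            # Retry only when the address is not (yet) assigned to an interface.
            if exc.errno != errno.EADDRNOTAVAIL:
                raise
        finally:
            sock.close()
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

      A caller might run wait_for_address("192.168.1.8", attempts=30) before starting the node, and treat a False result as the point at which falling back (or refusing to start) is considered, rather than doing so after a single failed attempt.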

      Adding the linux_installer component since it would probably be a best practice to configure Couchbase as one of the very last services that starts up to ensure the rest of the system is ready when we come up.

        Activity

        Perry Krug added a comment:

        Reopening since in this case the IP address actually was correct; the system just wasn't ready for us to listen on it yet. As per my initial comment, can the Linux installer configure Couchbase to start later in the list of services, so that the system has more time to bring up the more essential services first?

        peter added a comment:

        Perry, which distribution had this problem? Knowing that, we can address this in the init script.

        Perry Krug added a comment:

        This was CentOS, I believe; the logs have all of the relevant information about the specific system.

        Maria McDuff (Inactive) added a comment:

        Per bug triage, downgrading to Major.

        Wayne Siu added a comment:

        Triaged in Maintenance Meeting on Dec 16, 2014.
        Closing the ticket as there has been no activity on it for over a year.

          People

          • Assignee: Steve Yen
          • Reporter: Perry Krug