Couchbase Server / MB-7675

Auto-failover took 5 minutes to complete from the time a node failed

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.8.1
    • Fix Version/s: 2.0
    • Component/s: ns_server
    • Security Level: Public
    • Labels:
    • Environment:
      Linux 64-bit OS

      Description

      Hi,
      A node failed because of a power failure, and auto-failover took 5 minutes to happen after the failure. The relevant log from "ns_server.info.log" is attached; the file "Failover" contains only the 5 minutes of auto-failover activity. Stats from the failed node (10.22.64.9) and from one other node (10.22.64.12) are also attached to give more information about the cluster. If you need any other information, please let me know and I will be glad to provide it.

      Thanks,
      Vijay

      1. Failover.txt
        79 kB
        Neo-matrix
      2. stats.log
        679 kB
        Neo-matrix
      3. stats - Failed node.log
        651 kB
        Neo-matrix

        Activity

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        The log contains too little information.

        Keep in mind, though, that there is a known bug where failover takes longer than configured, and the more buckets there are, the longer it takes.

        With full logs from 10.22.64.12 I may be able to say more.

        Neo-matrix Vijayaraghavan Mohanasundaram (Inactive) added a comment -

        Hi, the link below contains the log files for 10.22.64.12: http://s3.amazonaws.com/customers.couchbase.com//Sambreel/cbcollect_info-10.22.64.12-2113.zip

        Log files for all the nodes are available at https://s3.amazonaws.com/customers.couchbase.com/Sambreel/cbcollect_info-2113.zip
        alkondratenko Aleksey Kondratenko (Inactive) added a comment - - edited

        Yes, this is a known problem. We thought we had fixed it in 2.0 (MB-4109), but we recently found that there are other causes of slow failover, even in the latest code.

        It is also known that having more buckets makes the problem much worse.

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        See above

        Neo-matrix Vijayaraghavan Mohanasundaram (Inactive) added a comment - - edited

        Is there any workaround to avoid this issue for now?
        Also, please confirm whether the 2.0.1 release fixes this issue.

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        There's no fix. In 1.8.x the problem gets worse as the number of buckets grows; in 2.0, at least, failover takes about 2 minutes regardless of bucket count.

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        See above

        Neo-matrix Vijayaraghavan Mohanasundaram (Inactive) added a comment -

        As there is no fix, closing.


          People

          • Assignee:
            Neo-matrix Vijayaraghavan Mohanasundaram (Inactive)
            Reporter:
            Neo-matrix Vijayaraghavan Mohanasundaram (Inactive)
          • Votes:
            0
            Watchers:
            3

            Dates

            • Created:
              Updated:
              Resolved:

              Gerrit Reviews

              There are no open Gerrit changes