Couchbase Server / MB-8039

Failover is not quick when any node (including the one being failed over) is not responding

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.0, 2.0.1
    • Fix Version/s: 2.5.0
    • Component/s: ns_server
    • Security Level: Public
    • Labels:
    • Is this a Regression?: Yes
    • Sprint: 02/Sep/2013 - 20/Sep/2013

      Description

      See the summary above.

      This happens because janitor_agent can get stuck waiting for:

      *) a tap connection "ping" (which we do in order to discover and clean up dead connections)

      *) a stuck vbucket filter change request (which is sent to the "other" side, i.e. a non-local memcached)

      The corresponding ebucketmigrator can get stuck there too.

      So the unresponsiveness of one node can cause this critical component to be stuck on all other nodes. We cannot activate any vbuckets without first stopping replication into them, and that requires:

      *) the janitor agent not being stuck

      *) the corresponding ebucketmigrators not being stuck
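
      The failure mode above can be illustrated with a minimal sketch. It is not ns_server code and not the fix that was merged; ping_peer, stop_replication and the node names are hypothetical stand-ins. It only shows how bounding each wait on a remote node keeps one unresponsive peer from stalling the whole step:

```python
import concurrent.futures
import threading
from typing import Dict


def ping_peer(release: threading.Event, responsive: bool) -> str:
    """Hypothetical stand-in for a blocking request to a remote node."""
    if responsive:
        return "pong"
    # Simulate an unresponsive peer: block until the caller releases us.
    release.wait()
    return "late"


def stop_replication(peers: Dict[str, bool], timeout_s: float) -> Dict[str, str]:
    """Contact every peer, waiting at most timeout_s on each result."""
    release = threading.Event()
    results: Dict[str, str] = {}
    workers = max(1, len(peers))
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            name: pool.submit(ping_peer, release, responsive)
            for name, responsive in peers.items()
        }
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=timeout_s)
            except concurrent.futures.TimeoutError:
                # Give up on this peer instead of blocking the caller.
                results[name] = "timed out"
        # Let the simulated hung calls finish so the pool can shut down.
        release.set()
    return results


if __name__ == "__main__":
    print(stop_replication({"node_a": True, "node_b": False}, timeout_s=0.2))
```

      Without the timeout argument, future.result() would block for as long as the peer stays silent, which is the analogue of a stuck janitor agent.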

      I've just revisited this problem. Ideally the fix will be made with support from the ep-engine side, which could be done as part of the UPR work.

      Without ep-engine support, the fix will require significant changes in ns_server, which are harder to make right now, particularly because of 1.8.x backwards compatibility support. That would be doable but would take at least several days of work.


        Activity

        alkondratenko Aleksey Kondratenko (Inactive) added a comment (edited):

        Added Perry, who asked to find the ticket for this issue, to the watchers.

        alkondratenko Aleksey Kondratenko (Inactive) added a comment:

        I suspect people are confusing this with another ticket, namely http://www.couchbase.com/issues/browse/MB-5622.

        We're fixing this one, but MB-5622 is due to a quite deep issue in our "master-ful" orchestration approach and will take longer to fix.

        alkondratenko Aleksey Kondratenko (Inactive) added a comment:

        Merged. Can be backported to 2.2.1 if there's interest.


          People

          • Assignee: andreibaranouski Andrei Baranouski
          • Reporter: alkondratenko Aleksey Kondratenko (Inactive)