Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: 6.6.2, 7.0.0
Affects Version/s: 5.5.0
Component/s: ns_server
Labels:
- approved-for-6.6.2
- releasenote

Triage:
Untriaged
Story Points:
1
Is this a Regression?:
Unknown

Description

A customer uses a custom automation script to fail over nodes in response to events. The script calls `couchbase-cli` to fail over the node, which triggers an "unsafe" failover by default (this behavior changed in 6.6 via ~~MB-39220~~).

However, even when the failover is "unsafe", the cluster manager waits 2000ms for quorum before proceeding with the failover. During failover, we also call janitor:cleanup for each bucket, which goes through leader_activities and waits for quorum again. This makes the total quorum wait time proportional to the number of buckets.
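To make the scaling concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch: it assumes one quorum wait before the failover starts plus one wait per bucket, and that every wait runs to the full 2000ms because quorum is never reached (as on a 2-node cluster with one node down). Whether the per-bucket wait uses the same timeout is an assumption, and `total_quorum_wait_ms` is a hypothetical helper, not part of ns_server.

```python
def total_quorum_wait_ms(num_buckets, timeout_ms=2000):
    """Estimate the time spent waiting for quorum during one unsafe failover.

    Assumes one wait before the failover proceeds plus one wait per bucket
    (from janitor:cleanup), each running to the full timeout because quorum
    is never reached. The shared timeout value is an assumption.
    """
    if num_buckets < 0:
        raise ValueError("num_buckets must be non-negative")
    return timeout_ms * (1 + num_buckets)


# Print the estimated wait for a few bucket counts.
for buckets in (1, 5, 10):
    print(buckets, "buckets:", total_quorum_wait_ms(buckets), "ms")
```

Under these assumptions the delay grows linearly with the bucket count, and setting the timeout to 0 removes it entirely.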

When the failover is specified as "unsafe", is the cluster manager expected to wait for quorum at all? (Especially on a 2-node cluster: if one node is down, the other node cannot get quorum.)

Aliaksey looked at the logs and said:

It's an interesting corner case that we should probably address. In the meantime, a workaround for them is to run the following via /diag/eval:

ns_config:set({timeout,{leader_activities,unsafe_preconditions_timeout}}, 0).
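For automation scripts, the expression above could be posted to /diag/eval along these lines. This is a minimal sketch: the host, credentials, and port 8091 are placeholder assumptions, `build_diag_eval_request` is a hypothetical helper, and the function only builds the request without sending it.

```python
import base64
import urllib.request

# The Erlang expression quoted in the ticket, sent verbatim as the POST body.
WORKAROUND = (
    "ns_config:set({timeout,{leader_activities,"
    "unsafe_preconditions_timeout}}, 0)."
)


def build_diag_eval_request(host, user, password, port=8091):
    """Build (but do not send) a POST to /diag/eval carrying the workaround.

    host, user, password, and port are placeholders supplied by the caller.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"http://{host}:{port}/diag/eval",
        data=WORKAROUND.encode(),
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would apply the setting. Depending on the server version and configuration, /diag/eval may only accept requests from the node itself.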

Attachments

test_unsafe_failover.log
16 kB
22/Mar/21 12:02 AM

Issue Links

relates to

MB-50209 Allow remote modification of `unsafe_preconditions_timeout`.

Resolved

People

Assignee: Sumedh Basarkod (Inactive)

Reporter: Steve Watanabe

Votes: 0

Watchers: 7

Dates

Created: 09/Jul/20 2:33 PM

Updated: 14/Jan/22 4:31 PM

Resolved: 11/Feb/21 4:46 PM

Gerrit Reviews

There are no open Gerrit changes

There are 3 closed Gerrit changes:

MB-40375 Don't wait for pre-conditions when unsafe=true: Gerrit Review:

MB-40375 Don't wait for pre-conditions when unsafe=true: Gerrit Review:

Merge remote-tracking branch 'couchbase/mad-hatter': Gerrit Review:

Hard/unsafe failover checks preconditions more than once
