Couchbase Server / MB-49849

MultiNodeFailover: Failover triggered on the node immediately after restarting the service, after the actual failover timeout period, due to the reason "failed to acquire lease"


Details

    Description

      Build: 7.1.0-1787

      Scenario:

      • 7 node cluster
      • Couchbase bucket with replicas=2
      • Set auto-failover with max_events=10 and timeout=10 seconds (a sketch of the corresponding REST call follows this list)
      • Stop the Couchbase service on all index nodes (172.23.105.245 [index+query], 172.23.100.15 [index], 172.23.100.13 [index+backup])
      • Failover was attempted but not performed, with the reason:

        Number of remaining nodes that are running index service is 0. You need at least 1 nodes.

      • Bring all 3 nodes back by starting the couchbase-service
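
      For reference, the auto-failover settings above could be applied roughly like this. This is a minimal sketch using the cluster REST API; the master address and credentials are placeholders, and the /settings/autoFailover parameter names should be verified against the documentation for this build:

      import requests

      MASTER = "http://172.23.105.155:8091"   # placeholder master node address
      AUTH = ("Administrator", "password")    # placeholder admin credentials

      # Enable auto-failover with a 10 second timeout and up to 10 events,
      # mirroring the scenario in this ticket.
      requests.post(
          f"{MASTER}/settings/autoFailover",
          auth=AUTH,
          data={"enabled": "true", "timeout": 10, "maxCount": 10},
      ).raise_for_status()

      # Read the settings back to confirm what was applied.
      print(requests.get(f"{MASTER}/settings/autoFailover", auth=AUTH).json())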

      Observation:

      The master node (.155) saw that the node (.245) was up, but failed to acquire a lease from it, which resulted in the failover procedure.

      ns_server.info.log of 172.23.105.155:

      [user:info,2021-11-30T23:18:35.675-08:00,ns_1@172.23.105.155:ns_node_disco<0.435.0>:ns_node_disco:handle_info:177]Node 'ns_1@172.23.105.155' saw that node 'ns_1@172.23.105.245' came up. Tags: []
      [ns_server:info,2021-11-30T23:18:35.677-08:00,ns_1@172.23.105.155:ns_node_disco_events<0.434.0>:ns_node_disco_log:handle_event:40]ns_node_disco_log: nodes changed: ['ns_1@172.23.100.13','ns_1@172.23.100.14',
                                         'ns_1@172.23.100.15','ns_1@172.23.105.155',
                                         'ns_1@172.23.105.211',
                                         'ns_1@172.23.105.212',
                                         'ns_1@172.23.105.213',
                                         'ns_1@172.23.105.244',
                                         'ns_1@172.23.105.245']
      [ns_server:warn,2021-11-30T23:18:35.678-08:00,ns_1@172.23.105.155:<0.14497.249>:leader_lease_acquire_worker:handle_exception:244]Failed to acquire lease from 'ns_1@172.23.105.245': {exit,
                                                           {noproc,
                                                            {gen_server,call,
                                                             [{leader_lease_agent,
                                                               'ns_1@172.23.105.245'},
                                                              {acquire_lease,
                                                               'ns_1@172.23.105.155',
                                                               <<"1486f6a8ae4cd211d1a68767e369fae7">>,
                                                               [{timeout,15000},
                                                                {period,15000}]},
                                                              infinity]}}}
       
      [user:info,2021-11-30T23:18:36.534-08:00,ns_1@172.23.105.155:<0.13944.249>:failover:orchestrate:150]Starting failing over ['ns_1@172.23.105.245']
      [user:info,2021-11-30T23:18:36.535-08:00,ns_1@172.23.105.155:<0.11668.0>:ns_orchestrator:handle_start_failover:1658]Starting failover of nodes ['ns_1@172.23.105.245']. Operation Id = e98a8193622d9ae1c8f83c644603b396
       


        Activity

          ashwin.govindarajulu Ashwin Govindarajulu created issue -

          ashwin.govindarajulu Ashwin Govindarajulu added a comment -

          Note:

          Tried (once) the same scenario with the failover timeout set to 120 seconds.

          Could not reproduce the same issue.

          ashwin.govindarajulu Ashwin Govindarajulu made changes -
          Field: Link to Log File, atop/blg, CBCollectInfo, Core dump
          New value:
          https://cb-engineering.s3.amazonaws.com/node_failed_over_after_started/collectinfo-2021-12-01T072050-ns_1%40172.23.100.13.zip
          https://cb-engineering.s3.amazonaws.com/node_failed_over_after_started/collectinfo-2021-12-01T072050-ns_1%40172.23.100.14.zip
          https://cb-engineering.s3.amazonaws.com/node_failed_over_after_started/collectinfo-2021-12-01T072050-ns_1%40172.23.100.15.zip
          https://cb-engineering.s3.amazonaws.com/node_failed_over_after_started/collectinfo-2021-12-01T072050-ns_1%40172.23.105.155.zip
          https://cb-engineering.s3.amazonaws.com/node_failed_over_after_started/collectinfo-2021-12-01T072050-ns_1%40172.23.105.211.zip
          https://cb-engineering.s3.amazonaws.com/node_failed_over_after_started/collectinfo-2021-12-01T072050-ns_1%40172.23.105.212.zip
          https://cb-engineering.s3.amazonaws.com/node_failed_over_after_started/collectinfo-2021-12-01T072050-ns_1%40172.23.105.213.zip
          https://cb-engineering.s3.amazonaws.com/node_failed_over_after_started/collectinfo-2021-12-01T072050-ns_1%40172.23.105.244.zip
          https://cb-engineering.s3.amazonaws.com/node_failed_over_after_started/collectinfo-2021-12-01T072050-ns_1%40172.23.105.245.zip
          http://supportal.couchbase.com/snapshot/6863003a4db70cc2d65373ed61c5417e::1


          artem Artem Stemkovski added a comment -

          First of all, the "Failed to acquire lease" message is benign. It happens due to a race between the time when the node becomes visible over the network and the time when it starts its leader_lease_agent, so other nodes might not be able to acquire the lease on the first attempt.
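
          To make the race concrete, here is a minimal illustrative sketch (hypothetical, not the actual ns_server / leader_lease_acquire_worker code) of a caller that tolerates the remote agent not being registered yet:

          import time

          class AgentNotRunning(Exception):
              """Stand-in for the {noproc, ...} exit seen in the log: the remote
              leader_lease_agent process is not registered yet."""

          def acquire_lease_with_retry(acquire_once, attempts=5, delay=1.0):
              # acquire_once is a hypothetical callable that performs a single
              # acquisition attempt against the remote agent.
              for _ in range(attempts):
                  try:
                      return acquire_once()
                  except AgentNotRunning:
                      # Benign race: the node is already visible on the network
                      # but its lease agent has not started yet; retry shortly.
                      time.sleep(delay)
              raise TimeoutError("lease agent did not come up in time")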

          The overall behavior of autofailover here is correct.

          After the 3 nodes have been down for the auto-failover timeout, all 3 of them reach the "failover" state:

          [ns_server:debug,2021-11-30T22:33:24.798-08:00,ns_1@172.23.105.155:<0.11670.0>:auto_failover_logic:log_master_activity:145]Incremented down state:
          {node_state,{'ns_1@172.23.100.13',<<"af23abc3889b98648fc9bd7947ea0c40">>},
                      1,nearly_down,false}
          ->{node_state,{'ns_1@172.23.100.13',<<"af23abc3889b98648fc9bd7947ea0c40">>},
                        1,failover,false}
          [ns_server:debug,2021-11-30T22:33:24.799-08:00,ns_1@172.23.105.155:<0.11670.0>:auto_failover_logic:log_master_activity:145]Incremented down state:
          {node_state,{'ns_1@172.23.100.15',<<"39d292df78ebf2e24c1f926b4cd0a277">>},
                      1,nearly_down,true}
          ->{node_state,{'ns_1@172.23.100.15',<<"39d292df78ebf2e24c1f926b4cd0a277">>},
                        1,failover,true}
          [ns_server:debug,2021-11-30T22:33:24.799-08:00,ns_1@172.23.105.155:<0.11670.0>:auto_failover_logic:log_master_activity:145]Incremented down state:
          {node_state,{'ns_1@172.23.105.245',<<"33bb8709d8c28575d3d1842639621669">>},
                      1,nearly_down,true}
          ->{node_state,{'ns_1@172.23.105.245',<<"33bb8709d8c28575d3d1842639621669">>},
                        1,failover,true}
          

          At this point the auto-failover logic determines that those nodes cannot be automatically failed over, since no index nodes would be left in the cluster as a result:

          [ns_server:debug,2021-11-30T22:33:24.800-08:00,ns_1@172.23.105.155:<0.11670.0>:auto_failover_logic:process_frame:324]Decided on following actions: [{mail_too_small,index,[],
                                             {'ns_1@172.23.100.13',
                                                 <<"af23abc3889b98648fc9bd7947ea0c40">>}},
                                         {mail_too_small,index,[],
                                             {'ns_1@172.23.100.15',
                                                 <<"39d292df78ebf2e24c1f926b4cd0a277">>}},
                                         {mail_too_small,index,[],
                                             {'ns_1@172.23.105.245',
                                                 <<"33bb8709d8c28575d3d1842639621669">>}}]
          [user:info,2021-11-30T22:33:24.801-08:00,ns_1@172.23.105.155:<0.11670.0>:auto_failover:process_action:405]Could not auto-failover node ('ns_1@172.23.100.13'). Number of remaining nodes that are running index service is 0. You need at least 1 nodes.
          [user:info,2021-11-30T22:33:24.801-08:00,ns_1@172.23.105.155:<0.11670.0>:auto_failover:process_action:405]Could not auto-failover node ('ns_1@172.23.100.15'). Number of remaining nodes that are running index service is 0. You need at least 1 nodes.
          [user:info,2021-11-30T22:33:24.801-08:00,ns_1@172.23.105.155:<0.11670.0>:auto_failover:process_action:405]Could not auto-failover node ('ns_1@172.23.105.245'). Number of remaining nodes that are running index service is 0. You need at least 1 nodes.
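
          The check behind those mail_too_small decisions can be sketched roughly as follows (a simplified, hypothetical illustration, not the actual auto-failover code; names are invented):

          # Minimum number of nodes that must keep running each service after a
          # failover; index requires at least 1, matching the message above.
          MIN_REMAINING = {"index": 1}

          def can_auto_failover(down_nodes, nodes_by_service):
              """nodes_by_service maps a service name to the set of nodes running it."""
              down = set(down_nodes)
              for service, nodes in nodes_by_service.items():
                  if nodes & down and len(nodes - down) < MIN_REMAINING.get(service, 1):
                      # Failing over these nodes would leave too few nodes for
                      # the service, so auto-failover is refused, as in the log.
                      return False, service
              return True, None

          # With all three index nodes down, zero index nodes would remain:
          # can_auto_failover({".13", ".15", ".245"}, {"index": {".13", ".15", ".245"}})
          # -> (False, "index")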
          

          Then 2 of the nodes come up almost instantaneously and the third one a little later. The auto-failover logic changes the list of candidates from 3 nodes to 1 and rolls back the counter for the remaining down node by 2 seconds.

          [ns_server:debug,2021-11-30T23:18:34.517-08:00,ns_1@172.23.105.155:<0.11670.0>:auto_failover_logic:process_frame:285]List of candidates changed from [{'ns_1@172.23.100.13',
                                               <<"af23abc3889b98648fc9bd7947ea0c40">>},
                                           {'ns_1@172.23.100.15',
                                               <<"39d292df78ebf2e24c1f926b4cd0a277">>},
                                           {'ns_1@172.23.105.245',
                                               <<"33bb8709d8c28575d3d1842639621669">>}] to [{'ns_1@172.23.105.245',
                                                                                             <<"33bb8709d8c28575d3d1842639621669">>}]. Resetting counter
          [ns_server:debug,2021-11-30T23:18:34.518-08:00,ns_1@172.23.105.155:<0.11670.0>:auto_failover_logic:log_master_activity:143]Transitioned node {'ns_1@172.23.100.13',
                                <<"af23abc3889b98648fc9bd7947ea0c40">>} state failover -> up
          [ns_server:debug,2021-11-30T23:18:34.518-08:00,ns_1@172.23.105.155:<0.11670.0>:auto_failover_logic:log_master_activity:143]Transitioned node {'ns_1@172.23.100.15',
                                <<"39d292df78ebf2e24c1f926b4cd0a277">>} state failover -> up
          [ns_server:debug,2021-11-30T23:18:34.518-08:00,ns_1@172.23.105.155:<0.11670.0>:auto_failover_logic:log_master_activity:145]Incremented down state:
          {node_state,{'ns_1@172.23.105.245',<<"33bb8709d8c28575d3d1842639621669">>},
                      1,failover,true}
          ->{node_state,{'ns_1@172.23.105.245',<<"33bb8709d8c28575d3d1842639621669">>},
                        0,nearly_down,true}
          

          So now all it takes for the new group of down nodes ['ns_1@172.23.105.245'] to be failed over is for this group to survive intact (still down, with no new down nodes) for 2 seconds, which is the grace period.

          That's exactly what happens:

          [ns_server:debug,2021-11-30T23:18:35.518-08:00,ns_1@172.23.105.155:<0.11670.0>:auto_failover_logic:log_master_activity:145]Incremented down state:
          {node_state,{'ns_1@172.23.105.245',<<"33bb8709d8c28575d3d1842639621669">>},
                      0,nearly_down,true}
          ->{node_state,{'ns_1@172.23.105.245',<<"33bb8709d8c28575d3d1842639621669">>},
                        1,nearly_down,true}
           
          [ns_server:debug,2021-11-30T23:18:36.517-08:00,ns_1@172.23.105.155:<0.11670.0>:auto_failover_logic:log_master_activity:145]Incremented down state:
          {node_state,{'ns_1@172.23.105.245',<<"33bb8709d8c28575d3d1842639621669">>},
                      1,nearly_down,true}
          ->{node_state,{'ns_1@172.23.105.245',<<"33bb8709d8c28575d3d1842639621669">>},
                        1,failover,true}
          [ns_server:debug,2021-11-30T23:18:36.518-08:00,ns_1@172.23.105.155:<0.11670.0>:auto_failover_logic:process_frame:324]Decided on following actions: [{failover,
                                          [{'ns_1@172.23.105.245',
                                            <<"33bb8709d8c28575d3d1842639621669">>}]}]
          

          artem Artem Stemkovski made changes -
          Resolution: Not a Bug [ 10200 ]
          Status: Open [ 1 ] -> Resolved [ 5 ]
          artem Artem Stemkovski made changes -
          Assignee: Artem Stemkovski [ artem ] -> Ashwin Govindarajulu [ ashwin.govindarajulu ]

          ritam.sharma Ritam Sharma added a comment - Closing all Duplicates, Not a Bug, Incomplete, Duplicate
          ritam.sharma Ritam Sharma made changes -
          Status: Resolved [ 5 ] -> Closed [ 6 ]

          People

            Assignee: Ashwin Govindarajulu
            Reporter: Ashwin Govindarajulu