Details
- Bug
- Resolution: Cannot Reproduce
- Critical
- None
- 7.1.1
- Enterprise Edition 7.1.1 build 3135
- Untriaged
- Centos 64-bit
- 1
- Unknown
Description
QE TEST
-test tests/integration/neo/test_neo_couchstore_milestone4.yml -scope tests/integration/neo/scope_couchstore.yml
Day - 3
Cycle - 3
Scale - 3
STEPS
1. Node 172.23.104.5 is rebalanced out of the cluster at around 2022-06-13T12:19.
2022-06-13T12:10:54.477-07:00, ns_orchestrator:0:info:message(ns_1@172.23.108.103) - Starting rebalance, KeepNodes = ['ns_1@172.23.104.137','ns_1@172.23.104.155',
'ns_1@172.23.104.157','ns_1@172.23.104.67',
'ns_1@172.23.104.69','ns_1@172.23.104.70',
'ns_1@172.23.105.107','ns_1@172.23.105.111',
'ns_1@172.23.105.168','ns_1@172.23.106.100',
'ns_1@172.23.106.188','ns_1@172.23.108.103',
'ns_1@172.23.120.107','ns_1@172.23.120.245',
'ns_1@172.23.121.117','ns_1@172.23.123.28',
'ns_1@172.23.96.148','ns_1@172.23.96.192',
'ns_1@172.23.96.251','ns_1@172.23.96.252',
'ns_1@172.23.96.253','ns_1@172.23.97.119',
'ns_1@172.23.97.121','ns_1@172.23.97.122',
'ns_1@172.23.97.239','ns_1@172.23.99.11',
'ns_1@172.23.99.25'], EjectNodes = ['ns_1@172.23.104.5'], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = e4a957f9e572e5900975c7d0b529cc3d
2022-06-13T12:19:10.929-07:00, ns_cluster:1:info:message(ns_1@172.23.104.5) - Node 'ns_1@172.23.104.5' is leaving cluster.
2022-06-13T12:19:11.395-07:00, ns_node_disco:5:warning:node down(ns_1@172.23.121.117) - Node 'ns_1@172.23.121.117' saw that node 'ns_1@172.23.104.5' went down. Details: [{nodedown_reason,
{shutdown,
transport_closed}}]
2022-06-13T12:19:11.396-07:00, ns_node_disco:5:warning:node down(ns_1@172.23.97.121) - Node 'ns_1@172.23.97.121' saw that node 'ns_1@172.23.104.5' went down. Details: [{nodedown_reason,
{shutdown,
transport_closed}}]
2022-06-13T12:19:11.396-07:00, ns_node_disco:5:warning:node down(ns_1@172.23.105.168) - Node 'ns_1@172.23.105.168' saw that node 'ns_1@172.23.104.5' went down. Details: [{nodedown_reason,
{shutdown,
transport_closed}}]
2022-06-13T12:19:11.396-07:00, ns_node_disco:5:warning:node down(ns_1@172.23.104.70) - Node 'ns_1@172.23.104.70' saw that node 'ns_1@172.23.104.5' went down. Details: [{nodedown_reason,
{shutdown,
transport_closed}}]
2022-06-13T12:19:11.397-07:00, ns_node_disco:5:warning:node down(ns_1@172.23.104.67) - Node 'ns_1@172.23.104.67' saw that node 'ns_1@172.23.104.5' went down. Details: [{nodedown_reason,
{shutdown,
transport_closed}}]
2022-06-13T12:19:11.397-07:00, ns_node_disco:5:warning:node down(ns_1@172.23.96.192) - Node 'ns_1@172.23.96.192' saw that node 'ns_1@172.23.104.5' went down. Details: [{nodedown_reason,
{shutdown,
transport_closed}}]
2022-06-13T12:19:11.397-07:00, ns_orchestrator:0:info:message(ns_1@172.23.108.103) - Rebalance completed successfully.
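The rebalance in step 1 can also be driven through ns_server's REST API (`POST /controller/rebalance`), which takes the same KeepNodes/EjectNodes split shown in the log above as comma-separated otpNode names. A minimal sketch of building that request body; the abbreviated node list and any host/credentials are placeholders, not taken from this ticket:

```python
# Sketch: form body for Couchbase's POST /controller/rebalance endpoint.
# knownNodes must list every node in the cluster; ejectedNodes lists the
# nodes to remove.
from urllib.parse import urlencode

def rebalance_payload(known_nodes, eject_nodes):
    """Build the url-encoded form body for POST /controller/rebalance."""
    return urlencode({
        "knownNodes": ",".join(known_nodes),
        "ejectedNodes": ",".join(eject_nodes),
    })

known = ["ns_1@172.23.104.137", "ns_1@172.23.104.5"]  # abbreviated list
payload = rebalance_payload(known, ["ns_1@172.23.104.5"])
# POST this to http://<orchestrator>:8091/controller/rebalance with admin
# credentials, e.g. requests.post(url, data=payload, auth=(user, password)).
```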
2. As soon as the node is rebalanced out of the cluster, it crashes continuously with a lease-expired error a few seconds after each lease is acquired.
For example:
[ns_server:info,2022-06-13T12:32:13.115-07:00,ns_1@172.23.104.5:<0.10257.3>:leader_lease_acquire_worker:handle_fresh_lease_acquired:296]Acquired lease from node 'ns_1@172.23.104.5' (lease uuid: <<"df490a842356d9b36df06b97c4ecf60e">>)
[ns_server:warn,2022-06-13T12:32:24.324-07:00,ns_1@172.23.104.5:leader_lease_agent<0.10248.3>:leader_lease_agent:handle_terminate:308]Terminating with reason shutdown when lease is expiring:
{lease,
 {lease_holder,<<"df490a842356d9b36df06b97c4ecf60e">>,'ns_1@172.23.104.5'},
 -576460554683849027,-576460539683849027,
 {timer,undefined,
  {lease_expired,
   {lease_holder,<<"df490a842356d9b36df06b97c4ecf60e">>,
    'ns_1@172.23.104.5'}}},
 expiring}
Removing the persisted lease.
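How quickly the lease expires after acquisition can be read straight off the two timestamps above; a small sketch using only the time-of-day portion of those log lines:

```python
# Compute how long the lease was held before the agent terminated,
# from the timestamps of the two log statements above.
from datetime import datetime

acquired = datetime.strptime("12:32:13.115", "%H:%M:%S.%f")
terminated = datetime.strptime("12:32:24.324", "%H:%M:%S.%f")

held = (terminated - acquired).total_seconds()
print(held)  # ~11.2 s between acquiring the lease and terminating on expiry
```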
This pair of log statements has been logged continuously since the node was rebalanced out of the cluster.
3. Node addition fails when we try to add the node back to the cluster.
2022-06-13T12:40:51.595-07:00, ns_cluster:5:info:message(ns_1@172.23.104.67) - Failed to add node 172.23.104.5:18091 to cluster. Failed to connect to https://172.23.104.5:18091. ok
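When triaging many such failures, the unreachable endpoint can be extracted from the ns_cluster message mechanically; a sketch (the regular expression is my own, not from any Couchbase tooling):

```python
import re

# The ns_cluster failure message from the log line above.
msg = ("Failed to add node 172.23.104.5:18091 to cluster. "
       "Failed to connect to https://172.23.104.5:18091. ok")

# Grab the host:port that node addition could not reach.
match = re.search(r"Failed to connect to https://([\d.]+:\d+)", msg)
endpoint = match.group(1) if match else None
print(endpoint)  # 172.23.104.5:18091
```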