Details
- Bug
- Resolution: User Error
- Critical
- 4.5.1
- Untriaged
- No
Description
Observed during longevity testing on build 4.5.1-2806.
A rebalance-out failed because a node was down, yet that node remains in the cluster as Pending. It appears that some sort of deadlock has occurred.
The first sign of instability was a net_tick_timeout:
]Node 'ns_1@172.23.108.103' saw that node 'ns_1@172.23.108.105' went down. Details: [{nodedown_reason, net_tick_timeout}]
[ns_server:error,2016-08-11T11:38:51.820-07:00,ns_1@172.23.108.103:<0.27261.58>:ns_single_vbucket_mover:spawn_and_wait:131]Got unexpected exit signal {'EXIT',<0.27688.58>,
    {{nodedown,'ns_1@172.23.108.105'},
     {gen_server,call,
      [{'janitor_agent-default','ns_1@172.23.108.105'},
       {if_rebalance,<0.10567.58>,
        {wait_index_updated,227}},
       infinity]}}}
This led to the rebalance failing with:
exited with {unexpected_exit,
    {'EXIT',<0.27873.58>,
     {wait_seqno_persisted_failed,"default",225,19562,
      [{'ns_1@172.23.108.105',
        {'EXIT',
         {{nodedown,'ns_1@172.23.108.105'},
Subsequent attempts to rebalance also fail for the same reason. The down node remains in a yellow Pending state. The node that was being rebalanced out ('.104') also remains in the cluster in a Pending state.
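To check whether later rebalance failures line up with the same node going down, the nodedown messages can be pulled out of the collected logs. The following is only a rough sketch: it assumes each "went down" message sits on a single line in the format quoted above, and the helper names (nodedown_events, count_by_down_node) are made up for illustration and are not part of ns_server or any Couchbase tool.

```python
import re
from collections import Counter

# Matches the user-visible message quoted in this ticket, e.g.
#   Node 'A' saw that node 'B' went down. Details: [{nodedown_reason, net_tick_timeout}]
# The exact message layout is an assumption based on that one excerpt.
NODEDOWN_RE = re.compile(
    r"Node '(?P<observer>[^']+)' saw that node '(?P<down>[^']+)' went down\."
    r"\s*Details: \[\{nodedown_reason,\s*(?P<reason>[^}\s]+)\}\]"
)


def nodedown_events(log_text):
    """Return (observer, down_node, reason) tuples found in log_text."""
    return [
        (m["observer"], m["down"], m["reason"])
        for m in NODEDOWN_RE.finditer(log_text)
    ]


def count_by_down_node(events):
    """Count events per (down_node, reason) pair."""
    return Counter((down, reason) for _, down, reason in events)


if __name__ == "__main__":
    sample = (
        "]Node 'ns_1@172.23.108.103' saw that node 'ns_1@172.23.108.105' "
        "went down. Details: [{nodedown_reason, net_tick_timeout}]"
    )
    for (node, reason), n in count_by_down_node(nodedown_events(sample)).items():
        print(node, reason, n)
```

If every failed attempt shows '.105' with net_tick_timeout, that would suggest a connectivity or scheduling stall on that node rather than a problem specific to one vbucket move.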
Mcd trace on .105:
https://s3.amazonaws.com/scalability-mcafee/nodedown/mcd_trace.txt
Cluster is still live:
http://172.23.108.103:8091