Description
Build: 6.5.0-4908 (not seen on 6.5.0-4890)
Test: MH longevity with durability
Cycle: 2nd
Day: 1st
Test Step:
Autofailover 1 kv node
[2019-11-29T22:02:12-08:00, sequoiatools/couchbase-cli:6.5:74e46b] setting-autofailover -c 172.23.108.103:8091 -u Administrator -p password --enable-auto-failover=1 --auto-failover-timeout=5 --max-failovers=1
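For reference, the auto-failover configuration applied by this step can be read back from the cluster manager; a minimal check against the same node and credentials (not part of the test itself):

  # Read back the auto-failover settings (enabled flag, timeout, max count)
  curl -s -u Administrator:password http://172.23.108.103:8091/settings/autoFailover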
|
[2019-11-29T22:02:38-08:00, sequoiatools/cmd:681b7c] 10
|
[2019-11-29T22:03:15-08:00, sequoiatools/cbutil:b46f53] /cbinit.py 172.23.106.100 root couchbase stop
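The cbinit.py step above stops the Couchbase service on 172.23.106.100 to simulate the node going down. Outside the sequoia harness the equivalent manual step would be roughly the following, assuming a systemd-based install:

  # On the node being taken down (172.23.106.100)
  systemctl stop couchbase-server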
|
[2019-11-29T22:03:56-08:00, sequoiatools/cmd:641481] 10
|
[2019-11-29T22:04:27-08:00, sequoiatools/couchbase-cli:6.5:adf868] rebalance -c 172.23.108.103:8091 -u Administrator -p password
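While the rebalance is in flight (or after it fails), its state can be polled from the orchestrator node, for example:

  # Poll rebalance state via couchbase-cli
  couchbase-cli rebalance-status -c 172.23.108.103:8091 -u Administrator -p password

  # Or via the REST progress endpoint
  curl -s -u Administrator:password http://172.23.108.103:8091/pools/default/rebalanceProgress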
|
Error occurred on container - sequoiatools/couchbase-cli:6.5:[rebalance -c 172.23.108.103:8091 -u Administrator -p password]
|
docker logs adf868
|
docker start adf868
|
*Unable to display progress bar on this os
|
ERROR: Rebalance failed. See logs for detailed reason. You can try again.
|
[2019-11-29T22:05:26-08:00, sequoiatools/cmd:07a801] 60
|
Rebalance failed
[user:error,2019-11-29T22:04:58.567-08:00,ns_1@172.23.108.103:<0.12064.0>:ns_orchestrator:log_rebalance_completion:1445]Rebalance exited with reason {prepare_rebalance_failed,
                                              {error,
                                               {failed_nodes,
                                                [{'ns_1@172.23.106.100',{error,timeout}}]}}}.
Rebalance Operation Id = b8a928a76ed5b8a39656c137ca54a1b9
|
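The user-visible cluster events quoted below (the auto-failover messages and the rebalance start/exit entries) can also be pulled straight from the REST API rather than from the collected logs, e.g.:

  # Recent user-visible cluster events, as shown in the UI log
  curl -s -u Administrator:password http://172.23.108.103:8091/logs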
Auto-failover declined to fail the node over because some of the vbuckets had no replicas:
2019-11-29T22:03:25.623-08:00, auto_failover:0:info:message(ns_1@172.23.108.103) - Could not automatically fail over nodes (['ns_1@172.23.106.100']). Would lose vbuckets in the following buckets: ["ORDER_LINE","ORDERS",
"NEW_ORDER","ITEM","HISTORY",
"DISTRICT","CUSTOMER",
"default"]
Then, even though node 106.100 was still down, a rebalance was initiated:
2019-11-29T22:04:28.563-08:00, ns_orchestrator:0:info:message(ns_1@172.23.108.103) - Starting rebalance, KeepNodes = ['ns_1@172.23.104.155','ns_1@172.23.104.156',
'ns_1@172.23.104.157','ns_1@172.23.104.164',
'ns_1@172.23.104.61','ns_1@172.23.104.69',
'ns_1@172.23.104.87','ns_1@172.23.104.88',
'ns_1@172.23.106.100','ns_1@172.23.106.188',
'ns_1@172.23.108.103','ns_1@172.23.96.148',
'ns_1@172.23.96.251','ns_1@172.23.96.252',
'ns_1@172.23.96.253','ns_1@172.23.96.95',
'ns_1@172.23.97.119','ns_1@172.23.97.121',
'ns_1@172.23.97.122','ns_1@172.23.97.239',
'ns_1@172.23.97.242','ns_1@172.23.98.135',
'ns_1@172.23.99.11','ns_1@172.23.99.21',
'ns_1@172.23.99.25'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = b8a928a76ed5b8a39656c137ca54a1b9
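Note that 'ns_1@172.23.106.100' is still in KeepNodes, i.e. the rebalance was started without the node having been failed over or ejected. Whether a node is actually reachable can be checked before issuing a rebalance, for example:

  # Node membership and health as seen by the orchestrator
  couchbase-cli server-list -c 172.23.108.103:8091 -u Administrator -p password

  # Or the raw per-node status map
  curl -s -u Administrator:password http://172.23.108.103:8091/pools/default/nodeStatuses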
But since node 106.100 was down, the rebalance got stuck in its preparation stage waiting for that node to respond, and eventually it timed out:
2019-11-29T22:04:58.567-08:00, ns_orchestrator:0:critical:message(ns_1@172.23.108.103) - Rebalance exited with reason {prepare_rebalance_failed,
{error,
{failed_nodes,
[{'ns_1@172.23.106.100',{error,timeout}}]}}}.
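That the prepare phase was genuinely waiting on an unresponsive node can be verified directly, e.g. with a bounded probe of the node's management port:

  # Confirm 172.23.106.100 is not answering on the management port
  curl -s --max-time 5 -u Administrator:password http://172.23.106.100:8091/pools >/dev/null || echo "172.23.106.100 not responding"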
In between these two events, auto-failover was attempted again. It was rejected for a different reason this time, because the rebalance above was already running:
2019-11-29T22:04:28.718-08:00, auto_failover:0:info:message(ns_1@172.23.108.103) - Could not automatically fail over nodes (['ns_1@172.23.106.100']). Rebalance is running.
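This is expected behaviour: auto-failover will not act while a rebalance is in progress. Had manual intervention been needed here, the stuck rebalance could have been stopped first so that failover could proceed, e.g.:

  # Stop the in-flight rebalance; failover (manual or automatic) can then go ahead
  couchbase-cli rebalance-stop -c 172.23.108.103:8091 -u Administrator -p password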
None of this constitutes a bug.