Details
Bug
Resolution: Not a Bug
Critical
6.5.0
6.5.0-4744-enterprise
Untriaged
Centos 64-bit
Unknown
KV-Engine Mad-Hatter GA
Description
Script to repro
./testrunner -i centos_x64--01_01--autofailover_upr.ini -p get-cbcollect-info=True,get-logs=True,stop-on-failure=False,get-coredumps=False,force_kill_memached=True,verify_unacked_bytes=True,GROUP=ALL,num_items=100000 -t failover.MultiNodeAutoFailoverTests.MultiNodeAutoFailoverTests.test_autofailover_and_addback_of_node,timeout=5,num_node_failures=2,recovery_strategy=delta,failover_action=restart_machine,maxCount=2,replicas=2,nodes_init=5,GROUP=restart_machine
Steps to Repro
1) Create a 5-node cluster and a bucket with 2 replicas.
2) Enable multi-node autofailover for up to 2 nodes.
3) Reboot a node and wait for auto failover to kick in.
[2019-11-04 05:34:33,928] - [remote_util:3116] INFO - running command.raw on 172.23.104.171: /sbin/reboot
[2019-11-04 05:34:43,321] - [task:5011] INFO - Autofailover of node 172.23.104.171 successfully initiated in 8.11329293251 sec
4) Reboot another node and wait for auto failover to kick in.
[2019-11-04 05:36:50,944] - [remote_util:3116] INFO - running command.raw on 172.23.104.181: /sbin/reboot
[2019-11-04 05:36:59,008] - [task:5011] INFO - Autofailover of node 172.23.104.181 successfully initiated in 5.9205429554 sec
5) Add back both failed-over nodes with recovery type set to delta.
[2019-11-04 05:39:34,705] - [rest_client:1438] INFO - add_back_node ns_1@172.23.104.171 successful
[2019-11-04 05:39:34,705] - [rest_client:1412] INFO - Going to set recoveryType=delta for node :: ns_1@172.23.104.171
[2019-11-04 05:39:34,719] - [rest_client:1424] INFO - recoveryType for node ns_1@172.23.104.171 set successful
[2019-11-04 05:39:34,730] - [rest_client:1438] INFO - add_back_node ns_1@172.23.104.181 successful
[2019-11-04 05:39:34,730] - [rest_client:1412] INFO - Going to set recoveryType=delta for node :: ns_1@172.23.104.181
[2019-11-04 05:39:34,739] - [rest_client:1424] INFO - recoveryType for node ns_1@172.23.104.181 set successful
6) Rebalance the cluster.
[2019-11-04 05:39:34,739] - [rest_client:1460] INFO - rebalance params : {'password': 'password', 'ejectedNodes': '', 'user': 'Administrator', 'knownNodes': u'ns_1@172.23.104.171,ns_1@172.23.104.227,ns_1@172.23.104.181,ns_1@172.23.104.215,ns_1@172.23.104.130'}
[2019-11-04 05:39:34,750] - [rest_client:1465] INFO - rebalance operation started
[2019-11-04 05:39:34,755] - [rest_client:1613] INFO - rebalance percentage : 0.00 %
[2019-11-04 05:39:44,774] - [rest_client:1596] ERROR - {u'status': u'none', u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try again.'} - rebalance failed
[2019-11-04 05:39:44,792] - [rest_client:3137] INFO - Latest logs from UI on 172.23.104.130:
[2019-11-04 05:39:44,793] - [rest_client:3138] ERROR - {u'node': u'ns_1@172.23.104.130', u'code': 0, u'text': u'Rebalance exited with reason {mover_crashed,\n {unexpected_exit,\n {\'EXIT\',<0.10422.1>,\n {{wait_seqno_persisted_failed,"default",66,\n 871,\n [{\'ns_1@172.23.104.130\',\n {\'EXIT\',\n {noproc,\n {gen_server,call,\n [{\'janitor_agent-default\',\n \'ns_1@172.23.104.130\'},\n {if_rebalance,<0.7858.1>,\n {wait_seqno_persisted,66,871}},\n infinity]}}}}]},\n [{ns_single_vbucket_mover,\n \'-wait_seqno_persisted_many/5-fun-2-\',5,\n [{file,"src/ns_single_vbucket_mover.erl"},\n {line,488}]},\n {proc_lib,init_p,3,\n [{file,"proc_lib.erl"},{line,232}]}]}}}}.\nRebalance Operation Id = 703e7328a44e84802e38a71207b22600', u'shortText': u'message', u'serverTime': u'2019-11-04T05:39:36.535Z', u'module': u'ns_orchestrator', u'tstamp': 1572874776535, u'type': u'critical'}
[2019-11-04 05:39:44,793] - [rest_client:3138] ERROR - {u'node': u'ns_1@172.23.104.130', u'code': 0, u'text': u'Worker <0.9524.1> (for action {move,{66,\n [\'ns_1@172.23.104.215\',\n \'ns_1@172.23.104.130\',\n \'ns_1@172.23.104.181\'],\n [\'ns_1@172.23.104.181\',\n \'ns_1@172.23.104.215\',\n \'ns_1@172.23.104.130\'],\n []}}) exited with reason {unexpected_exit,\n {\'EXIT\',\n <0.10422.1>,\n {{wait_seqno_persisted_failed,\n "default",\n 66,871,\n [{\'ns_1@172.23.104.130\',\n {\'EXIT\',\n {noproc,\n {gen_server,\n call,\n [{\'janitor_agent-default\',\n \'ns_1@172.23.104.130\'},\n {if_rebalance,\n <0.7858.1>,\n {wait_seqno_persisted,\n 66,\n 871}},\n infinity]}}}}]},\n [{ns_single_vbucket_mover,\n \'-wait_seqno_persisted_many/5-fun-2-\',\n 5,\n [{file,\n "src/ns_single_vbucket_mover.erl"},\n {line,\n 488}]},\n {proc_lib,\n init_p,3,\n [{file,\n "proc_lib.erl"},\n {line,\n 232}]}]}}}', u'shortText': u'message', u'serverTime': u'2019-11-04T05:39:36.525Z', u'module': u'ns_vbucket_mover', u'tstamp': 1572874776525, u'type': u'critical'}
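The rebalance fails as shown above: the mover for vBucket 66 of bucket "default" exits with wait_seqno_persisted_failed because the janitor_agent-default process on ns_1@172.23.104.130 is not running (noproc).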
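For triage, the key fields of the failure can be pulled out of the UI log text with a short script. This is only a sketch: the function name parse_seqno_failure is hypothetical, and reading the two integers as vBucket id and sequence number is an assumption based on the {move,{66,...}} action and the wait_seqno_persisted name in the log.

```python
import re

# Matches the head of the Erlang term seen in the log:
#   {wait_seqno_persisted_failed,"default",66,871,...
FAILURE_RE = re.compile(
    r'wait_seqno_persisted_failed,\s*"(?P<bucket>[^"]+)",'
    r'\s*(?P<vbucket>\d+),\s*(?P<seqno>\d+)'
)


def parse_seqno_failure(text):
    """Return bucket, vbucket, seqno and a noproc flag, or None if absent.

    Assumption: the first integer is the vBucket id and the second is the
    sequence number the mover was waiting on.
    """
    # The testrunner log prints the message as a Python repr, so newlines
    # and quotes appear as literal backslash escapes. Undo those first.
    normalized = text.replace("\\n", "\n").replace("\\'", "'")
    match = FAILURE_RE.search(normalized)
    if match is None:
        return None
    return {
        "bucket": match.group("bucket"),
        "vbucket": int(match.group("vbucket")),
        "seqno": int(match.group("seqno")),
        # noproc means the called gen_server process did not exist.
        "noproc": "noproc" in normalized,
    }
```

Run against either of the two ERROR entries above, this should report bucket default, vBucket 66, seqno 871, with the noproc flag set.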
cbcollect_info attached.
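For anyone reproducing steps 5 and 6 without testrunner, the request sequence can be sketched as below. The helper name delta_addback_requests is hypothetical, and the endpoint paths (/controller/reAddNode, /controller/setRecoveryType, /controller/rebalance) are assumed from the Couchbase REST API rather than taken from this ticket; the sketch only builds the form bodies and does not send anything.

```python
from urllib.parse import urlencode


def delta_addback_requests(recovered_nodes, known_nodes):
    """Build (path, form_body) pairs in the order seen in the test log.

    recovered_nodes: otpNode names to add back, e.g. "ns_1@172.23.104.171".
    known_nodes: every otpNode that should be in the cluster after rebalance.
    The endpoint paths are assumptions, not confirmed by the ticket.
    """
    requests = []
    for otp_node in recovered_nodes:
        # Add the failed-over node back, then mark it for delta recovery.
        requests.append(
            ("/controller/reAddNode", urlencode({"otpNode": otp_node}))
        )
        requests.append(
            (
                "/controller/setRecoveryType",
                urlencode({"otpNode": otp_node, "recoveryType": "delta"}),
            )
        )
    # A single rebalance over all known nodes, ejecting none.
    requests.append(
        (
            "/controller/rebalance",
            urlencode(
                {"knownNodes": ",".join(known_nodes), "ejectedNodes": ""}
            ),
        )
    )
    return requests
```

Each pair would be sent as an authenticated POST to the cluster's admin port; that part is omitted so the sketch stays side-effect free.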