Details
-
Bug
-
Resolution: Duplicate
-
Critical
-
Cheshire-Cat
-
Centos 7 64 bit; Couchbase Enterprise Build 7.0.0-2840
-
Triaged
-
Centos 64-bit
-
-
1
-
No
Description
Script to Repro
./testrunner -i /tmp/testexec.9394.ini GROUP=durability_majority_dgm,rerun=False,upgrade_version=7.0.0-2840 -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_swap_rebalance,data_load_stage=during,upgrade_version=7.0.0-2840,rerun=False,GROUP=durability_majority_dgm,nodes_swap=2,bucket_spec=dgm.buckets_for_rebalance_tests,get-cbcollect-info=True,replicas=2,durability=MAJORITY,log_level=error,dgm_test=True,nodes_init=4,override_spec_params=durability;replicas,infra_log_level=critical
Steps to repro (found on a weekly job run):
- Create a 4 node cluster
- Create a bucket and collections
- Initial data load of the bucket to DGM, with durability level MAJORITY
- Perform swap rebalance
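For reference, the swap rebalance step can be sketched outside testrunner as the form body sent to the Couchbase REST endpoint `POST /controller/rebalance`, which takes `knownNodes` and `ejectedNodes`. This is a minimal sketch, not what testrunner does internally: the helper name `swap_rebalance_body` is hypothetical, and the split into initial, incoming, and outgoing nodes is inferred from the KeepNodes/EjectNodes log entry in this ticket.

```python
from urllib.parse import parse_qs, urlencode


def swap_rebalance_body(current_nodes, nodes_in, nodes_out):
    """Build the form body for POST /controller/rebalance (hypothetical helper).

    current_nodes: otpNode names already in the cluster
    nodes_in:      otpNode names that were added and wait for rebalance
    nodes_out:     otpNode names to eject
    """
    # A swap rebalance replaces N nodes with N new ones.
    if len(nodes_in) != len(nodes_out):
        raise ValueError("swap rebalance needs equal numbers of nodes in and out")
    unknown = set(nodes_out) - set(current_nodes)
    if unknown:
        raise ValueError("cannot eject nodes that are not in the cluster: %s"
                         % sorted(unknown))
    # knownNodes lists every node the cluster knows about, including ejected ones.
    known = list(current_nodes) + [n for n in nodes_in if n not in current_nodes]
    return urlencode({
        "knownNodes": ",".join(known),
        "ejectedNodes": ",".join(nodes_out),
    })


# Node names taken from the log below; the initial membership is an inference.
CURRENT = ["ns_1@172.23.97.92", "ns_1@172.23.97.87",
           "ns_1@172.23.97.89", "ns_1@172.23.97.91"]
NODES_IN = ["ns_1@172.23.97.90", "ns_1@172.23.97.88"]
NODES_OUT = ["ns_1@172.23.97.91", "ns_1@172.23.97.87"]

body = swap_rebalance_body(CURRENT, NODES_IN, NODES_OUT)
print(parse_qs(body)["ejectedNodes"][0])
```

The body would be posted with admin credentials to port 8091 on any cluster node; credentials and the HTTP call itself are left out because they are not part of this ticket.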
Rebalance fails with the following stack trace:
Logs from 172.23.97.92 (master node):
2020-08-16 10:17:10,723 | test | ERROR | pool-2-thread-15 | [rest_client:_rebalance_status_and_progress:1478] {u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try again.', u'status': u'none'} - rebalance failed
|
2020-08-16 10:17:10,760 | test | ERROR | pool-2-thread-15 | [rest_client:print_UI_logs:2537] {u'code': 0, u'module': u'ns_orchestrator', u'type': u'critical', u'node': u'ns_1@172.23.97.92', u'tstamp': 1597598229033L, u'shortText': u'message', u'serverTime': u'2020-08-16T17:17:09.033Z', u'text': u'Rebalance exited with reason {mover_crashed,\n {unexpected_exit,\n {\'EXIT\',<0.19608.2>,\n {{bulk_set_vbucket_state_failed,\n [{\'ns_1@172.23.97.87\',\n {\'EXIT\',\n {{{{{child_interrupted,\n {\'EXIT\',<23387.3360.0>,\n socket_closed}},\n [{dcp_replicator,spawn_and_wait,1,\n [{file,"src/dcp_replicator.erl"},\n {line,266}]},\n {dcp_replicator,handle_call,3,\n [{file,"src/dcp_replicator.erl"},\n {line,121}]},\n {gen_server,try_handle_call,4,\n [{file,"gen_server.erl"},\n {line,636}]},\n {gen_server,handle_msg,6,\n [{file,"gen_server.erl"},\n {line,665}]},\n {proc_lib,init_p_do_apply,3,\n [{file,"proc_lib.erl"},\n {line,247}]}]},\n {gen_server,call,\n [<23387.3358.0>,\n {setup_replication,\n [768,769,770,771,772,773,774,775,\n 776,777,778,779,780,781,782,783,\n 784,785,786,805,806,807,808,809,\n 810,811,812,813,814,815,816,817,\n 818,819,820,821,822,823,824,825,\n 826,827,828,829,830,831,832,833,\n 834,835,836,837,838,839,840,841,\n 933,934,935,936,937,938,939,940,\n 941,942,943,944,945,946,947,948,\n 949,950,951,952,953,954,955,956,\n 957,958,959,960,961,962,963,964,\n 965,966,967,968,969,970,971,972,\n 973,974,975,976,977,978,979,980]},\n infinity]}},\n {gen_server,call,\n [\'replication_manager-default\',\n {change_vbucket_replication,787,\n undefined},\n infinity]}},\n {gen_server,call,\n [{\'janitor_agent-default\',\n \'ns_1@172.23.97.87\'},\n {if_rebalance,<0.17013.1>,\n {update_vbucket_state,786,replica,\n undefined,undefined}},\n infinity]}}}}]},\n [{janitor_agent,bulk_set_vbucket_state,4,\n [{file,"src/janitor_agent.erl"},\n {line,403}]},\n {ns_single_vbucket_mover,\n update_replication_post_move,5,\n [{file,"src/ns_single_vbucket_mover.erl"},\n {line,530}]},\n 
{ns_single_vbucket_mover,on_move_done_body,\n 6,\n [{file,"src/ns_single_vbucket_mover.erl"},\n {line,556}]},\n {proc_lib,init_p,3,\n [{file,"proc_lib.erl"},{line,232}]}]}}}}.\nRebalance Operation Id = 77a0d80631eae3c982fec0d585db6165'}
|
2020-08-16 10:17:10,763 | test | ERROR | pool-2-thread-15 | [rest_client:print_UI_logs:2537] {u'code': 0, u'module': u'ns_vbucket_mover', u'type': u'critical', u'node': u'ns_1@172.23.97.92', u'tstamp': 1597598228985L, u'shortText': u'message', u'serverTime': u'2020-08-16T17:17:08.985Z', u'text': u'Worker <0.18410.2> (for action {move,{786,\n [\'ns_1@172.23.97.92\',\n \'ns_1@172.23.97.87\',\n \'ns_1@172.23.97.89\'],\n [\'ns_1@172.23.97.92\',\n \'ns_1@172.23.97.88\',\n \'ns_1@172.23.97.89\'],\n []}}) exited with reason {unexpected_exit,\n {\'EXIT\',\n <0.19608.2>,\n {{bulk_set_vbucket_state_failed,\n [{\'ns_1@172.23.97.87\',\n {\'EXIT\',\n {{{{{child_interrupted,\n {\'EXIT\',\n <23387.3360.0>,\n socket_closed}},\n [{dcp_replicator,\n spawn_and_wait,\n 1,\n [{file,\n "src/dcp_replicator.erl"},\n {line,\n 266}]},\n {dcp_replicator,\n handle_call,\n 3,\n [{file,\n "src/dcp_replicator.erl"},\n {line,\n 121}]},\n {gen_server,\n try_handle_call,\n 4,\n [{file,\n "gen_server.erl"},\n {line,\n 636}]},\n {gen_server,\n handle_msg,\n 6,\n [{file,\n "gen_server.erl"},\n {line,\n 665}]},\n {proc_lib,\n init_p_do_apply,\n 3,\n [{file,\n "proc_lib.erl"},\n {line,\n 247}]}]},\n {gen_server,\n call,\n [<23387.3358.0>,\n {setup_replication,\n [768,\n 769,\n 770,\n 771,\n 772,\n 773,\n 774,\n 775,\n 776,\n 777,\n 778,\n 779,\n 780,\n 781,\n 782,\n 783,\n 784,\n 785,\n 786,\n 805,\n 806,\n 807,\n 808,\n 809,\n 810,\n 811,\n 812,\n 813,\n 814,\n 815,\n 816,\n 817,\n 818,\n 819,\n 820,\n 821,\n 822,\n 823,\n 824,\n 825,\n 826,\n 827,\n 828,\n 829,\n 830,\n 831,\n 832,\n 833,\n 834,\n 835,\n 836,\n 837,\n 838,\n 839,\n 840,\n 841,\n 933,\n 934,\n 935,\n 936,\n 937,\n 938,\n 939,\n 940,\n 941,\n 942,\n 943,\n 944,\n 945,\n 946,\n 947,\n 948,\n 949,\n 950,\n 951,\n 952,\n 953,\n 954,\n 955,\n 956,\n 957,\n 958,\n 959,\n 960,\n 961,\n 962,\n 963,\n 964,\n 965,\n 966,\n 967,\n 968,\n 969,\n 970,\n 971,\n 972,\n 973,\n 974,\n 975,\n 976,\n 977,\n 978,\n 979,\n 980]},\n infinity]}},\n 
{gen_server,\n call,\n [\'replication_manager-default\',\n {change_vbucket_replication,\n 787,\n undefined},\n infinity]}},\n {gen_server,\n call,\n [{\'janitor_agent-default\',\n \'ns_1@172.23.97.87\'},\n {if_rebalance,\n <0.17013.1>,\n {update_vbucket_state,\n 786,\n replica,\n undefined,\n undefined}},\n infinity]}}}}]},\n [{janitor_agent,\n bulk_set_vbucket_state,\n 4,\n [{file,\n "src/janitor_agent.erl"},\n {line,\n 403}]},\n {ns_single_vbucket_mover,\n update_replication_post_move,\n 5,\n [{file,\n "src/ns_single_vbucket_mover.erl"},\n {line,\n 530}]},\n {ns_single_vbucket_mover,\n on_move_done_body,\n 6,\n [{file,\n "src/ns_single_vbucket_mover.erl"},\n {line,\n 556}]},\n {proc_lib,\n init_p,3,\n [{file,\n "proc_lib.erl"},\n {line,\n 232}]}]}}}'}
|
2020-08-16 10:17:10,769 | test | ERROR | pool-2-thread-15 | [rest_client:print_UI_logs:2537] {u'code': 0, u'module': u'ns_vbucket_mover', u'type': u'info', u'node': u'ns_1@172.23.97.92', u'tstamp': 1597598206741L, u'shortText': u'message', u'serverTime': u'2020-08-16T17:16:46.741Z', u'text': u'Bucket "default" rebalance appears to be swap rebalance'}
|
2020-08-16 10:17:10,770 | test | ERROR | pool-2-thread-15 | [rest_client:print_UI_logs:2537] {u'code': 0, u'module': u'ns_memcached', u'type': u'info', u'node': u'ns_1@172.23.97.90', u'tstamp': 1597598205590L, u'shortText': u'message', u'serverTime': u'2020-08-16T10:16:45.590Z', u'text': u'Bucket "default" loaded on node \'ns_1@172.23.97.90\' in 0 seconds.'}
|
2020-08-16 10:17:10,770 | test | ERROR | pool-2-thread-15 | [rest_client:print_UI_logs:2537] {u'code': 0, u'module': u'ns_memcached', u'type': u'info', u'node': u'ns_1@172.23.97.88', u'tstamp': 1597598205581L, u'shortText': u'message', u'serverTime': u'2020-08-16T10:16:45.581Z', u'text': u'Bucket "default" loaded on node \'ns_1@172.23.97.88\' in 0 seconds.'}
|
2020-08-16 10:17:10,772 | test | ERROR | pool-2-thread-15 | [rest_client:print_UI_logs:2537] {u'code': 0, u'module': u'ns_rebalancer', u'type': u'info', u'node': u'ns_1@172.23.97.92', u'tstamp': 1597598205427L, u'shortText': u'message', u'serverTime': u'2020-08-16T17:16:45.427Z', u'text': u'Started rebalancing bucket default'}
|
2020-08-16 10:17:10,772 | test | ERROR | pool-2-thread-15 | [rest_client:print_UI_logs:2537] {u'code': 0, u'module': u'ns_orchestrator', u'type': u'info', u'node': u'ns_1@172.23.97.92', u'tstamp': 1597598205234L, u'shortText': u'message', u'serverTime': u'2020-08-16T17:16:45.234Z', u'text': u"Starting rebalance, KeepNodes = ['ns_1@172.23.97.92','ns_1@172.23.97.90',\n 'ns_1@172.23.97.89','ns_1@172.23.97.88'], EjectNodes = ['ns_1@172.23.97.91',\n 'ns_1@172.23.97.87'], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 77a0d80631eae3c982fec0d585db6165"}
|
2020-08-16 10:17:10,773 | test | ERROR | pool-2-thread-15 | [rest_client:print_UI_logs:2537] {u'code': 0, u'module': u'memcached_config_mgr', u'type': u'info', u'node': u'ns_1@172.23.97.90', u'tstamp': 1597598205185L, u'shortText': u'message', u'serverTime': u'2020-08-16T10:16:45.185Z', u'text': u'Hot-reloaded memcached.json for config change of the following keys: [<<"scramsha_fallback_salt">>]'}
|
2020-08-16 10:17:10,773 | test | ERROR | pool-2-thread-15 | [rest_client:print_UI_logs:2537] {u'code': 3, u'module': u'ns_cluster', u'type': u'info', u'node': u'ns_1@172.23.97.90', u'tstamp': 1597598205161L, u'shortText': u'message', u'serverTime': u'2020-08-16T10:16:45.161Z', u'text': u'Node ns_1@172.23.97.90 joined cluster'}
|
2020-08-16 10:17:10,775 | test | ERROR | pool-2-thread-15 | [rest_client:print_UI_logs:2537] {u'code': 1, u'module': u'menelaus_sup', u'type': u'info', u'node': u'ns_1@172.23.97.90', u'tstamp': 1597598205101L, u'shortText': u'web start ok', u'serverTime': u'2020-08-16T10:16:45.101Z', u'text': u'Couchbase Server has started on web port 8091 on node \'ns_1@172.23.97.90\'. Version: "7.0.0-2840-enterprise".'}
|
2020-08-16 10:17:10,775 | test | ERROR | pool-2-thread-15 | [task:call:236] Rebalance Failed: {u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try again.', u'status': u'none'} - rebalance failed
|
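When triaging similar runs, the test log lines above can be split into fields and the top-level Erlang exit reason pulled out of the message. The following is a rough sketch that assumes the `timestamp | logger | level | thread | [module:function:line] message` layout seen in this ticket; the names `parse_test_log_line` and `exit_reason` are hypothetical and not part of testrunner.

```python
import re

# Layout assumed from the log lines in this ticket.
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \| "
    r"(?P<logger>[^|]+?) \| (?P<level>[^|]+?) \| (?P<thread>[^|]+?) \| "
    r"\[(?P<module>[^:\]]+):(?P<function>[^:\]]+):(?P<line>\d+)\] "
    r"(?P<message>.*)$",
    re.DOTALL,  # messages may contain embedded newlines
)

# First atom after "exited with reason {", e.g. mover_crashed.
EXIT_REASON = re.compile(r"exited with reason \{(\w+)")


def parse_test_log_line(line):
    """Return the fields of one test log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])
    return fields


def exit_reason(message):
    """Return the top-level exit reason atom found in a message, or None."""
    match = EXIT_REASON.search(message)
    return match.group(1) if match else None


sample = (
    "2020-08-16 10:17:10,775 | test | ERROR | pool-2-thread-15 | "
    "[task:call:236] Rebalance Failed: {u'errorMessage': u'Rebalance failed. "
    "See logs for detailed reason. You can try again.', u'status': u'none'} "
    "- rebalance failed"
)
parsed = parse_test_log_line(sample)
print(parsed["module"], parsed["function"], parsed["line"])
print(exit_reason("Rebalance exited with reason {mover_crashed,\\n {unexpected_exit,"))
```

Matching only the first atom keeps the sketch simple; the nested Erlang term (for example `bulk_set_vbucket_state_failed` and `socket_closed`) would need a real term parser.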
Issue Links
- duplicates MB-40934 Rebalance failed with reason "mover crashed - bulk_set_vbucket_state_failed" (Closed)