Details
- Bug
- Resolution: Duplicate
- Major
- Cheshire-Cat
- 7.0.0-3016
- Triaged
- 1
- Yes
Description
Script to repro
./testrunner -i /tmp/testexec.30136.ini GROUP=rebalance_with_collection_crud_durability_MAJORITY,rerun=False,upgrade_version=7.0.0-3016 -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_swap_rebalance,data_load_stage=during,quota_percent=80,upgrade_version=7.0.0-3016,rerun=False,GROUP=rebalance_with_collection_crud_durability_MAJORITY,nodes_swap=2,bucket_spec=multi_bucket.buckets_for_rebalance_tests_more_collections,data_load_spec=volume_test_load_with_CRUD_on_collections,get-cbcollect-info=True,replicas=2,durability=MAJORITY,log_level=error,nodes_init=4,override_spec_params=durability;replicas,infra_log_level=critical
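For easier reading, the long comma-separated test-parameter string from the command above can be split into a dict. This is just a minimal inspection sketch (the parameter string is copied verbatim from the invocation; it is not part of testrunner itself):

```python
# Minimal sketch: split the key=value test parameters from the testrunner
# command above into a dict so the relevant settings stand out.
PARAMS = ("data_load_stage=during,quota_percent=80,upgrade_version=7.0.0-3016,"
          "rerun=False,GROUP=rebalance_with_collection_crud_durability_MAJORITY,"
          "nodes_swap=2,bucket_spec=multi_bucket.buckets_for_rebalance_tests_more_collections,"
          "data_load_spec=volume_test_load_with_CRUD_on_collections,"
          "get-cbcollect-info=True,replicas=2,durability=MAJORITY,log_level=error,"
          "nodes_init=4,override_spec_params=durability;replicas,infra_log_level=critical")

# maxsplit=1 keeps values like "durability;replicas" intact.
params = dict(p.split("=", 1) for p in PARAMS.split(","))
print(params["durability"], params["nodes_swap"], params["replicas"])
# MAJORITY 2 2
```

This makes the failure configuration explicit: a 2-node swap (`nodes_swap=2`) on a 4-node cluster (`nodes_init=4`) with 2 replicas and durability MAJORITY.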
It's basically a multi-node swap rebalance + collections CRUD + durability level MAJORITY, which fails as shown below.
Seen on 172.23.105.234
2020-09-07 04:52:24,286 | test | ERROR | pool-1-thread-20 | [rest_client:print_UI_logs:2537] {u'code': 0, u'module': u'ns_orchestrator', u'type': u'critical', u'node': u'ns_1@172.23.105.234', u'tstamp': 1599479543353L, u'shortText': u'message', u'serverTime': u'2020-09-07T04:52:23.353Z', u'text': u'Rebalance exited with reason {mover_crashed,\n {unexpected_exit,\n {\'EXIT\',<0.6425.8>,\n {{{{{child_interrupted,\n {\'EXIT\',<25829.4023.0>,socket_closed}},\n [{dcp_replicator,spawn_and_wait,1,\n [{file,"src/dcp_replicator.erl"},\n {line,265}]},\n {dcp_replicator,handle_call,3,\n [{file,"src/dcp_replicator.erl"},\n {line,121}]},\n {gen_server,try_handle_call,4,\n [{file,"gen_server.erl"},{line,661}]},\n {gen_server,handle_msg,6,\n [{file,"gen_server.erl"},{line,690}]},\n {proc_lib,init_p_do_apply,3,\n [{file,"proc_lib.erl"},{line,249}]}]},\n {gen_server,call,\n [<25829.4021.0>,\n {setup_replication,\n [383,384,385,386,387,388,389,390,391,\n 392,393,394,395,396,397,398,399,400,\n 401,402,403,404,405,406,407,408,409,\n 410,411,412,413,414,415,416,417,418,\n 419,420,421,422,423,424,425,426,427,\n 428,473,474,475,476,477,478,479]},\n infinity]}},\n {gen_server,call,\n [\'replication_manager-default\',\n {change_vbucket_replication,480,undefined},\n infinity]}},\n {gen_server,call,\n [{\'janitor_agent-default\',\n \'ns_1@172.23.105.34\'},\n {if_rebalance,<0.4004.6>,\n {update_vbucket_state,720,active,paused,\n undefined,\n [[\'ns_1@172.23.105.34\',\n \'ns_1@172.23.106.47\',\n \'ns_1@172.23.105.234\']]}},\n infinity]}}}}}.\nRebalance Operation Id = e62254fd737e2d08641aee48c5bb8bfb'}
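When triaging a batch of runs, it helps to pull the top-level failure reason out of such ns_orchestrator UI log messages. Below is a hypothetical helper (an assumption for illustration, not part of testrunner or ns_server) that does this with a regex over the log text:

```python
import re

# Hypothetical helper: extract the outermost failure reason from a
# "Rebalance exited with reason {...}" UI log message like the one above.
def rebalance_exit_reason(text):
    m = re.search(r"Rebalance exited with reason \{(\w+),", text)
    return m.group(1) if m else None

# Abbreviated sample taken from the log excerpt above.
sample = "Rebalance exited with reason {mover_crashed,\n {unexpected_exit,"
print(rebalance_exit_reason(sample))  # mover_crashed
```

For the log pasted above this yields `mover_crashed`; the nested terms (`child_interrupted`, `socket_closed` in dcp_replicator) carry the actual root cause.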
2020-09-07 04:52:24,289 | test | ERROR | pool-1-thread-20 | [rest_client:print_UI_logs:2537] {u'code': 0, u'module': u'ns_vbucket_mover', u'type': u'critical', u'node': u'ns_1@172.23.105.234', u'tstamp': 1599479543160L, u'shortText': u'message', u'serverTime': u'2020-09-07T04:52:23.160Z', u'text': u'Worker <0.5985.8> (for action {move,{720,\n [\'ns_1@172.23.105.34\',\n \'ns_1@172.23.106.47\',\n \'ns_1@172.23.105.234\'],\n [\'ns_1@172.23.106.48\',\n \'ns_1@172.23.97.219\',\n \'ns_1@172.23.105.234\'],\n []}}) exited with reason {unexpected_exit,\n {\'EXIT\',\n <0.6425.8>,\n {{{{{child_interrupted,\n {\'EXIT\',\n <25829.4023.0>,\n socket_closed}},\n [{dcp_replicator,\n spawn_and_wait,\n 1,\n [{file,\n "src/dcp_replicator.erl"},\n {line,\n 265}]},\n {dcp_replicator,\n handle_call,\n 3,\n [{file,\n "src/dcp_replicator.erl"},\n {line,\n 121}]},\n {gen_server,\n try_handle_call,\n 4,\n [{file,\n "gen_server.erl"},\n {line,\n 661}]},\n {gen_server,\n handle_msg,\n 6,\n [{file,\n "gen_server.erl"},\n {line,\n 690}]},\n {proc_lib,\n init_p_do_apply,\n 3,\n [{file,\n "proc_lib.erl"},\n {line,\n 249}]}]},\n {gen_server,\n call,\n [<25829.4021.0>,\n {setup_replication,\n [383,\n 384,\n 385,\n 386,\n 387,\n 388,\n 389,\n 390,\n 391,\n 392,\n 393,\n 394,\n 395,\n 396,\n 397,\n 398,\n 399,\n 400,\n 401,\n 402,\n 403,\n 404,\n 405,\n 406,\n 407,\n 408,\n 409,\n 410,\n 411,\n 412,\n 413,\n 414,\n 415,\n 416,\n 417,\n 418,\n 419,\n 420,\n 421,\n 422,\n 423,\n 424,\n 425,\n 426,\n 427,\n 428,\n 473,\n 474,\n 475,\n 476,\n 477,\n 478,\n 479]},\n infinity]}},\n {gen_server,\n call,\n [\'replication_manager-default\',\n {change_vbucket_replication,\n 480,\n undefined},\n infinity]}},\n {gen_server,\n call,\n [{\'janitor_agent-default\',\n \'ns_1@172.23.105.34\'},\n {if_rebalance,\n <0.4004.6>,\n {update_vbucket_state,\n 720,\n active,\n paused,\n undefined,\n [[\'ns_1@172.23.105.34\',\n \'ns_1@172.23.106.47\',\n \'ns_1@172.23.105.234\']]}},\n infinity]}}}}'}
I have not attached detailed steps because the weekly run was done with a lower verbose log level and contains only the failures pasted above, and subsequent repros did not yield this rebalance failure. However, since I am adding a supportal link, it should hopefully give enough information for debugging.
This test did pass on 7.0.0-2908. However, it does not appear to be consistently reproducible.
Attachments
Issue Links
- duplicates MB-40934: Rebalance failed with reason "mover crashed - bulk_set_vbucket_state_failed" (Closed)