Couchbase Server / MB-40934

Rebalance failed with reason "mover crashed - bulk_set_vbucket_state_failed"


Details

    Description

       

      Build: 7.0.0-2792

      Scenario:

      1. Two-node cluster (172.23.123.117, 172.23.123.111)
      2. Couchbase bucket (replicas=1)
      3. Perform a swap rebalance of one node (172.23.123.111 out, 172.23.123.116 in)

      TAF test case:

      rebalance_new.swaprebalancetests.SwapRebalanceBasicTests:
          do_test,nodes_init=2,replicas=1,standard_buckets=1,num-swap=1,num_items=100000,doc_size=256,durability=MAJORITY,active_resident_threshold=70,sdk_timeout=50
      

      Rebalance failure logs:

       

      Starting rebalance, KeepNodes = ['ns_1@172.23.123.117','ns_1@172.23.123.116'], EjectNodes = ['ns_1@172.23.123.111'], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = bf11db2be1cc00a3082e79e66a3d8fab
      ..
      ..
      Rebalance exited with reason {mover_crashed,
      {unexpected_exit,
      {'EXIT',<0.9699.0>, {{bulk_set_vbucket_state_failed,
      [{'ns_1@172.23.123.111', {'EXIT',
      {{{{{child_interrupted, {'EXIT',<14550.5306.0>, socket_closed}},
      [{dcp_replicator,spawn_and_wait,1, [{file,"src/dcp_replicator.erl"}, {line,266}]},
      {dcp_replicator,handle_call,3, [{file,"src/dcp_replicator.erl"}, {line,121}]},
      {gen_server,try_handle_call,4, [{file,"gen_server.erl"}, {line,636}]},
      {gen_server,handle_msg,6, [{file,"gen_server.erl"}, {line,665}]},
      {proc_lib,init_p_do_apply,3, [{file,"proc_lib.erl"}, {line,247}]}]},
      {gen_server,call, [<14550.5304.0>,
      {setup_replication,
      [512,513,514,515,516,517,518,519, 520,521,522,523,524,525,526,527, 528,529,530,531,532,533,534,535,
      536,537,538,539,540,541,542,543, 544,545,546,547,548,549,550,551, 552,553,554,555,556,557,558,559,
      560,561,562,563,564,565,566,567, 568,569,570,571,572,573,574,575, 576,577,578,579,580,581,582,583,
      584,585,586,587,588,589,590,591, 592,593,594,595,596,597,598,599, 600,601,602,603,604,605,606,607,
      608,609,610,611,612,613,614,615, 616,617,618,619,620,621,622,623, 624,625,626,627,628,629,630,631,
      632,633,634,635,636,637,638,639, 640,641,642,643,644,645,646,647, 648,649,650,651,652,653,654,655,
      656,657,658,659,660,661,662,663, 664,665,666,667,668,669,670,671, 672,673,674,675,676,677,678,679,
      680,681,682,683,684,685,686,687, 688,689,690,691,692,693,694,695, 696,697,698,699,700,701,702,703,
      704,705,706,707,708,709,710,711, 712,713,714,715,716,717,718,719, 720,721,722,723,724,725,726,727,
      728,729,730,731,732,733,734,735, 736,737,738,739,740,741,742,743, 744,745,746,747,748,749,750,751,
      752,753,754,755,756,757,758,759, 760,761,762,763,764,765,766,767, 768,769,770,771,772,773,774,775,
      776,777,778,779,780,781,782,783, 784,785,786,787,788,789,790,791, 792,793,794,795,796,797,798,799,
      800,801,802,803,804,805,806,807, 808,809,810,811,812,813,814,815, 816,817,818,819,820,821,822,823,
      824,825,826,827,828,829,830,831, 832,833,834,835,836,837,838,839, 840,841,842,843,844,845,846,847,
      848,849,850,851,852,853,854,855, 856,857,858,859,860,861,862,863, 864,865,866,867,868,869,870,871,
      872,873,874,875,876,877,878,879, 880,881,882,883,884,885,886,887, 888,889,890,891,892,893,894,895,
      896,897,898,899,900,901,902,903, 904,905,906,907,908,909,910,911, 912,913,914,915,916,917,918,919,
      920,921,922,923,924,925,926,927, 928,929,930,931,932,933,934,935, 936,937,938,939,940,941,942,943,
      944,945,946,947,948,949,950,951, 952,953,954,955,956,957,958,959, 960,961,962,963,964,965,966,967,
      968,969,970,971,972,973,974,975, 976,977,978,979,980,981,982,983, 984,985,986,987,988,989,990,991,
      992,993,994,995,996,997,998,999, 1000,1001,1002,1003,1004,1005, 1006,1007,1008,1009,1010,1011,
      1012,1013,1014]},
      infinity]}},
      {gen_server,call,
      ['replication_manager-default',
      {change_vbucket_replication,1017, undefined}, infinity]}},
      {gen_server,call,
      [{'janitor_agent-default', 'ns_1@172.23.123.111'},
      {if_rebalance,<0.8861.0>,
      {update_vbucket_state,1014,replica, undefined,undefined}}, infinity]}}}}]},
      [{janitor_agent,bulk_set_vbucket_state,4, [{file,"src/janitor_agent.erl"}, {line,403}]},
      {ns_single_vbucket_mover,
      update_replication_post_move,5, [{file,"src/ns_single_vbucket_mover.erl"}, {line,530}]},
      {ns_single_vbucket_mover,on_move_done_body, 6, [{file,"src/ns_single_vbucket_mover.erl"}, {line,556}]},
      {proc_lib,init_p,3, [{file,"proc_lib.erl"},{line,232}]}]}}}}.
      Rebalance Operation Id = bf11db2be1cc00a3082e79e66a3d8fab
       
      Worker <0.9518.0> (for action {move,{1014,
      ['ns_1@172.23.123.117',
      'ns_1@172.23.123.111'],
      ['ns_1@172.23.123.117',
      'ns_1@172.23.123.116'],
      []}}) exited with reason {unexpected_exit,
      {'EXIT', <0.9699.0>,
      {{bulk_set_vbucket_state_failed,
      [{'ns_1@172.23.123.111', {'EXIT',
      {{{{{child_interrupted, {'EXIT',
      <14550.5306.0>, socket_closed}},
      [{dcp_replicator,
      spawn_and_wait, 1, [{file, "src/dcp_replicator.erl"}, {line, 266}]},
      {dcp_replicator, handle_call, 3, [{file, "src/dcp_replicator.erl"}, {line, 121}]},
      {gen_server, try_handle_call, 4, [{file, "gen_server.erl"}, {line, 636}]},
      {gen_server, handle_msg, 6, [{file, "gen_server.erl"}, {line, 665}]},
      {proc_lib, init_p_do_apply, 3, [{file, "proc_lib.erl"}, {line, 247}]}]},
      {gen_server, call, [<14550.5304.0>,
      {setup_replication,
      [512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523,
      524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542,
      543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561,
      562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580,
      581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599,
      600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618,
      619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637,
      638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 655, 656,
      657, 658, 659, 660, 661, 662, 663, 664, 665, 666, 667, 668, 669, 670, 671, 672, 673, 674, 675,
      676, 677, 678, 679, 680, 681, 682, 683, 684, 685, 686, 687, 688, 689, 690, 691, 692, 693, 694,
      695, 696, 697, 698, 699, 700, 701, 702, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713,
      714, 715, 716, 717, 718, 719, 720, 721, 722, 723, 724, 725, 726, 727, 728, 729, 730, 731, 732,
      733, 734, 735, 736, 737, 738, 739, 740, 741, 742, 743, 744, 745, 746, 747, 748, 749, 750, 751,
      752, 753, 754, 755, 756, 757, 758, 759, 760, 761, 762, 763, 764, 765, 766, 767, 768, 769, 770,
      771, 772, 773, 774, 775, 776, 777, 778, 779, 780, 781, 782, 783, 784, 785, 786, 787, 788, 789,
      790, 791, 792, 793, 794, 795, 796, 797, 798, 799, 800, 801, 802, 803, 804, 805, 806, 807, 808,
      809, 810, 811, 812, 813, 814, 815, 816, 817, 818, 819, 820, 821, 822, 823, 824, 825, 826, 827,
      828, 829, 830, 831, 832, 833, 834, 835, 836, 837, 838, 839, 840, 841, 842, 843, 844, 845, 846,
      847, 848, 849, 850, 851, 852, 853, 854, 855, 856, 857, 858, 859, 860, 861, 862, 863, 864, 865,
      866, 867, 868, 869, 870, 871, 872, 873, 874, 875, 876, 877, 878, 879, 880, 881, 882, 883, 884,
      885, 886, 887, 888, 889, 890, 891, 892, 893, 894, 895, 896, 897, 898, 899, 900, 901, 902, 903,
      904, 905, 906, 907, 908, 909, 910, 911, 912, 913, 914, 915, 916, 917, 918, 919, 920, 921, 922,
      923, 924, 925, 926, 927, 928, 929, 930, 931, 932, 933, 934, 935, 936, 937, 938, 939, 940, 941,
      942, 943, 944, 945, 946, 947, 948, 949, 950, 951, 952, 953, 954, 955, 956, 957, 958, 959, 960,
      961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971, 972, 973, 974, 975, 976, 977, 978, 979,
      980, 981, 982, 983, 984, 985, 986, 987, 988, 989, 990, 991, 992, 993, 994, 995, 996, 997, 998,
      999, 1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013, 1014]},
      infinity]}},
      {gen_server, call,
      ['replication_manager-default', {change_vbucket_replication, 1017, undefined}, infinity]}},
      {gen_server, call,
      [{'janitor_agent-default', 'ns_1@172.23.123.111'},
      {if_rebalance, <0.8861.0>,
      {update_vbucket_state, 1014, replica, undefined, undefined}}, infinity]}}}}]},
      [{janitor_agent, bulk_set_vbucket_state, 4, [{file, "src/janitor_agent.erl"}, {line, 403}]},
      {ns_single_vbucket_mover, update_replication_post_move, 5, [{file, "src/ns_single_vbucket_mover.erl"}, {line, 530}]},
      {ns_single_vbucket_mover, on_move_done_body, 6, [{file, "src/ns_single_vbucket_mover.erl"}, {line, 556}]},
      {proc_lib, init_p,3, [{file, "proc_lib.erl"}, {line, 232}]}]}}}
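The move action in the worker message has the shape `{move,{VBucket, OldChain, NewChain, []}}`. Reading it as a diff of the two chains shows which node gains or loses a copy. Below is a minimal Python sketch, assuming the ns_server convention that the first node in a chain holds the active copy and the remaining nodes hold replicas. `diff_chain` is a hypothetical helper (not part of ns_server or TAF), and the trailing `[]` element of the action is ignored.

```python
from typing import List, Tuple


def diff_chain(
    old: List[str], new: List[str]
) -> Tuple[Tuple[str, str], List[str], List[str]]:
    """Compare an old and a new vBucket chain.

    Assumes chain[0] is the active node and chain[1:] are replica nodes.
    Returns ((old_active, new_active), replicas_removed, replicas_added).
    """
    old_active, *old_replicas = old
    new_active, *new_replicas = new
    # Nodes that held a replica before the move but not after it.
    removed = [n for n in old_replicas if n not in new_replicas]
    # Nodes that hold a replica after the move but did not before it.
    added = [n for n in new_replicas if n not in old_replicas]
    return (old_active, new_active), removed, added


# Chains taken from the failed worker action for vBucket 1014.
active, removed, added = diff_chain(
    ["ns_1@172.23.123.117", "ns_1@172.23.123.111"],
    ["ns_1@172.23.123.117", "ns_1@172.23.123.116"],
)
print(active)   # ('ns_1@172.23.123.117', 'ns_1@172.23.123.117')
print(removed)  # ['ns_1@172.23.123.111']
print(added)    # ['ns_1@172.23.123.116']
```

For vBucket 740 in the later comment, the same function reports the active copy moving from ns_1@172.23.121.213 to ns_1@172.23.121.160, with the replica on ns_1@172.23.121.215 unchanged.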

       


          Activity

            Marking resolved, as the fix is now in Cheshire-cat.

            Comment by Richard deMellow (richard.demellow).

            Still seeing the failure in the transaction + swap rebalance case.

            Build: 7.0.0-3814.

            Test logs: http://qa.sc.couchbase.com/job/test_suite_executor-TAF/73835/consoleText

            cbcollect: http://qa.sc.couchbase.com/job/test_suite_executor-TAF/73835/artifact/job_logs/testrunner-20-Nov-23_01-54-52/test_1/

            Rebalance exited with reason {mover_crashed,
            {unexpected_exit, {'EXIT',<0.20372.0>,
            {{{{{child_interrupted, {'EXIT',<0.5238.0>,socket_closed}},
            [
            {dcp_replicator,spawn_and_wait,1, [{file,"src/dcp_replicator.erl"},{line,265}]},
            {dcp_replicator,handle_call,3, [{file,"src/dcp_replicator.erl"}, {line,121}]},
            {gen_server,try_handle_call,4, [{file,"gen_server.erl"},{line,661}]},
            {gen_server,handle_msg,6, [{file,"gen_server.erl"},{line,690}]},
            {proc_lib,init_p_do_apply,3, [{file,"proc_lib.erl"},{line,249}]}]},
            {gen_server,call, [<0.5237.0>, {setup_replication,[342,343,344,345,346,347,348,349,350,351,352,353,354,355,356,357,358,359,360,361,362,363,364,365,366,367,368,369,370,371,372,373,374,375,376,377,378,379,380,381,382,383,384,385,386,387,388,389,390,391,392,393,394,395,396,397,398,399,400,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,416,417,418,419]}, infinity]}},
            {gen_server,call,['replication_manager-couchstore-1',{change_vbucket_replication,420,undefined},infinity]}},
            {gen_server,call,[{'janitor_agent-couchstore-1','ns_1@172.23.121.213'},
            {if_rebalance,<0.15261.0>, {update_vbucket_state,740,active,paused,undefined,
            [['ns_1@172.23.121.213','ns_1@172.23.121.215']]}},infinity]}}}}}.
            Rebalance Operation Id = 7d3fc9374c82d5793e4c704198b9058b}
             
            Worker <0.20290.0> (for action {move,{740,['ns_1@172.23.121.213','ns_1@172.23.121.215'],['ns_1@172.23.121.160','ns_1@172.23.121.215'],[]}})
            exited with reason {unexpected_exit,{'EXIT',<0.20372.0>,
            {{{{{child_interrupted,{'EXIT',<0.5238.0>,socket_closed}},
            [
            {dcp_replicator,spawn_and_wait,1,[{file,"src/dcp_replicator.erl"},{line,265}]},
            {dcp_replicator,handle_call,3,[{file,"src/dcp_replicator.erl"},{line,121}]},
            {gen_server,try_handle_call,4,[{file,"gen_server.erl"},{line,661}]},
            {gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,690}]},
            {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]},
            {gen_server,call,[<0.5237.0>,{setup_replication,[342,343,344,345,346,347,348,349,350,351,352,353,354,355,356,357,358,359,360,361,362,363,364,365,366,367,368,369,370,371,372,373,374,375,376,377,378,379,380,381,382,383,384,385,386,387,388,389,390,391,392,393,394,395,396,397,398,399,400,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,416,417,418,419]},infinity]}},
            {gen_server,call,['replication_manager-couchstore-1',{change_vbucket_replication,420,undefined},infinity]}},
            {gen_server,call,[{'janitor_agent-couchstore-1','ns_1@172.23.121.213'},{if_rebalance,<0.15261.0>,{update_vbucket_state,740,active,paused,undefined,[['ns_1@172.23.121.213','ns_1@172.23.121.215']]}},infinity]}}}}'}
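The `setup_replication` lists in these traces are long runs of vBucket IDs. Below is a minimal Python sketch, using hypothetical helpers that are not part of ns_server or TAF, that pulls such a list out of a trace and collapses it into inclusive ranges. This makes it easier to check whether a list is one contiguous block or has gaps. The regular expression assumes the `{setup_replication,[...]}` form shown in these logs, with or without spaces and line breaks between IDs.

```python
import re
from typing import List, Tuple


def parse_setup_replication(text: str) -> List[int]:
    """Extract vBucket IDs from a {setup_replication,[...]} term.

    Returns an empty list if no such term is found.
    """
    m = re.search(r"setup_replication,\s*\[([\d,\s]*)\]", text)
    if m is None:
        return []
    # int() tolerates the surrounding spaces and newlines.
    return [int(tok) for tok in m.group(1).split(",") if tok.strip()]


def to_ranges(vbuckets: List[int]) -> List[Tuple[int, int]]:
    """Collapse vBucket IDs into sorted, inclusive (start, end) runs."""
    ranges: List[Tuple[int, int]] = []
    for vb in sorted(set(vbuckets)):
        if ranges and vb == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], vb)  # extend the current run
        else:
            ranges.append((vb, vb))  # start a new run
    return ranges


def format_ranges(ranges: List[Tuple[int, int]]) -> str:
    """Render runs as text, e.g. '342-345, 350'."""
    return ", ".join(f"{s}-{e}" if s != e else str(s) for s, e in ranges)


# Shortened, illustrative term (not copied verbatim from the logs).
term = "{gen_server,call,[<0.5237.0>,{setup_replication,[342,343,344,345,350]},infinity]}"
print(format_ranges(to_ranges(parse_setup_replication(term))))  # 342-345, 350
```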

            Comment by Ashwin Govindarajulu (ashwin.govindarajulu).

            Hi Ashwin Govindarajulu, build 7.0.0-3814 contains a known bug, MB-42864. Can you re-run the test with couchbase-server-7.0.0-3816?

            To confirm, this set of logs appears to be hitting MB-42864 on node 172.23.121.213:

            2020-11-23T01:58:05.448428-08:00 WARNING 67: (couchstore-1) DCP (Consumer) eq_dcpq:replication:ns_1@172.23.121.212->ns_1@172.23.121.213:couchstore-1 - (vb:421) End stream received with opaque:82 but no such stream for this vBucket
            2020-11-23T01:58:05.448506-08:00 WARNING 67 - Client [ {"ip":"127.0.0.1","port":39995} - {"ip":"127.0.0.1","port":11209} (<ud>@ns_server</ud>) ] not aware of extended error code (stream not found). Disconnecting
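The MB-42864 signature in these logs is an unmatched end-stream warning on the DCP consumer followed by the client disconnect. Below is a minimal Python sketch for spotting it in a node's memcached log. The helper name is hypothetical, the message fragments are copied from the two lines above, and it assumes the disconnect line immediately follows the end-stream line, as it does here. Real logs may interleave other messages, so treat this as a first-pass filter rather than a definitive check.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Fragments copied from the memcached warnings quoted in this comment.
END_STREAM_RE = re.compile(
    r"\(vb:(\d+)\) End stream received with opaque:(\d+) "
    r"but no such stream for this vBucket"
)
DISCONNECT_MARKER = "not aware of extended error code (stream not found). Disconnecting"


def find_mb42864_signature(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (vbucket, opaque) for each unmatched end-stream warning
    that is immediately followed by the disconnect warning."""
    hits: List[Tuple[int, int]] = []
    pending: Optional[Tuple[int, int]] = None  # match from the previous line
    for line in lines:
        if pending is not None and DISCONNECT_MARKER in line:
            hits.append(pending)
        m = END_STREAM_RE.search(line)
        pending = (int(m.group(1)), int(m.group(2))) if m else None
    return hits


sample = [
    '2020-11-23T01:58:05.448428-08:00 WARNING 67: (couchstore-1) DCP (Consumer) eq_dcpq:replication:ns_1@172.23.121.212->ns_1@172.23.121.213:couchstore-1 - (vb:421) End stream received with opaque:82 but no such stream for this vBucket',
    '2020-11-23T01:58:05.448506-08:00 WARNING 67 - Client [ {"ip":"127.0.0.1","port":39995} - {"ip":"127.0.0.1","port":11209} (<ud>@ns_server</ud>) ] not aware of extended error code (stream not found). Disconnecting',
]
print(find_mb42864_signature(sample))  # [(421, 82)]
```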
            

            Comment by Richard deMellow (richard.demellow), edited.

            Marking resolved, as it's back with QE.

            Comment by Richard deMellow (richard.demellow).

            Validated the fix on 7.0.0-3874.

            Closing this issue.

            Comment by Ashwin Govindarajulu (ashwin.govindarajulu).

            People

              Assignee: Ashwin Govindarajulu (ashwin.govindarajulu)
              Reporter: Ashwin Govindarajulu (ashwin.govindarajulu)
              Votes: 0
              Watchers: 9
