Couchbase Server
MB-6595

[RN 2.0.1][longevity] Something unknown is causing severe timeouts in ns_server, particularly during view building and/or compaction, which causes rebalance to fail and other kinds of badness.

    Details

    • Flagged: Release Note

      Description

      Cluster information:

      • 11 CentOS 6.2 64-bit servers with 4-core CPUs
      • Each server has 10 GB RAM and 150 GB disk.
      • 8 GB RAM allotted to couchbase server on each node (80% of total system memory)
      • Disk format ext3 on both data and root
      • Each server has its own drive; no disk sharing with other servers.
      • Load 9 million items into both buckets
      • Initial indexing, so CPU load is somewhat heavy
      • Cluster has 2 buckets: default (3 GB) and saslbucket (3 GB)
      • Each bucket has one design doc with 2 views per doc (default d1 and saslbucket d11)
      • Create cluster with 10 nodes running couchbase server 2.0.0-1697

      10.3.121.13
      10.3.121.14
      10.3.121.15
      10.3.121.16
      10.3.121.17
      10.3.121.20
      10.3.121.22
      10.3.121.24
      10.3.121.25
      10.3.121.23

      • Data path /data
      • View path /data
      • Do a swap rebalance: add node 26 and remove node 25
      • Rebalance failed as in bug MB-6573
      • Then rebalance again. Rebalance failed again, with the error on the log page pointing to node 14
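For reference, the swap-rebalance step above (add node 10.3.121.26, eject node 10.3.121.25) can be driven through the standard REST API on port 8091; here is a minimal sketch, where the admin credentials, the base node choice, and the use of urllib are my own assumptions, not part of the original repro:

```python
# Sketch: drive the swap rebalance (add 10.3.121.26, eject 10.3.121.25)
# via Couchbase's REST API. Credentials below are placeholders.
import urllib.parse


def add_node_payload(hostname, user="Administrator", password="password"):
    # Form body for POST /controller/addNode, which joins a new node
    # to the cluster. user/password are placeholder admin credentials.
    return urllib.parse.urlencode(
        {"hostname": hostname, "user": user, "password": password})


def rebalance_payload(known_nodes, ejected_nodes):
    # Form body for POST /controller/rebalance. Couchbase expects
    # comma-separated otpNode names of the form ns_1@<ip> for both
    # knownNodes and ejectedNodes.
    return urllib.parse.urlencode({
        "knownNodes": ",".join(known_nodes),
        "ejectedNodes": ",".join(ejected_nodes),
    })


if __name__ == "__main__":
    import urllib.request
    base = "http://10.3.121.13:8091"  # any existing cluster node
    known = ["ns_1@10.3.121.%d" % n
             for n in (13, 14, 15, 16, 17, 20, 22, 23, 24, 25, 26)]
    for path, body in [
            ("/controller/addNode", add_node_payload("10.3.121.26")),
            ("/controller/rebalance",
             rebalance_payload(known, ["ns_1@10.3.121.25"]))]:
        req = urllib.request.Request(base + path, data=body.encode())
        # (HTTP basic-auth header omitted for brevity)
        urllib.request.urlopen(req)
```

Rebalance progress can then be polled from GET /pools/default/rebalanceProgress.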

      Rebalance exited with reason {bulk_set_vbucket_state_failed,
      [{'ns_1@10.3.121.13',
      {'EXIT',
      {{{timeout,
      {gen_server,call,
      [ns_config,
      {eval, #Fun<cluster_compat_mode.0.45438860>}]}},
      {gen_server,call,
      ['tap_replication_manager-default', {change_vbucket_replication,726, undefined},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-default', 'ns_1@10.3.121.13'},
      {if_rebalance,<0.16591.2>,
      {update_vbucket_state,726,replica,
      undefined,undefined}},
      infinity]}}}}]}
      ns_orchestrator002 ns_1@10.3.121.14 01:18:02 - Sat Sep 8, 2012

      <0.19004.2> exited with {bulk_set_vbucket_state_failed,
      [{'ns_1@10.3.121.13',
      {'EXIT',
      {{{timeout,
      {gen_server,call,
      [ns_config,{eval, #Fun<cluster_compat_mode.0.45438860>}]}},
      {gen_server,call,
      ['tap_replication_manager-default',
      {change_vbucket_replication,726,undefined},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-default','ns_1@10.3.121.13'},
      {if_rebalance,<0.16591.2>,
      {update_vbucket_state,726,replica,undefined,
      undefined}},
      infinity]}}}}]} ns_vbucket_mover000 ns_1@10.3.121.14 01:18:01 - Sat Sep 8, 2012
      Server error during processing: ["web request failed",


      • Going to node 14, I see many tap_replication_manager-default crashes right before the rebalance failed at 01:18:01 - Sat Sep 8, 2012

      [error_logger:error,2012-09-08T1:18:01.836,ns_1@10.3.121.14:error_logger:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: ns_single_vbucket_mover:mover/6
      pid: <0.19330.2>
      registered_name: []
      exception exit: {exited,
      {'EXIT',<0.16591.2>,
      {bulk_set_vbucket_state_failed,
      [{'ns_1@10.3.121.13',
      {'EXIT',
      {{{timeout,
      {gen_server,call,
      [ns_config,
      {eval,#Fun<cluster_compat_mode.0.45438860>}]}},
      {gen_server,call,
      ['tap_replication_manager-default',
      {change_vbucket_replication,726,undefined},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-default','ns_1@10.3.121.13'},
      {if_rebalance,<0.16591.2>,
      {update_vbucket_state,726,replica,undefined,
      undefined}},
      infinity]}}}}]}}}
      in function ns_single_vbucket_mover:mover_inner_old_style/6
      in call from misc:try_with_maybe_ignorant_after/2
      in call from ns_single_vbucket_mover:mover/6
      ancestors: [<0.16591.2>,<0.22331.1>]
      messages: [{'EXIT',<0.16591.2>,
      {bulk_set_vbucket_state_failed,
      [{'ns_1@10.3.121.13',
      {'EXIT',
      {{{timeout,
      {gen_server,call,
      [ns_config,
      {eval,#Fun<cluster_compat_mode.0.45438860>}]}},
      {gen_server,call,
      ['tap_replication_manager-default',
      {change_vbucket_replication,726,undefined},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-default','ns_1@10.3.121.13'},
      {if_rebalance,<0.16591.2>,
      {update_vbucket_state,726,replica,undefined,
      undefined}},
      infinity]}}}}]}}]
      links: [<0.16591.2>]
      dictionary: [{cleanup_list,[<0.19392.2>]}]
      trap_exit: true
      status: running
      heap_size: 4181
      stack_size: 24
      reductions: 4550
      neighbours:

      [ns_server:info,2012-09-08T1:18:01.835,ns_1@10.3.121.14:<0.19487.2>:ns_replicas_builder:build_replicas_main:94]Got exit not from child ebucketmigrator. Assuming it's our parent: {'EXIT', <0.19393.2>, shutdown}

      [ns_server:info,2012-09-08T1:18:01.880,ns_1@10.3.121.14:ns_config_rep:ns_config_rep:do_pull:341]Pulling config from: 'ns_1@10.3.121.13'

      [ns_server:info,2012-09-08T1:18:01.885,ns_1@10.3.121.14:<0.19328.2>:ns_replicas_builder_utils:kill_a_bunch_of_tap_names:59]Killed the following tap names on 'ns_1@10.3.121.22': [<<"replication_building_704_'ns_1@10.3.121.26'">>,
      <<"replication_building_704_'ns_1@10.3.121.24'">>,
      <<"replication_building_704_'ns_1@10.3.121.23'">>]
      [error_logger:error,2012-09-08T1:18:01.895,ns_1@10.3.121.14:error_logger:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: ns_single_vbucket_mover:mover/6
      pid: <0.19276.2>
      registered_name: []
      exception exit: {exited,
      {'EXIT',<0.16591.2>,
      {bulk_set_vbucket_state_failed,
      [{'ns_1@10.3.121.13',
      {'EXIT',
      {{{timeout,
      {gen_server,call,
      [ns_config,
      {eval,#Fun<cluster_compat_mode.0.45438860>}]}},
      {gen_server,call,
      ['tap_replication_manager-default',
      {change_vbucket_replication,726,undefined},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-default','ns_1@10.3.121.13'},
      {if_rebalance,<0.16591.2>,
      {update_vbucket_state,726,replica,undefined,
      undefined}},
      infinity]}}}}]}}}
      in function ns_single_vbucket_mover:mover_inner_old_style/6
      in call from misc:try_with_maybe_ignorant_after/2
      in call from ns_single_vbucket_mover:mover/6
      ancestors: [<0.16591.2>,<0.22331.1>]
      messages: [{'EXIT',<0.16591.2>,
      {bulk_set_vbucket_state_failed,
      [{'ns_1@10.3.121.13',
      {'EXIT',
      {{{timeout,
      {gen_server,call,
      [ns_config,
      {eval,#Fun<cluster_compat_mode.0.45438860>}]}},
      {gen_server,call,
      ['tap_replication_manager-default',
      {change_vbucket_replication,726,undefined},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-default','ns_1@10.3.121.13'},
      {if_rebalance,<0.16591.2>,
      {update_vbucket_state,726,replica,undefined,
      undefined}},
      infinity]}}}}]}}]
      links: [<0.16591.2>]
      dictionary: [{cleanup_list,[<0.19328.2>]}]
      trap_exit: true
      status: running
      heap_size: 4181
      stack_size: 24
      reductions: 4434
      neighbours:

      [ns_server:info,2012-09-08T1:18:01.930,ns_1@10.3.121.14:<0.19487.2>:ns_replicas_builder_utils:kill_a_bunch_of_tap_names:59]Killed the following tap names on 'ns_1@10.3.121.16': [<<"replication_building_399_'ns_1@10.3.121.26'">>,
      <<"replication_building_399_'ns_1@10.3.121.24'">>,
      <<"replication_building_399_'ns_1@10.3.121.14'">>]
      [error_logger:error,2012-09-08T1:18:01.937,ns_1@10.3.121.14:error_logger:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: ns_single_vbucket_mover:mover/6
      pid: <0.19393.2>
      registered_name: []
      exception exit: {exited,
      {'EXIT',<0.16591.2>,
      {bulk_set_vbucket_state_failed,
      [{'ns_1@10.3.121.13',
      {'EXIT',
      {{{timeout,
      {gen_server,call,
      [ns_config,
      {eval,#Fun<cluster_compat_mode.0.45438860>}]}},
      {gen_server,call,
      ['tap_replication_manager-default',
      {change_vbucket_replication,726,undefined},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-default','ns_1@10.3.121.13'},
      {if_rebalance,<0.16591.2>,
      {update_vbucket_state,726,replica,undefined,
      undefined}},
      infinity]}}}}]}}}
      in function ns_single_vbucket_mover:mover_inner_old_style/6
      in call from misc:try_with_maybe_ignorant_after/2
      in call from ns_single_vbucket_mover:mover/6
      ancestors: [<0.16591.2>,<0.22331.1>]
      messages: [{'EXIT',<0.16591.2>,
      {bulk_set_vbucket_state_failed,
      [{'ns_1@10.3.121.13',
      {'EXIT',
      {{{timeout,
      {gen_server,call,
      [ns_config,
      {eval,#Fun<cluster_compat_mode.0.45438860>}]}},
      {gen_server,call,
      ['tap_replication_manager-default',
      {change_vbucket_replication,726,undefined},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-default','ns_1@10.3.121.13'},
      {if_rebalance,<0.16591.2>,
      {update_vbucket_state,726,replica,undefined,
      undefined}},
      infinity]}}}}]}}]
      links: [<0.16591.2>]
      dictionary: [{cleanup_list,[<0.19487.2>]}]
      trap_exit: true
      status: running
      heap_size: 4181
      stack_size: 24
      reductions: 4435
      neighbours:

      [couchdb:info,2012-09-08T1:18:01.977,ns_1@10.3.121.14:<0.15832.2>:couch_log:info:39]10.3.121.22 - - POST /_view_merge/?stale=ok&limit=10 200
      [ns_server:error,2012-09-08T1:18:02.072,ns_1@10.3.121.14:<0.5850.0>:ns_memcached:verify_report_long_call:274]call topkeys took too long: 836560 us
      [rebalance:debug,2012-09-08T1:18:02.075,ns_1@10.3.121.14:<0.19493.2>:ns_single_vbucket_mover:mover_inner_old_style:195]child replicas builder for vbucket 138 is <0.19520.2>
      [ns_server:info,2012-09-08T1:18:02.077,ns_1@10.3.121.14:<0.19493.2>:ns_single_vbucket_mover:mover_inner_old_style:199]Got exit message (parent is <0.16591.2>). Exiting...
      {'EXIT',<0.16591.2>,
      {bulk_set_vbucket_state_failed,
      [{'ns_1@10.3.121.13',
      {'EXIT',
      {{{timeout,
      {gen_server,call,
      [ns_config,
      {eval,#Fun<cluster_compat_mode.0.45438860>}]}},
      {gen_server,call,
      ['tap_replication_manager-default',
      {change_vbucket_replication,726,undefined},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-default','ns_1@10.3.121.13'},
      {if_rebalance,<0.16591.2>,
      {update_vbucket_state,726,replica,undefined,
      undefined}},
      infinity]}}}}]}}
      [ns_server:debug,2012-09-08T1:18:02.115,ns_1@10.3.121.14:<0.19520.2>:ns_replicas_builder_utils:spawn_replica_builder:86]Replica building ebucketmigrator for vbucket 138 into 'ns_1@10.3.121.26' is <20326.5386.1>
      [ns_server:info,2012-09-08T1:18:02.125,ns_1@10.3.121.14:ns_port_memcached:ns_port_server:log:169]memcached<0.2005.0>: Sat Sep 8 08:18:01.920865 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.121.16 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0

      [ns_server:debug,2012-09-08T1:18:02.142,ns_1@10.3.121.14:<0.3277.0>:mc_connection:do_delete_vbucket:118]Notifying mc_couch_events of vbucket deletion: default/137
      [views:info,2012-09-08T1:18:02.146,ns_1@10.3.121.14:'capi_set_view_manager-default':capi_set_view_manager:apply_index_states:459]
      Calling couch_set_view:set_partition_states([<<"default">>,<<"_design/d1">>,
      [103,104,105,106,107,108,109,110,
      111,112,113,114,115,116,117,118,
      119,120,121,122,123,124,125,126,
      127,128,129,130,131,132,138,139,
      140,141,142,143,144,145,146,147,
      148,149,150,151,152,153,154,155,
      156,157,158,159,160,161,162,163,
      164,165,166,167,168,169,170,171,
      172,173,174,175,176,177,178,179,
      180,181,182,183,184,185,186,187,
      188,189,190,191,192,193,194,195,
      196,197,198,199,200,201,202,203,
      204,205],
      [],
      [0,1,2,3,4,5,6,7,8,9,10,11,12,13,
      14,15,16,17,18,19,20,21,22,23,
      24,25,26,27,28,29,30,31,32,33,
      34,35,36,37,38,39,40,41,42,43,
      44,45,46,47,48,49,50,51,52,53,
      54,55,56,57,58,59,60,61,62,63,
      64,65,66,67,68,69,70,71,72,73,
      74,75,76,77,78,79,80,81,82,83,
      84,85,86,87,88,89,90,91,92,93,
      94,95,96,97,98,99,100,101,102,
      133,134,135,136,137,206,207,208,
      209,210,211,212,213,214,215,216,
      217,218,219,220,221,222,223,224,
      225,226,227,228,229,230,231,232,
      233,234,235,236,237,238,239,240,
      241,242,243,244,245,246,247,248,
      249,250,251,252,253,254,255,256,
      257,258,259,260,261,262,263,264,
      265,266,267,268,269,270,271,272,
      273,274,275,276,277,278,279,280,
      281,282,283,284,285,286,287,288,
      289,290,291,292,293,294,295,296,
      297,298,299,300,301,302,303,304,
      305,306,307,308,309,310,311,312,
      313,314,315,316,317,318,319,320,
      321,322,323,324,325,326,327,328,
      329,330,331,332,333,334,335,336,
      337,338,339,340,341,342,343,344,
      345,346,347,348,349,350,351,352,
      353,354,355,356,357,358,359,360,
      361,362,363,364,365,366,367,368,
      369,370,371,372,373,374,375,376,
      377,378,379,380,381,382,383,384,
      385,386,387,388,389,390,391,392,
      393,394,395,396,397,398,399,400,
      401,402,403,404,405,406,407,408,
      409,410,411,412,413,414,415,416,
      417,418,419,420,421,422,423,424,
      425,426,427,428,429,430,431,432,
      433,434,435,436,437,438,439,440,
      441,442,443,444,445,446,447,448,
      449,450,451,452,453,454,455,456,
      457,458,459,460,461,462,463,464,
      465,466,467,468,469,470,471,472,
      473,474,475,476,477,478,479,480,
      481,482,483,484,485,486,487,488,
      489,490,491,492,493,494,495,496,
      497,498,499,500,501,502,503,504,
      505,506,507,508,509,510,511,512,
      513,514,515,516,517,518,519,520,
      521,522,523,524,525,526,527,528,
      529,530,531,532,533,534,535,536,
      537,538,539,540,541,542,543,544,
      545,546,547,548,549,550,551,552,
      553,554,555,556,557,558,559,560,
      561,562,563,564,565,566,567,568,
      569,570,571,572,573,574,575,576,
      577,578,579,580,581,582,583,584,
      585,586,587,588,589,590,591,592,
      593,594,595,596,597,598,599,600,
      601,602,603,604,605,606,607,608,
      609,610,611,612,613,614,615,616,
      617,618,619,620,621,622,623,624,
      625,626,627,628,629,630,631,632,
      633,634,635,636,637,638,639,640,
      641,642,643,644,645,646,647,648,
      649,650,651,652,653,654,655,656,
      657,658,659,660,661,662,663,664,
      665,666,667,668,669,670,671,672,
      673,674,675,676,677,678,679,680,
      681,682,683,684,685,686,687,688,
      689,690,691,692,693,694,695,696,
      697,698,699,700,701,702,703,704,
      705,706,707,708,709,710,711,712,
      713,714,715,716,717,718,719,720,
      721,722,723,724,725,726,727,728,
      729,730,731,732,733,734,735,736,
      737,738,739,740,741,742,743,744,
      745,746,747,748,749,750,751,752,
      753,754,755,756,757,758,759,760,
      761,762,763,764,765,766,767,768,
      769,770,771,772,773,774,775,776,
      777,778,779,780,781,782,783,784,
      785,786,787,788,789,790,791,792,
      793,794,795,796,797,798,799,800,
      801,802,803,804,805,806,807,808,
      809,810,811,812,813,814,815,816,
      817,818,819,820,821,822,823,824,
      825,826,827,828,829,830,831,832,
      833,834,835,836,837,838,839,840,
      841,842,843,844,845,846,847,848,
      849,850,851,852,853,854,855,856,
      857,858,859,860,861,862,863,864,
      865,866,867,868,869,870,871,872,
      873,874,875,876,877,878,879,880,
      881,882,883,884,885,886,887,888,
      889,890,891,892,893,894,895,896,
      897,898,899,900,901,902,903,904,
      905,906,907,908,909,910,911,912,
      913,914,915,916,917,918,919,920,
      921,922,923,924,925,926,927,928,
      929,930,931,932,933,934,935,936,
      937,938,939,940,941,942,943,944,
      945,946,947,948,949,950,951,952,
      953,954,955,956,957,958,959,960,
      961,962,963,964,965,966,967,968,
      969,970,971,972,973,974,975,976,
      977,978,979,980,981,982,983,984,
      985,986,987,988,989,990,991,992,
      993,994,995,996,997,998,999,
      1000,1001,1002,1003,1004,1005,
      1006,1007,1008,1009,1010,1011,
      1012,1013,1014,1015,1016,1017,
      1018,1019,1020,1021,1022,1023]])
      [ns_server:debug,2012-09-08T1:18:02.161,ns_1@10.3.121.14:<0.19520.2>:ns_replicas_builder_utils:spawn_replica_builder:86]Replica building ebucketmigrator for vbucket 138 into 'ns_1@10.3.121.16' is <18036.10781.2>
      [couchdb:info,2012-09-08T1:18:02.162,ns_1@10.3.121.14:<0.18109.0>:couch_log:info:39]Stopping updater for set view `default`, main group `_design/d1`
      [ns_server:debug,2012-09-08T1:18:02.176,ns_1@10.3.121.14:<0.19520.2>:ns_replicas_builder_utils:spawn_replica_builder:86]Replica building ebucketmigrator for vbucket 138 into 'ns_1@10.3.121.24' is <18041.13682.2>
      [couchdb:info,2012-09-08T1:18:02.179,ns_1@10.3.121.14:<0.18109.0>:couch_log:info:39]Updater, set view `default`, main group `_design/d1`, stopped with reason: {updater_error, shutdown}
      [couchdb:info,2012-09-08T1:18:02.234,ns_1@10.3.121.14:<0.18109.0>:couch_log:info:39]Set view `default`, main group `_design/d1`, partition states updated
      active partitions before: [103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199]
      active partitions after: [103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199]
      passive partitions before: []
      passive partitions after: []
      cleanup partitions before: [133,200,201,202,203,204,205]
      cleanup partitions after: [133,137,200,201,202,203,204,205]
      unindexable partitions: []
      replica partitions before: [0,1,2,3,4,5,6,7,8,9,10,11,24,25,26,27,28,29,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,58,59,60,61,62,63,80,81,82,83,84,85,92,93,94,95,96,97,98,99,100,101,102,206,207,208,209,210,211,218,219,220,221,222,223,224,225,226,227,228,229,281,282,283,284,285,286,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,356,357,358,359,360,361,374,375,376,377,378,395,396,397,398,399,423,424,425,426,427,428,429,430,431,432,433,514,515,516,517,518,519,525,526,527,528,529,530,531,532,533,534,535,627,628,629,630,631,632,633,634,635,636,637,729,730,731,732,733,734,735,736,737,738,739,832,833,834,835,836,837,838,839,840,841,842,854,855,856,857,858,933,934,935,936,937,938,939,940,941,942,943,944,1000,1001,1002,1003,1004,1005,1012,1013,1014,1015,1016,1017]
      replica partitions after: [0,1,2,3,4,5,6,7,8,9,10,11,24,25,26,27,28,29,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,58,59,60,61,62,63,80,81,82,83,84,85,92,93,94,95,96,97,98,99,100,101,102,206,207,208,209,210,211,218,219,220,221,222,223,224,225,226,227,228,229,281,282,283,284,285,286,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,356,357,358,359,360,361,374,375,376,377,378,395,396,397,398,399,423,424,425,426,427,428,429,430,431,432,433,514,515,516,517,518,519,525,526,527,528,529,530,531,532,533,534,535,627,628,629,630,631,632,633,634,635,636,637,729,730,731,732,733,734,735,736,737,738,739,832,833,834,835,836,837,838,839,840,841,842,854,855,856,857,858,933,934,935,936,937,938,939,940,941,942,943,944,1000,1001,1002,1003,1004,1005,1012,1013,1014,1015,1016,1017]
      replicas on transfer before: []
      replicas on transfer after: []
      pending transition before:
      active: [200,201,202,203,204,205]
      passive: []
      pending transition after:
      active: [200,201,202,203,204,205]
      passive: []

      [ns_server:info,2012-09-08T1:18:02.238,ns_1@10.3.121.14:<0.19520.2>:ns_replicas_builder:build_replicas_main:94]Got exit not from child ebucketmigrator. Assuming it's our parent: {'EXIT', <0.19493.2>, shutdown}
      [couchdb:info,2012-09-08T1:18:02.241,ns_1@10.3.121.14:<0.18109.0>:couch_log:info:39]Starting updater for set view `default`, main group `_design/d1`
      [couchdb:info,2012-09-08T1:18:02.242,ns_1@10.3.121.14:<0.19533.2>:couch_log:info:39]Updater for set view `default`, main group `_design/d1` started
      Active partitions: [103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199]
      Passive partitions: []
      Cleanup partitions: [133,137,200,201,202,203,204,205]
      Replicas to transfer: []
      Pending transition:
      active: [200,201,202,203,204,205]
      passive: []
      Initial build: false

      [views:info,2012-09-08T1:18:02.242,ns_1@10.3.121.14:'capi_set_view_manager-default':capi_set_view_manager:apply_index_states:460]
      couch_set_view:set_partition_states([<<"default">>,<<"_design/d1">>,
      [103,104,105,106,107,108,109,110,111,112,
      113,114,115,116,117,118,119,120,121,122,
      123,124,125,126,127,128,129,130,131,132,
      138,139,140,141,142,143,144,145,146,147,
      148,149,150,151,152,153,154,155,156,157,
      158,159,160,161,162,163,164,165,166,167,
      168,169,170,171,172,173,174,175,176,177,
      178,179,180,181,182,183,184,185,186,187,
      188,189,190,191,192,193,194,195,196,197,
      198,199,200,201,202,203,204,205],
      [],
      [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
      16,17,18,19,20,21,22,23,24,25,26,27,28,
      29,30,31,32,33,34,35,36,37,38,39,40,41,
      42,43,44,45,46,47,48,49,50,51,52,53,54,
      55,56,57,58,59,60,61,62,63,64,65,66,67,
      68,69,70,71,72,73,74,75,76,77,78,79,80,
      81,82,83,84,85,86,87,88,89,90,91,92,93,
      94,95,96,97,98,99,100,101,102,133,134,
      135,136,137,206,207,208,209,210,211,212,
      213,214,215,216,217,218,219,220,221,222,
      223,224,225,226,227,228,229,230,231,232,
      233,234,235,236,237,238,239,240,241,242,
      243,244,245,246,247,248,249,250,251,252,
      253,254,255,256,257,258,259,260,261,262,
      263,264,265,266,267,268,269,270,271,272,
      273,274,275,276,277,278,279,280,281,282,
      283,284,285,286,287,288,289,290,291,292,
      293,294,295,296,297,298,299,300,301,302,
      303,304,305,306,307,308,309,310,311,312,
      313,314,315,316,317,318,319,320,321,322,
      323,324,325,326,327,328,329,330,331,332,
      333,334,335,336,337,338,339,340,341,342,
      343,344,345,346,347,348,349,350,351,352,
      353,354,355,356,357,358,359,360,361,362,
      363,364,365,366,367,368,369,370,371,372,
      373,374,375,376,377,378,379,380,381,382,
      383,384,385,386,387,388,389,390,391,392,
      393,394,395,396,397,398,399,400,401,402,
      403,404,405,406,407,408,409,410,411,412,
      413,414,415,416,417,418,419,420,421,422,
      423,424,425,426,427,428,429,430,431,432,
      433,434,435,436,437,438,439,440,441,442,
      443,444,445,446,447,448,449,450,451,452,
      453,454,455,456,457,458,459,460,461,462,
      463,464,465,466,467,468,469,470,471,472,
      473,474,475,476,477,478,479,480,481,482,
      483,484,485,486,487,488,489,490,491,492,
      493,494,495,496,497,498,499,500,501,502,
      503,504,505,506,507,508,509,510,511,512,
      513,514,515,516,517,518,519,520,521,522,
      523,524,525,526,527,528,529,530,531,532,
      533,534,535,536,537,538,539,540,541,542,
      543,544,545,546,547,548,549,550,551,552,
      553,554,555,556,557,558,559,560,561,562,
      563,564,565,566,567,568,569,570,571,572,
      573,574,575,576,577,578,579,580,581,582,
      583,584,585,586,587,588,589,590,591,592,
      593,594,595,596,597,598,599,600,601,602,
      603,604,605,606,607,608,609,610,611,612,
      613,614,615,616,617,618,619,620,621,622,
      623,624,625,626,627,628,629,630,631,632,
      633,634,635,636,637,638,639,640,641,642,
      643,644,645,646,647,648,649,650,651,652,
      653,654,655,656,657,658,659,660,661,662,
      663,664,665,666,667,668,669,670,671,672,
      673,674,675,676,677,678,679,680,681,682,
      683,684,685,686,687,688,689,690,691,692,
      693,694,695,696,697,698,699,700,701,702,
      703,704,705,706,707,708,709,710,711,712,
      713,714,715,716,717,718,719,720,721,722,
      723,724,725,726,727,728,729,730,731,732,
      733,734,735,736,737,738,739,740,741,742,
      743,744,745,746,747,748,749,750,751,752,
      753,754,755,756,757,758,759,760,761,762,
      763,764,765,766,767,768,769,770,771,772,
      773,774,775,776,777,778,779,780,781,782,
      783,784,785,786,787,788,789,790,791,792,
      793,794,795,796,797,798,799,800,801,802,
      803,804,805,806,807,808,809,810,811,812,
      813,814,815,816,817,818,819,820,821,822,
      823,824,825,826,827,828,829,830,831,832,
      833,834,835,836,837,838,839,840,841,842,
      843,844,845,846,847,848,849,850,851,852,
      853,854,855,856,857,858,859,860,861,862,
      863,864,865,866,867,868,869,870,871,872,
      873,874,875,876,877,878,879,880,881,882,
      883,884,885,886,887,888,889,890,891,892,
      893,894,895,896,897,898,899,900,901,902,
      903,904,905,906,907,908,909,910,911,912,
      913,914,915,916,917,918,919,920,921,922,
      923,924,925,926,927,928,929,930,931,932,
      933,934,935,936,937,938,939,940,941,942,
      943,944,945,946,947,948,949,950,951,952,
      953,954,955,956,957,958,959,960,961,962,
      963,964,965,966,967,968,969,970,971,972,
      973,974,975,976,977,978,979,980,981,982,
      983,984,985,986,987,988,989,990,991,992,
      993,994,995,996,997,998,999,1000,1001,
      1002,1003,1004,1005,1006,1007,1008,1009,
      1010,1011,1012,1013,1014,1015,1016,1017,
      1018,1019,1020,1021,1022,1023]]) returned ok in 80ms
      [couchdb:info,2012-09-08T1:18:02.271,ns_1@10.3.121.14:<0.19539.2>:couch_log:info:39]Updater reading changes from active partitions to update main set view group `_design/d1` from set `default`
      [views:info,2012-09-08T1:18:02.282,ns_1@10.3.121.14:'capi_set_view_manager-default':capi_set_view_manager:apply_index_states:464]
      Calling couch_set_view:add_replica_partitions([<<"default">>,<<"_design/d1">>,
      [0,1,2,3,4,5,6,7,8,9,10,11,24,
      25,26,27,28,29,36,37,38,39,40,
      41,42,43,44,45,46,47,48,49,50,
      51,58,59,60,61,62,63,80,81,82,
      83,84,85,92,93,94,95,96,97,98,
      99,100,101,102,206,207,208,
      209,210,211,218,219,220,221,
      222,223,224,225,226,227,228,
      229,281,282,283,284,285,286,
      321,322,323,324,325,326,327,
      328,329,330,331,332,333,334,
      335,336,337,338,339,340,341,
      342,343,344,356,357,358,359,
      360,361,374,375,376,377,378,
      395,396,397,398,399,423,424,
      425,426,427,428,429,430,431,
      432,433,514,515,516,517,518,
      519,525,526,527,528,529,530,
      531,532,533,534,535,627,628,
      629,630,631,632,633,634,635,
      636,637,729,730,731,732,733,
      734,735,736,737,738,739,832,
      833,834,835,836,837,838,839,
      840,841,842,854,855,856,857,
      858,933,934,935,936,937,938,
      939,940,941,942,943,944,1000,
      1001,1002,1003,1004,1005,1012,
      1013,1014,1015,1016,1017]])
      [couchdb:info,2012-09-08T1:18:02.285,ns_1@10.3.121.14:<0.15832.2>:couch_log:info:39]10.3.121.22 - - POST /_view_merge/?stale=ok&limit=10 200
      [couchdb:info,2012-09-08T1:18:02.296,ns_1@10.3.121.14:<0.18109.0>:couch_log:info:39]Set view `default`, main group `_design/d1`, defined new replica partitions: [0,1,2,3,4,5,6,7,8,9,10,11,24,25,26,27,28,29,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,58,59,60,61,62,63,80,81,82,83,84,85,92,93,94,95,96,97,98,99,100,101,102,206,207,208,209,210,211,218,219,220,221,222,223,224,225,226,227,228,229,281,282,283,284,285,286,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,356,357,358,359,360,361,374,375,376,377,378,395,396,397,398,399,423,424,425,426,427,428,429,430,431,432,433,514,515,516,517,518,519,525,526,527,528,529,530,531,532,533,534,535,627,628,629,630,631,632,633,634,635,636,637,729,730,731,732,733,734,735,736,737,738,739,832,833,834,835,836,837,838,839,840,841,842,854,855,856,857,858,933,934,935,936,937,938,939,940,941,942,943,944,1000,1001,1002,1003,1004,1005,1012,1013,1014,1015,1016,1017]
      New full set of replica partitions is: [0,1,2,3,4,5,6,7,8,9,10,11,24,25,26,27,28,29,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,58,59,60,61,62,63,80,81,82,83,84,85,92,93,94,95,96,97,98,99,100,101,102,206,207,208,209,210,211,218,219,220,221,222,223,224,225,226,227,228,229,281,282,283,284,285,286,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,356,357,358,359,360,361,374,375,376,377,378,395,396,397,398,399,423,424,425,426,427,428,429,430,431,432,433,514,515,516,517,518,519,525,526,527,528,529,530,531,532,533,534,535,627,628,629,630,631,632,633,634,635,636,637,729,730,731,732,733,734,735,736,737,738,739,832,833,834,835,836,837,838,839,840,841,842,854,855,856,857,858,933,934,935,936,937,938,939,940,941,942,943,944,1000,1001,1002,1003,1004,1005,1012,1013,1014,1015,1016,1017]

      [ns_server:info,2012-09-08T1:18:02.319,ns_1@10.3.121.14:<0.19520.2>:ns_replicas_builder_utils:kill_a_bunch_of_tap_names:59]Killed the following tap names on 'ns_1@10.3.121.14': [<<"replication_building_138_'ns_1@10.3.121.26'">>,
      <<"replication_building_138_'ns_1@10.3.121.16'">>,
      <<"replication_building_138_'ns_1@10.3.121.24'">>]
      [ns_server:info,2012-09-08T1:18:02.321,ns_1@10.3.121.14:'janitor_agent-default':janitor_agent:handle_info:646]Undoing temporary vbucket states caused by rebalance
      [user:info,2012-09-08T1:18:02.322,ns_1@10.3.121.14:<0.20487.1>:ns_orchestrator:handle_info:295]Rebalance exited with reason {bulk_set_vbucket_state_failed,
      [{'ns_1@10.3.121.13',
      {'EXIT',
      {{{timeout,
      {gen_server,call,
      [ns_config,
      {eval, #Fun<cluster_compat_mode.0.45438860>}]}},
      {gen_server,call,
      ['tap_replication_manager-default',
      {change_vbucket_replication,726, undefined},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-default', 'ns_1@10.3.121.13'},
      {if_rebalance,<0.16591.2>,
      {update_vbucket_state,726,replica,
      undefined,undefined}},
      infinity]}}}}]}

      [ns_server:debug,2012-09-08T1:18:02.322,ns_1@10.3.121.14:<0.16604.2>:ns_pubsub:do_subscribe_link:134]Parent process of subscription {ns_node_disco_events,<0.16591.2>} exited with reason {bulk_set_vbucket_state_failed,
      [{'ns_1@10.3.121.13',
      {'EXIT',
      {{{timeout,
      {gen_server,
      call,
      [ns_config,
      {eval, #Fun<cluster_compat_mode.0.45438860>}]}},
      {gen_server,
      call,
      ['tap_replication_manager-default',
      {change_vbucket_replication, 726, undefined},
      infinity]}},
      {gen_server,
      call,
      [{'janitor_agent-default', 'ns_1@10.3.121.13'},
      {if_rebalance,
      <0.16591.2>,
      {update_vbucket_state,
      726,
      replica,
      undefined,
      undefined}},
      infinity]}}}}]}
      [ns_server:debug,2012-09-08T1:18:02.341,ns_1@10.3.121.14:<0.16604.2>:ns_pubsub:do_subscribe_link:149]Deleting {ns_node_disco_events,<0.16591.2>} event handler: ok
      [ns_server:debug,2012-09-08T1:18:02.345,ns_1@10.3.121.14:'capi_set_view_manager-saslbucket':capi_set_view_manager:handle_info:337]doing replicate_newnodes_docs
      [ns_server:info,2012-09-08T1:18:02.354,ns_1@10.3.121.14:<0.19587.2>:diag_handler:log_all_tap_and_checkpoint_stats:126]logging tap & checkpoint stats
      [error_logger:error,2012-09-08T1:18:02.343,ns_1@10.3.121.14:error_logger:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: ns_single_vbucket_mover:mover/6
      pid: <0.19493.2>
      registered_name: []
      exception exit: {exited,
      {'EXIT',<0.16591.2>,
      {bulk_set_vbucket_state_failed,
      [{'ns_1@10.3.121.13',
      {'EXIT',
      {{{timeout,
      {gen_server,call,
      [ns_config,
      {eval,#Fun<cluster_compat_mode.0.45438860>}]}},
      {gen_server,call,
      ['tap_replication_manager-default',
      {change_vbucket_replication,726,undefined},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-default','ns_1@10.3.121.13'},
      {if_rebalance,<0.16591.2>,
      {update_vbucket_state,726,replica,undefined,
      undefined}},
      infinity]}}}}]}}}
      in function ns_single_vbucket_mover:mover_inner_old_style/6
      in call from misc:try_with_maybe_ignorant_after/2
      in call from ns_single_vbucket_mover:mover/6
      ancestors: [<0.16591.2>,<0.22331.1>]
      messages: [{'EXIT',<0.16591.2>,
      {bulk_set_vbucket_state_failed,
      [{'ns_1@10.3.121.13',
      {'EXIT',
      {{{timeout,
      {gen_server,call,
      [ns_config,
      {eval,#Fun<cluster_compat_mode.0.45438860>}]}},
      {gen_server,call,
      ['tap_replication_manager-default',
      {change_vbucket_replication,726,undefined},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-default','ns_1@10.3.121.13'},
      {if_rebalance,<0.16591.2>,
      {update_vbucket_state,726,replica,undefined,
      undefined}},
      infinity]}}}}]}}]
      links: [<0.19498.2>,<0.16591.2>]
      dictionary: [{cleanup_list,[<0.19520.2>]}]
      trap_exit: true
      status: running
      heap_size: 2584
      stack_size: 24
      reductions: 4491
      neighbours:

      Link to diags of all nodes

      https://s3.amazonaws.com/packages.couchbase/diag-logs/orange/201209/11nodes-1697-rebalance-failed-bulk_set_vbucket_state_failed-20120908.tgz

      Attachments

      1. erl_healthy-node24-crash.dump.gz (3.15 MB, Thuan Nguyen)
      2. erl-over-1gb-node14_crash.dump.gz (5.57 MB, Thuan Nguyen)
      3. ns-diag-20121031094231.txt.xz (831 kB, Aleksey Kondratenko)
      4. report_atop_10.6.2.37_default_simple-view_test (6).pdf (297 kB, Ketaki Gangal)

        Issue Links

        This issue blocks MB-7234
        This issue blocks MB-7261

          Activity

          thuan Thuan Nguyen created issue -
          karan Karan Kumar (Inactive) made changes -
          Field Original Value New Value
          Summary [longevity] rebalance failed due to tap_replication_manager-default crashed [longevity] rebalance failed due to timeout in tap_replication_manager for default bucket
          alkondratenko Aleksey Kondratenko (Inactive) made changes -
          Assignee Aleksey Kondratenko [ alkondratenko ] Karan Kumar [ karan ]
          karan Karan Kumar (Inactive) made changes -
          Assignee Karan Kumar [ karan ] Aleksey Kondratenko [ alkondratenko ]
          thuan Thuan Nguyen made changes -
          Attachment erl_healthy-node24-crash.dump.gz [ 14966 ]
          Attachment erl-over-1gb-node14_crash.dump.gz [ 14967 ]
          farshid Farshid Ghods (Inactive) made changes -
          Labels sblocker system-test
          farshid Farshid Ghods (Inactive) made changes -
          Sprint Status Current Sprint
          farshid Farshid Ghods (Inactive) made changes -
          Summary [longevity] rebalance failed due to timeout in tap_replication_manager for default bucket [longevity] rebalance fails with error "bulk_set_vbucket_state_failed" due to timeout in tap_replication_manager for default bucket
          Labels sblocker system-test 2.0-beta-release-notes sblocker system-test
          farshid Farshid Ghods (Inactive) made changes -
          Fix Version/s 2.0 [ 10114 ]
          Fix Version/s 2.0-beta [ 10113 ]
          farshid Farshid Ghods (Inactive) made changes -
          Priority Major [ 3 ] Blocker [ 1 ]
          alkondratenko Aleksey Kondratenko (Inactive) made changes -
          Assignee Aleksey Kondratenko [ alkondratenko ] Peter Wansch [ peter ]
          peter peter made changes -
          Assignee Peter Wansch [ peter ] Farshid Ghods [ farshid ]
          farshid Farshid Ghods (Inactive) made changes -
          Assignee Farshid Ghods [ farshid ] Mike Wiederhold [ mikew ]
          mikew Mike Wiederhold made changes -
          Assignee Mike Wiederhold [ mikew ] Karan Kumar [ karan ]
          farshid Farshid Ghods (Inactive) made changes -
          Summary [longevity] rebalance fails with error "bulk_set_vbucket_state_failed" due to timeout in tap_replication_manager for default bucket [longevity] erlang garbage collection causes huge time outs in erlang vm and causes rebalance failures , ns_timeouts (rebalance fails with error "bulk_set_vbucket_state_failed" due to timeout in tap_replication_manager)
          steve Steve Yen made changes -
          Assignee Karan Kumar [ karan ] Aleksey Kondratenko [ alkondratenko ]
          alkondratenko Aleksey Kondratenko (Inactive) made changes -
          Summary [longevity] erlang garbage collection causes huge time outs in erlang vm and causes rebalance failures , ns_timeouts (rebalance fails with error "bulk_set_vbucket_state_failed" due to timeout in tap_replication_manager) [longevity] something unknown is causing severe timeouts in ns_server. Particularly under views building and/or compaction. Which causes rebalance to fail and other types of badness.
          farshid Farshid Ghods (Inactive) made changes -
          Labels 2.0-beta-release-notes sblocker system-test 2.0-beta-release-notes system-test
          Priority Blocker [ 1 ] Critical [ 2 ]
          farshid Farshid Ghods (Inactive) made changes -
          Assignee Aleksey Kondratenko [ alkondratenko ] Thuan Nguyen [ thuan ]
          alkondratenko Aleksey Kondratenko (Inactive) made changes -
          Attachment ns-diag-20121031094231.txt.xz [ 15646 ]
          steve Steve Yen made changes -
          Assignee Thuan Nguyen [ thuan ] Aleksey Kondratenko [ alkondratenko ]
          steve Steve Yen made changes -
          Fix Version/s 2.0.1 [ 10399 ]
          Fix Version/s 2.0 [ 10114 ]
          farshid Farshid Ghods (Inactive) made changes -
          Link This issue blocks MB-7234 [ MB-7234 ]
          farshid Farshid Ghods (Inactive) made changes -
          Link This issue blocks MB-7261 [ MB-7261 ]
          farshid Farshid Ghods (Inactive) made changes -
          Priority Critical [ 2 ] Blocker [ 1 ]
          alkondratenko Aleksey Kondratenko (Inactive) made changes -
          Assignee Aleksey Kondratenko [ alkondratenko ] Aliaksey Artamonau [ aliaksey artamonau ]
          alkondratenko Aleksey Kondratenko (Inactive) made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          farshid Farshid Ghods (Inactive) made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          ketaki Ketaki Gangal made changes -
          Resolution Fixed [ 1 ]
          Status Closed [ 6 ] Reopened [ 4 ]
          ketaki Ketaki Gangal made changes -
          ketaki Ketaki Gangal made changes -
          Status Reopened [ 4 ] Closed [ 6 ]
          Resolution Fixed [ 1 ]
          jin Jin Lim made changes -
          Resolution Fixed [ 1 ]
          Status Closed [ 6 ] Reopened [ 4 ]
          Assignee Aliaksey Artamonau [ aliaksey artamonau ] Karen Zeller [ kzeller ]
          kzeller kzeller made changes -
          Summary [longevity] something unknown is causing severe timeouts in ns_server. Particularly under views building and/or compaction. Which causes rebalance to fail and other types of badness. [RN 2.0.1]][longevity] something unknown is causing severe timeouts in ns_server. Particularly under views building and/or compaction. Which causes rebalance to fail and other types of badness.
          Labels 2.0-beta-release-notes system-test 2.0-beta-release-notes 2.0.1-release-notes system-test
          Flagged Release Note [ 10010 ]
          farshid Farshid Ghods (Inactive) made changes -
          Component/s documentation [ 10012 ]
          Component/s ns_server [ 10019 ]
          kzeller kzeller made changes -
          Status Reopened [ 4 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          kzeller kzeller made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          mikew Mike Wiederhold made changes -
          Planned End 2013-03-01 12:00 (generated: set to end of MB-7111)
          mikew Mike Wiederhold made changes -
          Planned End 2013-03-01 12:00 2013-03-04 12:00 (generated: set to end of MB-7111)
          maria Maria McDuff (Inactive) made changes -
          Planned End 2013-03-04 12:00 2013-03-05 12:00 (generated: set to end of MB-7111)

            People

            • Assignee:
              kzeller kzeller
              Reporter:
              thuan Thuan Nguyen
            • Votes: 0
              Watchers: 8
