Couchbase Server
MB-6595

[RN 2.0.1][longevity] Something unknown is causing severe timeouts in ns_server, particularly under view building and/or compaction, which causes rebalance to fail and other kinds of badness.

    Details

    • Flagged:
      Release Note

      Description

      Cluster information:

      • 11 CentOS 6.2 64-bit servers, each with a 4-core CPU
      • Each server has 10 GB RAM and 150 GB disk.
      • 8 GB RAM allocated to Couchbase Server on each node (80% of total system memory)
      • ext3 filesystem on both data and root partitions
      • Each server has its own drive; no disk sharing with other servers.
      • Loaded 9 million items into both buckets
      • Initial indexing in progress, so CPU load is somewhat heavy
      • Cluster has 2 buckets, default (3 GB) and saslbucket (3 GB)
      • Each bucket has one design doc with 2 views per doc (default: d1; saslbucket: d11)
      • Created a cluster of 10 nodes running Couchbase Server 2.0.0-1697:

      10.3.121.13
      10.3.121.14
      10.3.121.15
      10.3.121.16
      10.3.121.17
      10.3.121.20
      10.3.121.22
      10.3.121.24
      10.3.121.25
      10.3.121.23

      • Data path: /data
      • View path: /data
      • Performed a swap rebalance: added node .26 and removed node .25
      • Rebalance failed as in bug MB-6573
      • Rebalanced again; it failed again, with the error on the log page pointing to node 14

      Rebalance exited with reason {bulk_set_vbucket_state_failed,
      [{'ns_1@10.3.121.13',
      {'EXIT',
      {{{timeout,
      {gen_server,call,
      [ns_config,
      {eval, #Fun<cluster_compat_mode.0.45438860>}]}},
      {gen_server,call,
      ['tap_replication_manager-default', {change_vbucket_replication,726, undefined},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-default', 'ns_1@10.3.121.13'},
      {if_rebalance,<0.16591.2>,
      {update_vbucket_state,726,replica,
      undefined,undefined}},
      infinity]}}}}]}
      ns_orchestrator002 ns_1@10.3.121.14 01:18:02 - Sat Sep 8, 2012
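
      Dissecting the exit term above: the innermost failure is a plain gen_server call timeout against ns_config (the cluster_compat_mode eval), which then propagates through the infinite-timeout calls to tap_replication_manager and janitor_agent until the vbucket mover gives up. A minimal, self-contained sketch of that innermost mechanism (illustrative only, not Couchbase source; module and message names are made up):

      ```erlang
      %% Sketch of how a {timeout,{gen_server,call,[...]}} exit arises:
      %% gen_server:call/3 with a finite timeout exits the caller when the
      %% server is too busy to reply in time, and that exit surfaces through
      %% every infinite-timeout call stacked above it.
      -module(timeout_sketch).
      -behaviour(gen_server).
      -export([run/0, init/1, handle_call/3, handle_cast/2]).

      run() ->
          {ok, Pid} = gen_server:start(?MODULE, [], []),
          %% Inner call: a 100 ms budget against a 500 ms reply, so the caller
          %% exits with {timeout,{gen_server,call,[Pid,slow,100]}} -- the same
          %% shape as the ns_config eval timeout in the log above.
          catch gen_server:call(Pid, slow, 100).

      init([]) -> {ok, #{}}.

      handle_call(slow, _From, State) ->
          timer:sleep(500),    %% stand-in for an overloaded ns_config
          {reply, ok, State}.

      handle_cast(_Msg, State) -> {noreply, State}.
      ```
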

      <0.19004.2> exited with {bulk_set_vbucket_state_failed,
      [{'ns_1@10.3.121.13',
      {'EXIT',
      {{{timeout,
      {gen_server,call,
      [ns_config,{eval, #Fun<cluster_compat_mode.0.45438860>}]}},
      {gen_server,call,
      ['tap_replication_manager-default',
      {change_vbucket_replication,726,undefined},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-default','ns_1@10.3.121.13'},
      {if_rebalance,<0.16591.2>,
      {update_vbucket_state,726,replica,undefined,
      undefined}},
      infinity]}}}}]} ns_vbucket_mover000 ns_1@10.3.121.14 01:18:01 - Sat Sep 8, 2012
      Server error during processing: ["web request failed",


      • Going to node 14, I see many tap_replication_manager-default crashes right before rebalance failed at 01:18:01 - Sat Sep 8, 2012
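
      A quick way to confirm that pattern on a node is to grep the collected ns_server log for the crash markers. A hedged sketch (the excerpt and file name below are fabricated for illustration; on a real node, point LOG at the actual debug log under the install's var/lib/couchbase/logs directory):

      ```shell
      # Work on a small hand-made excerpt so the commands are self-contained;
      # point LOG at the node's real debug log when triaging for real.
      LOG=ns_server.debug.excerpt
      cat > "$LOG" <<'EOF'
      [error_logger:error,2012-09-08T1:18:01.836,...] =========CRASH REPORT=========
      ['tap_replication_manager-default',{change_vbucket_replication,726,undefined}]
      [error_logger:error,2012-09-08T1:18:01.895,...] =========CRASH REPORT=========
      EOF

      grep -c "CRASH REPORT" "$LOG"              # number of crashers in the window
      grep -n "tap_replication_manager" "$LOG"   # lines implicating the replicator
      ```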

      [error_logger:error,2012-09-08T1:18:01.836,ns_1@10.3.121.14:error_logger:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: ns_single_vbucket_mover:mover/6
      pid: <0.19330.2>
      registered_name: []
      exception exit: {exited,
      {'EXIT',<0.16591.2>,
      {bulk_set_vbucket_state_failed,
      [{'ns_1@10.3.121.13',
      {'EXIT',
      {{{timeout,
      {gen_server,call,
      [ns_config,
      {eval,#Fun<cluster_compat_mode.0.45438860>}]}},
      {gen_server,call,
      ['tap_replication_manager-default',
      {change_vbucket_replication,726,undefined},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-default','ns_1@10.3.121.13'},
      {if_rebalance,<0.16591.2>,
      {update_vbucket_state,726,replica,undefined,
      undefined}},
      infinity]}}}}]}}}
      in function ns_single_vbucket_mover:mover_inner_old_style/6
      in call from misc:try_with_maybe_ignorant_after/2
      in call from ns_single_vbucket_mover:mover/6
      ancestors: [<0.16591.2>,<0.22331.1>]
      messages: [{'EXIT',<0.16591.2>,
      {bulk_set_vbucket_state_failed,
      [{'ns_1@10.3.121.13',
      {'EXIT',
      {{{timeout,
      {gen_server,call,
      [ns_config,
      {eval,#Fun<cluster_compat_mode.0.45438860>}]}},
      {gen_server,call,
      ['tap_replication_manager-default',
      {change_vbucket_replication,726,undefined},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-default','ns_1@10.3.121.13'},
      {if_rebalance,<0.16591.2>,
      {update_vbucket_state,726,replica,undefined,
      undefined}},
      infinity]}}}}]}}]
      links: [<0.16591.2>]
      dictionary: [{cleanup_list,[<0.19392.2>]}]
      trap_exit: true
      status: running
      heap_size: 4181
      stack_size: 24
      reductions: 4550
      neighbours:

      [ns_server:info,2012-09-08T1:18:01.835,ns_1@10.3.121.14:<0.19487.2>:ns_replicas_builder:build_replicas_main:94]Got exit not from child ebucketmigrator. Assuming it's our parent:

      {'EXIT', <0.19393.2>, shutdown}

      [ns_server:info,2012-09-08T1:18:01.880,ns_1@10.3.121.14:ns_config_rep:ns_config_rep:do_pull:341]Pulling config from: 'ns_1@10.3.121.13'

      [ns_server:info,2012-09-08T1:18:01.885,ns_1@10.3.121.14:<0.19328.2>:ns_replicas_builder_utils:kill_a_bunch_of_tap_names:59]Killed the following tap names on 'ns_1@10.3.121.22': [<<"replication_building_704_'ns_1@10.3.121.26'">>,
      <<"replication_building_704_'ns_1@10.3.121.24'">>,
      <<"replication_building_704_'ns_1@10.3.121.23'">>]
      [error_logger:error,2012-09-08T1:18:01.895,ns_1@10.3.121.14:error_logger:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: ns_single_vbucket_mover:mover/6
      pid: <0.19276.2>
      registered_name: []
      exception exit: {exited,
      {'EXIT',<0.16591.2>,
      {bulk_set_vbucket_state_failed,
      [{'ns_1@10.3.121.13',
      {'EXIT',
      {{{timeout,
      {gen_server,call,
      [ns_config,
      {eval,#Fun<cluster_compat_mode.0.45438860>}]}},
      {gen_server,call,
      ['tap_replication_manager-default',
      {change_vbucket_replication,726,undefined},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-default','ns_1@10.3.121.13'},
      {if_rebalance,<0.16591.2>,
      {update_vbucket_state,726,replica,undefined,
      undefined}},
      infinity]}}}}]}}}
      in function ns_single_vbucket_mover:mover_inner_old_style/6
      in call from misc:try_with_maybe_ignorant_after/2
      in call from ns_single_vbucket_mover:mover/6
      ancestors: [<0.16591.2>,<0.22331.1>]
      messages: [{'EXIT',<0.16591.2>,
      {bulk_set_vbucket_state_failed,
      [{'ns_1@10.3.121.13',
      {'EXIT',
      {{{timeout,
      {gen_server,call,
      [ns_config,
      {eval,#Fun<cluster_compat_mode.0.45438860>}]}},
      {gen_server,call,
      ['tap_replication_manager-default',
      {change_vbucket_replication,726,undefined},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-default','ns_1@10.3.121.13'},
      {if_rebalance,<0.16591.2>,
      {update_vbucket_state,726,replica,undefined,
      undefined}},
      infinity]}}}}]}}]
      links: [<0.16591.2>]
      dictionary: [{cleanup_list,[<0.19328.2>]}]
      trap_exit: true
      status: running
      heap_size: 4181
      stack_size: 24
      reductions: 4434
      neighbours:

      [ns_server:info,2012-09-08T1:18:01.930,ns_1@10.3.121.14:<0.19487.2>:ns_replicas_builder_utils:kill_a_bunch_of_tap_names:59]Killed the following tap names on 'ns_1@10.3.121.16': [<<"replication_building_399_'ns_1@10.3.121.26'">>,
      <<"replication_building_399_'ns_1@10.3.121.24'">>,
      <<"replication_building_399_'ns_1@10.3.121.14'">>]
      [error_logger:error,2012-09-08T1:18:01.937,ns_1@10.3.121.14:error_logger:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: ns_single_vbucket_mover:mover/6
      pid: <0.19393.2>
      registered_name: []
      exception exit: {exited,
      {'EXIT',<0.16591.2>,
      {bulk_set_vbucket_state_failed,
      [{'ns_1@10.3.121.13',
      {'EXIT',
      {{{timeout,
      {gen_server,call,
      [ns_config,
      {eval,#Fun<cluster_compat_mode.0.45438860>}]}},
      {gen_server,call,
      ['tap_replication_manager-default',
      {change_vbucket_replication,726,undefined},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-default','ns_1@10.3.121.13'},
      {if_rebalance,<0.16591.2>,
      {update_vbucket_state,726,replica,undefined,
      undefined}},
      infinity]}}}}]}}}
      in function ns_single_vbucket_mover:mover_inner_old_style/6
      in call from misc:try_with_maybe_ignorant_after/2
      in call from ns_single_vbucket_mover:mover/6
      ancestors: [<0.16591.2>,<0.22331.1>]
      messages: [{'EXIT',<0.16591.2>,
      {bulk_set_vbucket_state_failed,
      [{'ns_1@10.3.121.13',
      {'EXIT',
      {{{timeout,
      {gen_server,call,
      [ns_config,
      {eval,#Fun<cluster_compat_mode.0.45438860>}]}},
      {gen_server,call,
      ['tap_replication_manager-default',
      {change_vbucket_replication,726,undefined},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-default','ns_1@10.3.121.13'},
      {if_rebalance,<0.16591.2>,
      {update_vbucket_state,726,replica,undefined,
      undefined}},
      infinity]}}}}]}}]
      links: [<0.16591.2>]
      dictionary: [{cleanup_list,[<0.19487.2>]}]
      trap_exit: true
      status: running
      heap_size: 4181
      stack_size: 24
      reductions: 4435
      neighbours:

      [couchdb:info,2012-09-08T1:18:01.977,ns_1@10.3.121.14:<0.15832.2>:couch_log:info:39]10.3.121.22 - - POST /_view_merge/?stale=ok&limit=10 200
      [ns_server:error,2012-09-08T1:18:02.072,ns_1@10.3.121.14:<0.5850.0>:ns_memcached:verify_report_long_call:274]call topkeys took too long: 836560 us
      [rebalance:debug,2012-09-08T1:18:02.075,ns_1@10.3.121.14:<0.19493.2>:ns_single_vbucket_mover:mover_inner_old_style:195]child replicas builder for vbucket 138 is <0.19520.2>
      [ns_server:info,2012-09-08T1:18:02.077,ns_1@10.3.121.14:<0.19493.2>:ns_single_vbucket_mover:mover_inner_old_style:199]Got exit message (parent is <0.16591.2>). Exiting...
      {'EXIT',<0.16591.2>,
      {bulk_set_vbucket_state_failed,
      [{'ns_1@10.3.121.13',
      {'EXIT',
      {{{timeout,
      {gen_server,call,
      [ns_config,
      {eval,#Fun<cluster_compat_mode.0.45438860>}]}},
      {gen_server,call,
      ['tap_replication_manager-default',
      {change_vbucket_replication,726,undefined},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-default','ns_1@10.3.121.13'},
      {if_rebalance,<0.16591.2>,
      {update_vbucket_state,726,replica,undefined,
      undefined}},
      infinity]}}}}]}}
      [ns_server:debug,2012-09-08T1:18:02.115,ns_1@10.3.121.14:<0.19520.2>:ns_replicas_builder_utils:spawn_replica_builder:86]Replica building ebucketmigrator for vbucket 138 into 'ns_1@10.3.121.26' is <20326.5386.1>
      [ns_server:info,2012-09-08T1:18:02.125,ns_1@10.3.121.14:ns_port_memcached:ns_port_server:log:169]memcached<0.2005.0>: Sat Sep 8 08:18:01.920865 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.121.16 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0

      [ns_server:debug,2012-09-08T1:18:02.142,ns_1@10.3.121.14:<0.3277.0>:mc_connection:do_delete_vbucket:118]Notifying mc_couch_events of vbucket deletion: default/137
      [views:info,2012-09-08T1:18:02.146,ns_1@10.3.121.14:'capi_set_view_manager-default':capi_set_view_manager:apply_index_states:459]
      Calling couch_set_view:set_partition_states([<<"default">>,<<"_design/d1">>,
      [103,104,105,106,107,108,109,110,
      111,112,113,114,115,116,117,118,
      119,120,121,122,123,124,125,126,
      127,128,129,130,131,132,138,139,
      140,141,142,143,144,145,146,147,
      148,149,150,151,152,153,154,155,
      156,157,158,159,160,161,162,163,
      164,165,166,167,168,169,170,171,
      172,173,174,175,176,177,178,179,
      180,181,182,183,184,185,186,187,
      188,189,190,191,192,193,194,195,
      196,197,198,199,200,201,202,203,
      204,205],
      [],
      [0,1,2,3,4,5,6,7,8,9,10,11,12,13,
      14,15,16,17,18,19,20,21,22,23,
      24,25,26,27,28,29,30,31,32,33,
      34,35,36,37,38,39,40,41,42,43,
      44,45,46,47,48,49,50,51,52,53,
      54,55,56,57,58,59,60,61,62,63,
      64,65,66,67,68,69,70,71,72,73,
      74,75,76,77,78,79,80,81,82,83,
      84,85,86,87,88,89,90,91,92,93,
      94,95,96,97,98,99,100,101,102,
      133,134,135,136,137,206,207,208,
      209,210,211,212,213,214,215,216,
      217,218,219,220,221,222,223,224,
      225,226,227,228,229,230,231,232,
      233,234,235,236,237,238,239,240,
      241,242,243,244,245,246,247,248,
      249,250,251,252,253,254,255,256,
      257,258,259,260,261,262,263,264,
      265,266,267,268,269,270,271,272,
      273,274,275,276,277,278,279,280,
      281,282,283,284,285,286,287,288,
      289,290,291,292,293,294,295,296,
      297,298,299,300,301,302,303,304,
      305,306,307,308,309,310,311,312,
      313,314,315,316,317,318,319,320,
      321,322,323,324,325,326,327,328,
      329,330,331,332,333,334,335,336,
      337,338,339,340,341,342,343,344,
      345,346,347,348,349,350,351,352,
      353,354,355,356,357,358,359,360,
      361,362,363,364,365,366,367,368,
      369,370,371,372,373,374,375,376,
      377,378,379,380,381,382,383,384,
      385,386,387,388,389,390,391,392,
      393,394,395,396,397,398,399,400,
      401,402,403,404,405,406,407,408,
      409,410,411,412,413,414,415,416,
      417,418,419,420,421,422,423,424,
      425,426,427,428,429,430,431,432,
      433,434,435,436,437,438,439,440,
      441,442,443,444,445,446,447,448,
      449,450,451,452,453,454,455,456,
      457,458,459,460,461,462,463,464,
      465,466,467,468,469,470,471,472,
      473,474,475,476,477,478,479,480,
      481,482,483,484,485,486,487,488,
      489,490,491,492,493,494,495,496,
      497,498,499,500,501,502,503,504,
      505,506,507,508,509,510,511,512,
      513,514,515,516,517,518,519,520,
      521,522,523,524,525,526,527,528,
      529,530,531,532,533,534,535,536,
      537,538,539,540,541,542,543,544,
      545,546,547,548,549,550,551,552,
      553,554,555,556,557,558,559,560,
      561,562,563,564,565,566,567,568,
      569,570,571,572,573,574,575,576,
      577,578,579,580,581,582,583,584,
      585,586,587,588,589,590,591,592,
      593,594,595,596,597,598,599,600,
      601,602,603,604,605,606,607,608,
      609,610,611,612,613,614,615,616,
      617,618,619,620,621,622,623,624,
      625,626,627,628,629,630,631,632,
      633,634,635,636,637,638,639,640,
      641,642,643,644,645,646,647,648,
      649,650,651,652,653,654,655,656,
      657,658,659,660,661,662,663,664,
      665,666,667,668,669,670,671,672,
      673,674,675,676,677,678,679,680,
      681,682,683,684,685,686,687,688,
      689,690,691,692,693,694,695,696,
      697,698,699,700,701,702,703,704,
      705,706,707,708,709,710,711,712,
      713,714,715,716,717,718,719,720,
      721,722,723,724,725,726,727,728,
      729,730,731,732,733,734,735,736,
      737,738,739,740,741,742,743,744,
      745,746,747,748,749,750,751,752,
      753,754,755,756,757,758,759,760,
      761,762,763,764,765,766,767,768,
      769,770,771,772,773,774,775,776,
      777,778,779,780,781,782,783,784,
      785,786,787,788,789,790,791,792,
      793,794,795,796,797,798,799,800,
      801,802,803,804,805,806,807,808,
      809,810,811,812,813,814,815,816,
      817,818,819,820,821,822,823,824,
      825,826,827,828,829,830,831,832,
      833,834,835,836,837,838,839,840,
      841,842,843,844,845,846,847,848,
      849,850,851,852,853,854,855,856,
      857,858,859,860,861,862,863,864,
      865,866,867,868,869,870,871,872,
      873,874,875,876,877,878,879,880,
      881,882,883,884,885,886,887,888,
      889,890,891,892,893,894,895,896,
      897,898,899,900,901,902,903,904,
      905,906,907,908,909,910,911,912,
      913,914,915,916,917,918,919,920,
      921,922,923,924,925,926,927,928,
      929,930,931,932,933,934,935,936,
      937,938,939,940,941,942,943,944,
      945,946,947,948,949,950,951,952,
      953,954,955,956,957,958,959,960,
      961,962,963,964,965,966,967,968,
      969,970,971,972,973,974,975,976,
      977,978,979,980,981,982,983,984,
      985,986,987,988,989,990,991,992,
      993,994,995,996,997,998,999,
      1000,1001,1002,1003,1004,1005,
      1006,1007,1008,1009,1010,1011,
      1012,1013,1014,1015,1016,1017,
      1018,1019,1020,1021,1022,1023]])
      [ns_server:debug,2012-09-08T1:18:02.161,ns_1@10.3.121.14:<0.19520.2>:ns_replicas_builder_utils:spawn_replica_builder:86]Replica building ebucketmigrator for vbucket 138 into 'ns_1@10.3.121.16' is <18036.10781.2>
      [couchdb:info,2012-09-08T1:18:02.162,ns_1@10.3.121.14:<0.18109.0>:couch_log:info:39]Stopping updater for set view `default`, main group `_design/d1`
      [ns_server:debug,2012-09-08T1:18:02.176,ns_1@10.3.121.14:<0.19520.2>:ns_replicas_builder_utils:spawn_replica_builder:86]Replica building ebucketmigrator for vbucket 138 into 'ns_1@10.3.121.24' is <18041.13682.2>
      [couchdb:info,2012-09-08T1:18:02.179,ns_1@10.3.121.14:<0.18109.0>:couch_log:info:39]Updater, set view `default`, main group `_design/d1`, stopped with reason: {updater_error, shutdown}
      [couchdb:info,2012-09-08T1:18:02.234,ns_1@10.3.121.14:<0.18109.0>:couch_log:info:39]Set view `default`, main group `_design/d1`, partition states updated
      active partitions before: [103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199]
      active partitions after: [103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199]
      passive partitions before: []
      passive partitions after: []
      cleanup partitions before: [133,200,201,202,203,204,205]
      cleanup partitions after: [133,137,200,201,202,203,204,205]
      unindexable partitions: []
      replica partitions before: [0,1,2,3,4,5,6,7,8,9,10,11,24,25,26,27,28,29,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,58,59,60,61,62,63,80,81,82,83,84,85,92,93,94,95,96,97,98,99,100,101,102,206,207,208,209,210,211,218,219,220,221,222,223,224,225,226,227,228,229,281,282,283,284,285,286,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,356,357,358,359,360,361,374,375,376,377,378,395,396,397,398,399,423,424,425,426,427,428,429,430,431,432,433,514,515,516,517,518,519,525,526,527,528,529,530,531,532,533,534,535,627,628,629,630,631,632,633,634,635,636,637,729,730,731,732,733,734,735,736,737,738,739,832,833,834,835,836,837,838,839,840,841,842,854,855,856,857,858,933,934,935,936,937,938,939,940,941,942,943,944,1000,1001,1002,1003,1004,1005,1012,1013,1014,1015,1016,1017]
      replica partitions after: [0,1,2,3,4,5,6,7,8,9,10,11,24,25,26,27,28,29,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,58,59,60,61,62,63,80,81,82,83,84,85,92,93,94,95,96,97,98,99,100,101,102,206,207,208,209,210,211,218,219,220,221,222,223,224,225,226,227,228,229,281,282,283,284,285,286,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,356,357,358,359,360,361,374,375,376,377,378,395,396,397,398,399,423,424,425,426,427,428,429,430,431,432,433,514,515,516,517,518,519,525,526,527,528,529,530,531,532,533,534,535,627,628,629,630,631,632,633,634,635,636,637,729,730,731,732,733,734,735,736,737,738,739,832,833,834,835,836,837,838,839,840,841,842,854,855,856,857,858,933,934,935,936,937,938,939,940,941,942,943,944,1000,1001,1002,1003,1004,1005,1012,1013,1014,1015,1016,1017]
      replicas on transfer before: []
      replicas on transfer after: []
      pending transition before:
      active: [200,201,202,203,204,205]
      passive: []
      pending transition after:
      active: [200,201,202,203,204,205]
      passive: []

      [ns_server:info,2012-09-08T1:18:02.238,ns_1@10.3.121.14:<0.19520.2>:ns_replicas_builder:build_replicas_main:94]Got exit not from child ebucketmigrator. Assuming it's our parent: {'EXIT', <0.19493.2>, shutdown}
      [couchdb:info,2012-09-08T1:18:02.241,ns_1@10.3.121.14:<0.18109.0>:couch_log:info:39]Starting updater for set view `default`, main group `_design/d1`
      [couchdb:info,2012-09-08T1:18:02.242,ns_1@10.3.121.14:<0.19533.2>:couch_log:info:39]Updater for set view `default`, main group `_design/d1` started
      Active partitions: [103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199]
      Passive partitions: []
      Cleanup partitions: [133,137,200,201,202,203,204,205]
      Replicas to transfer: []
      Pending transition:
      active: [200,201,202,203,204,205]
      passive: []
      Initial build: false

      [views:info,2012-09-08T1:18:02.242,ns_1@10.3.121.14:'capi_set_view_manager-default':capi_set_view_manager:apply_index_states:460]
      couch_set_view:set_partition_states([<<"default">>,<<"_design/d1">>,
      [103,104,105,106,107,108,109,110,111,112,
      113,114,115,116,117,118,119,120,121,122,
      123,124,125,126,127,128,129,130,131,132,
      138,139,140,141,142,143,144,145,146,147,
      148,149,150,151,152,153,154,155,156,157,
      158,159,160,161,162,163,164,165,166,167,
      168,169,170,171,172,173,174,175,176,177,
      178,179,180,181,182,183,184,185,186,187,
      188,189,190,191,192,193,194,195,196,197,
      198,199,200,201,202,203,204,205],
      [],
      [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
      16,17,18,19,20,21,22,23,24,25,26,27,28,
      29,30,31,32,33,34,35,36,37,38,39,40,41,
      42,43,44,45,46,47,48,49,50,51,52,53,54,
      55,56,57,58,59,60,61,62,63,64,65,66,67,
      68,69,70,71,72,73,74,75,76,77,78,79,80,
      81,82,83,84,85,86,87,88,89,90,91,92,93,
      94,95,96,97,98,99,100,101,102,133,134,
      135,136,137,206,207,208,209,210,211,212,
      213,214,215,216,217,218,219,220,221,222,
      223,224,225,226,227,228,229,230,231,232,
      233,234,235,236,237,238,239,240,241,242,
      243,244,245,246,247,248,249,250,251,252,
      253,254,255,256,257,258,259,260,261,262,
      263,264,265,266,267,268,269,270,271,272,
      273,274,275,276,277,278,279,280,281,282,
      283,284,285,286,287,288,289,290,291,292,
      293,294,295,296,297,298,299,300,301,302,
      303,304,305,306,307,308,309,310,311,312,
      313,314,315,316,317,318,319,320,321,322,
      323,324,325,326,327,328,329,330,331,332,
      333,334,335,336,337,338,339,340,341,342,
      343,344,345,346,347,348,349,350,351,352,
      353,354,355,356,357,358,359,360,361,362,
      363,364,365,366,367,368,369,370,371,372,
      373,374,375,376,377,378,379,380,381,382,
      383,384,385,386,387,388,389,390,391,392,
      393,394,395,396,397,398,399,400,401,402,
      403,404,405,406,407,408,409,410,411,412,
      413,414,415,416,417,418,419,420,421,422,
      423,424,425,426,427,428,429,430,431,432,
      433,434,435,436,437,438,439,440,441,442,
      443,444,445,446,447,448,449,450,451,452,
      453,454,455,456,457,458,459,460,461,462,
      463,464,465,466,467,468,469,470,471,472,
      473,474,475,476,477,478,479,480,481,482,
      483,484,485,486,487,488,489,490,491,492,
      493,494,495,496,497,498,499,500,501,502,
      503,504,505,506,507,508,509,510,511,512,
      513,514,515,516,517,518,519,520,521,522,
      523,524,525,526,527,528,529,530,531,532,
      533,534,535,536,537,538,539,540,541,542,
      543,544,545,546,547,548,549,550,551,552,
      553,554,555,556,557,558,559,560,561,562,
      563,564,565,566,567,568,569,570,571,572,
      573,574,575,576,577,578,579,580,581,582,
      583,584,585,586,587,588,589,590,591,592,
      593,594,595,596,597,598,599,600,601,602,
      603,604,605,606,607,608,609,610,611,612,
      613,614,615,616,617,618,619,620,621,622,
      623,624,625,626,627,628,629,630,631,632,
      633,634,635,636,637,638,639,640,641,642,
      643,644,645,646,647,648,649,650,651,652,
      653,654,655,656,657,658,659,660,661,662,
      663,664,665,666,667,668,669,670,671,672,
      673,674,675,676,677,678,679,680,681,682,
      683,684,685,686,687,688,689,690,691,692,
      693,694,695,696,697,698,699,700,701,702,
      703,704,705,706,707,708,709,710,711,712,
      713,714,715,716,717,718,719,720,721,722,
      723,724,725,726,727,728,729,730,731,732,
      733,734,735,736,737,738,739,740,741,742,
      743,744,745,746,747,748,749,750,751,752,
      753,754,755,756,757,758,759,760,761,762,
      763,764,765,766,767,768,769,770,771,772,
      773,774,775,776,777,778,779,780,781,782,
      783,784,785,786,787,788,789,790,791,792,
      793,794,795,796,797,798,799,800,801,802,
      803,804,805,806,807,808,809,810,811,812,
      813,814,815,816,817,818,819,820,821,822,
      823,824,825,826,827,828,829,830,831,832,
      833,834,835,836,837,838,839,840,841,842,
      843,844,845,846,847,848,849,850,851,852,
      853,854,855,856,857,858,859,860,861,862,
      863,864,865,866,867,868,869,870,871,872,
      873,874,875,876,877,878,879,880,881,882,
      883,884,885,886,887,888,889,890,891,892,
      893,894,895,896,897,898,899,900,901,902,
      903,904,905,906,907,908,909,910,911,912,
      913,914,915,916,917,918,919,920,921,922,
      923,924,925,926,927,928,929,930,931,932,
      933,934,935,936,937,938,939,940,941,942,
      943,944,945,946,947,948,949,950,951,952,
      953,954,955,956,957,958,959,960,961,962,
      963,964,965,966,967,968,969,970,971,972,
      973,974,975,976,977,978,979,980,981,982,
      983,984,985,986,987,988,989,990,991,992,
      993,994,995,996,997,998,999,1000,1001,
      1002,1003,1004,1005,1006,1007,1008,1009,
      1010,1011,1012,1013,1014,1015,1016,1017,
      1018,1019,1020,1021,1022,1023]]) returned ok in 80ms
      [couchdb:info,2012-09-08T1:18:02.271,ns_1@10.3.121.14:<0.19539.2>:couch_log:info:39]Updater reading changes from active partitions to update main set view group `_design/d1` from set `default`
      [views:info,2012-09-08T1:18:02.282,ns_1@10.3.121.14:'capi_set_view_manager-default':capi_set_view_manager:apply_index_states:464]
      Calling couch_set_view:add_replica_partitions([<<"default">>,<<"_design/d1">>,
      [0,1,2,3,4,5,6,7,8,9,10,11,24,
      25,26,27,28,29,36,37,38,39,40,
      41,42,43,44,45,46,47,48,49,50,
      51,58,59,60,61,62,63,80,81,82,
      83,84,85,92,93,94,95,96,97,98,
      99,100,101,102,206,207,208,
      209,210,211,218,219,220,221,
      222,223,224,225,226,227,228,
      229,281,282,283,284,285,286,
      321,322,323,324,325,326,327,
      328,329,330,331,332,333,334,
      335,336,337,338,339,340,341,
      342,343,344,356,357,358,359,
      360,361,374,375,376,377,378,
      395,396,397,398,399,423,424,
      425,426,427,428,429,430,431,
      432,433,514,515,516,517,518,
      519,525,526,527,528,529,530,
      531,532,533,534,535,627,628,
      629,630,631,632,633,634,635,
      636,637,729,730,731,732,733,
      734,735,736,737,738,739,832,
      833,834,835,836,837,838,839,
      840,841,842,854,855,856,857,
      858,933,934,935,936,937,938,
      939,940,941,942,943,944,1000,
      1001,1002,1003,1004,1005,1012,
      1013,1014,1015,1016,1017]])
      [couchdb:info,2012-09-08T1:18:02.285,ns_1@10.3.121.14:<0.15832.2>:couch_log:info:39]10.3.121.22 - - POST /_view_merge/?stale=ok&limit=10 200
      [couchdb:info,2012-09-08T1:18:02.296,ns_1@10.3.121.14:<0.18109.0>:couch_log:info:39]Set view `default`, main group `_design/d1`, defined new replica partitions: [0,1,2,3,4,5,6,7,8,9,10,11,24,25,26,27,28,29,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,58,59,60,61,62,63,80,81,82,83,84,85,92,93,94,95,96,97,98,99,100,101,102,206,207,208,209,210,211,218,219,220,221,222,223,224,225,226,227,228,229,281,282,283,284,285,286,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,356,357,358,359,360,361,374,375,376,377,378,395,396,397,398,399,423,424,425,426,427,428,429,430,431,432,433,514,515,516,517,518,519,525,526,527,528,529,530,531,532,533,534,535,627,628,629,630,631,632,633,634,635,636,637,729,730,731,732,733,734,735,736,737,738,739,832,833,834,835,836,837,838,839,840,841,842,854,855,856,857,858,933,934,935,936,937,938,939,940,941,942,943,944,1000,1001,1002,1003,1004,1005,1012,1013,1014,1015,1016,1017]
      New full set of replica partitions is: [0,1,2,3,4,5,6,7,8,9,10,11,24,25,26,27,28,29,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,58,59,60,61,62,63,80,81,82,83,84,85,92,93,94,95,96,97,98,99,100,101,102,206,207,208,209,210,211,218,219,220,221,222,223,224,225,226,227,228,229,281,282,283,284,285,286,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,356,357,358,359,360,361,374,375,376,377,378,395,396,397,398,399,423,424,425,426,427,428,429,430,431,432,433,514,515,516,517,518,519,525,526,527,528,529,530,531,532,533,534,535,627,628,629,630,631,632,633,634,635,636,637,729,730,731,732,733,734,735,736,737,738,739,832,833,834,835,836,837,838,839,840,841,842,854,855,856,857,858,933,934,935,936,937,938,939,940,941,942,943,944,1000,1001,1002,1003,1004,1005,1012,1013,1014,1015,1016,1017]

      [ns_server:info,2012-09-08T1:18:02.319,ns_1@10.3.121.14:<0.19520.2>:ns_replicas_builder_utils:kill_a_bunch_of_tap_names:59]Killed the following tap names on 'ns_1@10.3.121.14': [<<"replication_building_138_'ns_1@10.3.121.26'">>,
      <<"replication_building_138_'ns_1@10.3.121.16'">>,
      <<"replication_building_138_'ns_1@10.3.121.24'">>]
      [ns_server:info,2012-09-08T1:18:02.321,ns_1@10.3.121.14:'janitor_agent-default':janitor_agent:handle_info:646]Undoing temporary vbucket states caused by rebalance
      [user:info,2012-09-08T1:18:02.322,ns_1@10.3.121.14:<0.20487.1>:ns_orchestrator:handle_info:295]Rebalance exited with reason {bulk_set_vbucket_state_failed,
      [{'ns_1@10.3.121.13',
      {'EXIT',
      {{{timeout,
      {gen_server,call,
      [ns_config,
      {eval, #Fun<cluster_compat_mode.0.45438860>}]}},
      {gen_server,call,
      ['tap_replication_manager-default',
      {change_vbucket_replication,726, undefined},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-default', 'ns_1@10.3.121.13'},
      {if_rebalance,<0.16591.2>,
      {update_vbucket_state,726,replica,
      undefined,undefined}},
      infinity]}}}}]}

      [ns_server:debug,2012-09-08T1:18:02.322,ns_1@10.3.121.14:<0.16604.2>:ns_pubsub:do_subscribe_link:134]Parent process of subscription {ns_node_disco_events,<0.16591.2>} exited with reason {bulk_set_vbucket_state_failed,
      [{'ns_1@10.3.121.13',
      {'EXIT',
      {{{timeout,
      {gen_server,
      call,
      [ns_config,
      {eval, #Fun<cluster_compat_mode.0.45438860>}]}},
      {gen_server,
      call,
      ['tap_replication_manager-default',
      {change_vbucket_replication, 726, undefined},
      infinity]}},
      {gen_server,
      call,
      [{'janitor_agent-default', 'ns_1@10.3.121.13'},
      {if_rebalance,
      <0.16591.2>,
      {update_vbucket_state,
      726,
      replica,
      undefined,
      undefined}},
      infinity]}}}}]}
      [ns_server:debug,2012-09-08T1:18:02.341,ns_1@10.3.121.14:<0.16604.2>:ns_pubsub:do_subscribe_link:149]Deleting {ns_node_disco_events,<0.16591.2>} event handler: ok
      [ns_server:debug,2012-09-08T1:18:02.345,ns_1@10.3.121.14:'capi_set_view_manager-saslbucket':capi_set_view_manager:handle_info:337]doing replicate_newnodes_docs
      [ns_server:info,2012-09-08T1:18:02.354,ns_1@10.3.121.14:<0.19587.2>:diag_handler:log_all_tap_and_checkpoint_stats:126]logging tap & checkpoint stats
      [error_logger:error,2012-09-08T1:18:02.343,ns_1@10.3.121.14:error_logger:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: ns_single_vbucket_mover:mover/6
      pid: <0.19493.2>
      registered_name: []
      exception exit: {exited,
      {'EXIT',<0.16591.2>,
      {bulk_set_vbucket_state_failed,
      [{'ns_1@10.3.121.13',
      {'EXIT',
      {{{timeout,
      {gen_server,call,
      [ns_config,
      {eval,#Fun<cluster_compat_mode.0.45438860>}]}},
      {gen_server,call,
      ['tap_replication_manager-default',
      {change_vbucket_replication,726,undefined},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-default','ns_1@10.3.121.13'},
      {if_rebalance,<0.16591.2>,
      {update_vbucket_state,726,replica,undefined,
      undefined}},
      infinity]}}}}]}}}
      in function ns_single_vbucket_mover:mover_inner_old_style/6
      in call from misc:try_with_maybe_ignorant_after/2
      in call from ns_single_vbucket_mover:mover/6
      ancestors: [<0.16591.2>,<0.22331.1>]
      messages: [{'EXIT',<0.16591.2>,
      {bulk_set_vbucket_state_failed,
      [{'ns_1@10.3.121.13',
      {'EXIT',
      {{{timeout,
      {gen_server,call,
      [ns_config,
      {eval,#Fun<cluster_compat_mode.0.45438860>}]}},
      {gen_server,call,
      ['tap_replication_manager-default',
      {change_vbucket_replication,726,undefined},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-default','ns_1@10.3.121.13'},
      {if_rebalance,<0.16591.2>,
      {update_vbucket_state,726,replica,undefined,
      undefined}},
      infinity]}}}}]}}]
      links: [<0.19498.2>,<0.16591.2>]
      dictionary: [{cleanup_list,[<0.19520.2>]}]
      trap_exit: true
      status: running
      heap_size: 2584
      stack_size: 24
      reductions: 4491
      neighbours:

      Link to diags of all nodes

      https://s3.amazonaws.com/packages.couchbase/diag-logs/orange/201209/11nodes-1697-rebalance-failed-bulk_set_vbucket_state_failed-20120908.tgz
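When triaging a failure like this from the collected diags, one way to locate the start of the cascade is to grep the unpacked ns_server logs for the timed-out ns_config call. A minimal sketch, using a stand-in log line rather than the real archive contents:

```shell
# Illustrative only: write a stand-in log line, then search for the
# ns_config gen_server timeout that kicked off the rebalance failure.
cat > sample_ns_server.log <<'EOF'
Rebalance exited with reason {bulk_set_vbucket_state_failed,{timeout,{gen_server,call,[ns_config,...]}}}
EOF
# grep -n prints the matching line with its line number.
grep -n 'timeout,{gen_server,call,\[ns_config' sample_ns_server.log
```

The same pattern applied across all nodes' logs narrows down which node's ns_config stopped answering first.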

      1. erl_healthy-node24-crash.dump.gz
        3.15 MB
        Thuan Nguyen
      2. erl-over-1gb-node14_crash.dump.gz
        5.57 MB
        Thuan Nguyen
      3. ns-diag-20121031094231.txt.xz
        831 kB
        Aleksey Kondratenko
      4. report_atop_10.6.2.37_default_simple-view_test (6).pdf
        297 kB
        Ketaki Gangal

        Issue Links


          Activity

          alkondratenko Aleksey Kondratenko (Inactive) added a comment -

          Correction: this was fixed on GNU/Linux, but not on Windows.
          kzeller kzeller added a comment -

          RN: The server experienced severe timeouts during
          rebalance if views were being indexed or compacted at the same time.
          This caused the rebalance to fail. This has been fixed.

          Note from Alk: the guidance on swappiness % should not be part of the note.
          jin Jin Lim (Inactive) added a comment -

          Please close this ticket once you have collected all the information required for the release note. Thanks much!
          jin Jin Lim (Inactive) added a comment -

          Assigning this to Karen quickly so she has time to capture what needs to go into the release note. Karen, please review the above comments about swappiness. Thanks!
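For context on the swappiness discussion above: on Linux, the kernel's eagerness to swap is exposed as vm.swappiness, and heavy swapping on a loaded node can aggravate exactly this kind of ns_server timeout. A hypothetical inspection sketch; the value 10 in the comment is illustrative, not guidance from this ticket:

```shell
# Read the current vm.swappiness (Linux only). Higher values make the
# kernel swap anonymous memory more eagerly under memory pressure.
current=$(cat /proc/sys/vm/swappiness)
echo "vm.swappiness=${current}"
# A temporary change would look like this (root required):
#   sysctl -w vm.swappiness=10
```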

            People

            • Assignee:
              kzeller kzeller
              Reporter:
              thuan Thuan Nguyen
            • Votes:
              0
              Watchers:
              8


                Gerrit Reviews

                There are no open Gerrit changes