Description
Cluster information:
- 11 CentOS 6.2 64-bit servers, each with a 4-core CPU
- Each server has 10 GB RAM and a 150 GB disk.
- 8 GB RAM is allocated to Couchbase Server on each node.
- Each server has its own drive; no disk is shared with other servers.
- The cluster has 2 buckets, default (3 GB) and saslbucket (3 GB).
- Each bucket has one doc and 2 views for each doc.
- Loader 10.3.2.4
- Build 2.0.0-1645 on 11 nodes
- Created a cluster with the following 10 nodes:
10.3.121.13
10.3.121.14
10.3.121.15
10.3.121.16
10.3.121.17
10.3.121.20
10.3.121.22
10.3.121.24
10.3.121.25
10.3.121.23
- Data path /data
- View path /data
- Test activities:
- Ran failover, rebalance, and swap rebalance (in and out).
- The last sequence of steps, which hit this bug:
- Failed over node 14.
- Started a rebalance, then stopped it. Node 14 was ejected from the cluster.
- Added node 14 back to the cluster and failed over node 13.
- Rebalanced. The rebalance failed, and during the rebalance the following error was reported, pointing to node 16:
"Got error while trying to send close confirmation:
{error,enotconn} 13:56:20 - Thu Aug 30, 2012" on node 16.
Analysis of the diags from node 16 shows this error at 2012-08-30T13:56:20:
[ns_server:debug,2012-08-30T13:56:20.614,ns_1@10.3.121.16:<0.13236.35>:ebucketmigrator_srv:kill_tapname:932]killing tap named: replication_building_30_'ns_1@10.3.121.16'
[rebalance:info,2012-08-30T13:56:20.620,ns_1@10.3.121.16:<0.13236.35>:ebucketmigrator_srv:init:522]Starting tap stream:
[{vbuckets,[]},
{checkpoints,[]},
{name,<<"replication_building_30_'ns_1@10.3.121.16'">>},
{takeover,false}]
{{"10.3.121.25",11209},
{"10.3.121.16",11209},
[{vbuckets,[30]},
{takeover,false},
{suffix,"building_30_'ns_1@10.3.121.16'"},
{username,"saslbucket"},
{password,"password"}]}
[rebalance:debug,2012-08-30T13:56:20.623,ns_1@10.3.121.16:<0.13236.35>:ebucketmigrator_srv:init:555]upstream_sender pid: <0.13244.35>
[rebalance:info,2012-08-30T13:56:20.665,ns_1@10.3.121.16:<0.13236.35>:ebucketmigrator_srv:process_upstream:880]Initial stream for vbucket 18
[rebalance:info,2012-08-30T13:56:20.666,ns_1@10.3.121.16:<0.13236.35>:ebucketmigrator_srv:process_upstream:880]Initial stream for vbucket 24
[rebalance:info,2012-08-30T13:56:20.666,ns_1@10.3.121.16:<0.13236.35>:ebucketmigrator_srv:process_upstream:880]Initial stream for vbucket 25
[rebalance:error,2012-08-30T13:56:20.666,ns_1@10.3.121.16:<0.13236.35>:ebucketmigrator_srv:confirm_sent_messages:674]Got error while trying to send close confirmation: {error,enotconn}
[views:info,2012-08-30T13:56:20.852,ns_1@10.3.121.16:'capi_ddoc_replication_srv-saslbucket':capi_set_view_manager:apply_index_states:351]
Calling couch_set_view:set_partition_states([<<"saslbucket">>,
<<"_design/d11">>,
[12,13,14,15,16,17,19,20,21,22,
23,24,25,26,36,37,38,39,40,41,
42,69,70,71,72,73,84,85,86,87,
92,93,94,103,104,105,106,107,
108,109,110,111,112,113,114,142,
143,150,151,152,153,154,155,156,
178,179,180,181,182,196,197,218,
219,220,221,222,223,224,225,226,
227,228,229,230,231,232,233,234,
235,236,237,242,243,244,245,246,
247,248,249,250,251,252,253,265,
266,267,268,288,289,290,291,309,
310,311,312,313,314,315,321,322,
323,333,334,335,336,337,338,339,
340,341,342,343,344,347,348,349,
350,351,352,353,354,355,362,363,
364,365,366,367,379,380,381,382,
383,384,385,395,396,406,407,408,
409,410,411,446,447,448,449,450,
451,452,453,503,504,505,506,507,
508,509,510,511,517,518,519,520,
521,522,523,524,603,604,613,614,
615,632,633,634,635,636,637,695,
696,748,749,750,811,812,813,843,
844,845,846,847,848,854,876,916,
917,918,919,920,927,928,929,930,
931,932,939,940,945,946,947,948,
949,950,951,965,966,978,979,
1004,1005,1012,1013,1014,1015,
1016,1017,1018,1019,1020,1021,
1022,1023],
[],
[0,1,2,3,4,5,6,7,8,9,10,11,18,27,
28,29,30,31,32,33,34,35,43,44,
45,46,47,48,49,50,51,52,53,54,
55,56,57,58,59,60,61,62,63,64,
65,66,67,68,74,75,76,77,78,79,
80,81,82,83,88,89,90,91,95,96,
97,98,99,100,101,102,115,116,
117,118,119,120,121,122,123,124,
125,126,127,128,129,130,131,132,
133,134,135,136,137,138,139,140,
141,144,145,146,147,148,149,157,
158,159,160,161,162,163,164,165,
166,167,168,169,170,171,172,173,
174,175,176,177,183,184,185,186,
187,188,189,190,191,192,193,194,
195,198,199,200,201,202,203,204,
205,206,207,208,209,210,211,212,
213,214,215,216,217,238,239,240,
241,254,255,256,257,258,259,260,
261,262,263,264,269,270,271,272,
273,274,275,276,277,278,279,280,
281,282,283,284,285,286,287,292,
293,294,295,296,297,298,299,300,
301,302,303,304,305,306,307,308,
316,317,318,319,320,324,325,326,
327,328,329,330,331,332,345,346,
356,357,358,359,360,361,368,369,
370,371,372,373,374,375,376,377,
378,386,387,388,389,390,391,392,
393,394,397,398,399,400,401,402,
403,404,405,412,413,414,415,416,
417,418,419,420,421,422,423,424,
425,426,427,428,429,430,431,432,
433,434,435,436,437,438,439,440,
441,442,443,444,445,454,455,456,
457,458,459,460,461,462,463,464,
465,466,467,468,469,470,471,472,
473,474,475,476,477,478,479,480,
481,482,483,484,485,486,487,488,
489,490,491,492,493,494,495,496,
497,498,499,500,501,502,512,513,
514,515,516,525,526,527,528,529,
530,531,532,533,534,535,536,537,
538,539,540,541,542,543,544,545,
546,547,548,549,550,551,552,553,
554,555,556,557,558,559,560,561,
562,563,564,565,566,567,568,569,
570,571,572,573,574,575,576,577,
578,579,580,581,582,583,584,585,
586,587,588,589,590,591,592,593,
594,595,596,597,598,599,600,601,
602,605,606,607,608,609,610,611,
612,616,617,618,619,620,621,622,
623,624,625,626,627,628,629,630,
631,638,639,640,641,642,643,644,
645,646,647,648,649,650,651,652,
653,654,655,656,657,658,659,660,
661,662,663,664,665,666,667,668,
669,670,671,672,673,674,675,676,
677,678,679,680,681,682,683,684,
685,686,687,688,689,690,691,692,
693,694,697,698,699,700,701,702,
703,704,705,706,707,708,709,710,
711,712,713,714,715,716,717,718,
719,720,721,722,723,724,725,726,
727,728,729,730,731,732,733,734,
735,736,737,738,739,740,741,742,
743,744,745,746,747,751,752,753,
754,755,756,757,758,759,760,761,
762,763,764,765,766,767,768,769,
770,771,772,773,774,775,776,777,
778,779,780,781,782,783,784,785,
786,787,788,789,790,791,792,793,
794,795,796,797,798,799,800,801,
802,803,804,805,806,807,808,809,
810,814,815,816,817,818,819,820,
821,822,823,824,825,826,827,828,
829,830,831,832,833,834,835,836,
837,838,839,840,841,842,849,850,
851,852,853,855,856,857,858,859,
860,861,862,863,864,865,866,867,
868,869,870,871,872,873,874,875,
877,878,879,880,881,882,883,884,
885,886,887,888,889,890,891,892,
893,894,895,896,897,898,899,900,
901,902,903,904,905,906,907,908,
909,910,911,912,913,914,915,921,
922,923,924,925,926,933,934,935,
936,937,938,941,942,943,944,952,
953,954,955,956,957,958,959,960,
961,962,963,964,967,968,969,970,
971,972,973,974,975,976,977,980,
981,982,983,984,985,986,987,988,
989,990,991,992,993,994,995,996,
997,998,999,1000,1001,1002,1003,
1006,1007,1008,1009,1010,1011]])
[ns_server:info,2012-08-30T13:56:20.867,ns_1@10.3.121.16:ns_port_memcached:ns_port_server:log:169]memcached<0.674.0>: Thu Aug 30 20:56:20.666298 3: TAP (Consumer) eq_tapq:anon_56 - Failed to reset a vbucket 18. Force disconnect
memcached<0.674.0>: Thu Aug 30 20:56:20.666374 3: TAP (Consumer) eq_tapq:anon_56 - disconnected
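To locate entries like the `enotconn` error in a full diag, the log format seen in the excerpt above can be parsed mechanically. The following is a minimal sketch, not part of the original report: it assumes each entry starts with a header of the form `[category:level,timestamp,node:process:module:function:line]` (inferred from the lines shown here) and treats every other line as a continuation of the previous entry. The names `parse_entries` and `errors_at` are hypothetical helpers.

```python
import re

# Entry header as seen in the excerpt (format inferred, not from official docs):
# [category:level,timestamp,node:process:module:function:line]message
ENTRY_RE = re.compile(
    r"^\[(?P<category>\w+):(?P<level>\w+),"
    r"(?P<timestamp>[^,]+),"
    r"(?P<origin>[^\]]*)\](?P<message>.*)$"
)

# Abridged lines copied from the node 16 diag quoted in this report.
SAMPLE = """\
[rebalance:info,2012-08-30T13:56:20.620,ns_1@10.3.121.16:<0.13236.35>:ebucketmigrator_srv:init:522]Starting tap stream:
[{vbuckets,[]},
{checkpoints,[]},
[rebalance:info,2012-08-30T13:56:20.666,ns_1@10.3.121.16:<0.13236.35>:ebucketmigrator_srv:process_upstream:880]Initial stream for vbucket 25
[rebalance:error,2012-08-30T13:56:20.666,ns_1@10.3.121.16:<0.13236.35>:ebucketmigrator_srv:confirm_sent_messages:674]Got error while trying to send close confirmation: {error,enotconn}
"""


def parse_entries(text):
    """Split raw log text into entries; non-header lines extend the previous entry."""
    entries = []
    for line in text.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            entry = match.groupdict()
            # The last three colon-separated fields are module, function, line.
            parts = entry["origin"].rsplit(":", 3)
            if len(parts) == 4:
                (entry["process"], entry["module"],
                 entry["function"], entry["line"]) = parts
            entries.append(entry)
        elif entries:
            entries[-1]["message"] += "\n" + line
    return entries


def errors_at(entries, timestamp_prefix):
    """Return error-level entries whose timestamp starts with the given prefix."""
    return [
        e for e in entries
        if e["level"] == "error" and e["timestamp"].startswith(timestamp_prefix)
    ]


if __name__ == "__main__":
    for err in errors_at(parse_entries(SAMPLE), "2012-08-30T13:56:20"):
        print(err["timestamp"], err["module"], err["message"])
```

Filtering by a timestamp prefix rather than an exact value keeps all entries from the same second together, which is how the error was matched against the UI message in this report.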
Link to diags of all nodes: https://s3.amazonaws.com/packages.couchbase/diag-logs/large_cluster_2_0/11ndoes-1645-reb-failed-close-confirmation-error-enotconn-20120830.tgz
Link to atop files of all nodes: https://s3.amazonaws.com/packages.couchbase/atop-files/2.0.0/atop-11ndoes-1645-reb-failed-close-confirmation-error-enotconn-20120830.tgz
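For anyone trying to reproduce this, the failover and rebalance sequence described in this report could be scripted against the cluster's REST interface. The sketch below only builds the list of requests and sends nothing. It assumes the standard ns_server endpoints (`/controller/failOver`, `/controller/rebalance`, `/controller/stopRebalance`, `/controller/addNode`) and the `ns_1@<ip>` node naming seen in the logs. The `knownNodes`/`ejectedNodes` values and the credentials are assumptions, since the report does not record what the test tooling actually sent.

```python
# Cluster members as listed in this report.
NODES = [
    "10.3.121.13", "10.3.121.14", "10.3.121.15", "10.3.121.16",
    "10.3.121.17", "10.3.121.20", "10.3.121.22", "10.3.121.24",
    "10.3.121.25", "10.3.121.23",
]


def otp(ip):
    """Erlang node name for an IP, matching the ns_1@<ip> form seen in the logs."""
    return "ns_1@" + ip


def failover(ip):
    return ("POST", "/controller/failOver", {"otpNode": otp(ip)})


def add_node(ip, user, password):
    return ("POST", "/controller/addNode",
            {"hostname": ip, "user": user, "password": password})


def rebalance(known_ips, ejected_ips):
    # Both fields are comma-separated lists of otp node names.
    return ("POST", "/controller/rebalance", {
        "knownNodes": ",".join(otp(ip) for ip in known_ips),
        "ejectedNodes": ",".join(otp(ip) for ip in ejected_ips),
    })


def stop_rebalance():
    return ("POST", "/controller/stopRebalance", {})


def repro_plan(user="Administrator", password="<placeholder>"):
    """Requests for the last sequence of steps; credentials are placeholders."""
    node13, node14 = "10.3.121.13", "10.3.121.14"
    return [
        failover(node14),
        # Assumption: the failed-over node is passed as ejected.
        rebalance(NODES, [node14]),
        stop_rebalance(),
        add_node(node14, user, password),
        failover(node13),
        rebalance(NODES, [node13]),  # the rebalance that failed in this report
    ]


if __name__ == "__main__":
    for method, path, fields in repro_plan():
        print(method, path, fields)
```

Each tuple would be sent as a form-encoded request to the admin port of any node in the cluster. The pause between starting and stopping the first rebalance is not recorded in the report, so it is left out here.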