Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-6494

[longevity] rebalance fails with error "Got error while trying to send close confirmation: {error,enotconn} during rebalance" when adding a failed over node back to the cluster and failing over another node

    XMLWordPrintable

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Major
    • 2.0.1
    • 2.0-beta
    • ns_server
    • Security Level: Public
    • None
    • centos 6.2 64 bit

    Description

      Cluster information:

      • 11 centos 6.2 64bit server with 4 cores CPU
      • Each server has 10 GB RAM and 150 GB disk.
      • 8 GB RAM for couchbase server at each node
      • Each server has its own drive, no disk sharing with other server.
      • Cluster has 2 buckets, default (3GB) and saslbucket (3GB)
      • Each bucket has one doc and 2 views for each doc.
      • Loader 10.3.2.4
      • Build 2.0.0-1645 on 11 nodes
      • Create cluster with 10 nodes
        10.3.121.13
        10.3.121.14
        10.3.121.15
        10.3.121.16
        10.3.121.17
        10.3.121.20
        10.3.121.22
        10.3.121.24
        10.3.121.25
        10.3.121.23
      • Data path /data
      • View path /data
      • Activities testing:
      • Do failover, rebalance, swap rebalance in and out.
      • Latest step failed for this bug.
      • Failover node 14
      • Rebalance and stop rebalance. So node 14 was kicked out.
      • Add node 14 back to cluster and failover node 13.
      • Rebalance. Rebalance failed and during rebalance, got error point to node 16

      "Got error while trying to send close confirmation:

      {error,enotconn} 13:56:20 - Thu Aug 30, 2012 " on node 16.

      Analyze diags of node 16, see this error at time 2012-08-30T13:56:20

      [ns_server:debug,2012-08-30T13:56:20.614,ns_1@10.3.121.16:<0.13236.35>:ebucketmigrator_srv:kill_tapname:932]killing tap named: replication_building_30_'ns_1@10.3.121.16'
      [rebalance:info,2012-08-30T13:56:20.620,ns_1@10.3.121.16:<0.13236.35>:ebucketmigrator_srv:init:522]Starting tap stream:
      [{vbuckets,[]},
      {checkpoints,[]},
      {name,<<"replication_building_30_'ns_1@10.3.121.16'">>},
      {takeover,false}]
      {{"10.3.121.25",11209},
      {"10.3.121.16",11209},
      [{vbuckets,[30]},
      {takeover,false},
      {suffix,"building_30_'ns_1@10.3.121.16'"},
      {username,"saslbucket"},
      {password,"password"}]}
      [rebalance:debug,2012-08-30T13:56:20.623,ns_1@10.3.121.16:<0.13236.35>:ebucketmigrator_srv:init:555]upstream_sender pid: <0.13244.35>
      [rebalance:info,2012-08-30T13:56:20.665,ns_1@10.3.121.16:<0.13236.35>:ebucketmigrator_srv:process_upstream:880]Initial stream for vbucket 18
      [rebalance:info,2012-08-30T13:56:20.666,ns_1@10.3.121.16:<0.13236.35>:ebucketmigrator_srv:process_upstream:880]Initial stream for vbucket 24
      [rebalance:info,2012-08-30T13:56:20.666,ns_1@10.3.121.16:<0.13236.35>:ebucketmigrator_srv:process_upstream:880]Initial stream for vbucket 25
      [rebalance:error,2012-08-30T13:56:20.666,ns_1@10.3.121.16:<0.13236.35>:ebucketmigrator_srv:confirm_sent_messages:674]Got error while trying to send close confirmation: {error,enotconn}

      [views:info,2012-08-30T13:56:20.852,ns_1@10.3.121.16:'capi_ddoc_replication_srv-saslbucket':capi_set_view_manager:apply_index_states:351]
      Calling couch_set_view:set_partition_states([<<"saslbucket">>,
      <<"_design/d11">>,
      [12,13,14,15,16,17,19,20,21,22,
      23,24,25,26,36,37,38,39,40,41,
      42,69,70,71,72,73,84,85,86,87,
      92,93,94,103,104,105,106,107,
      108,109,110,111,112,113,114,142,
      143,150,151,152,153,154,155,156,
      178,179,180,181,182,196,197,218,
      219,220,221,222,223,224,225,226,
      227,228,229,230,231,232,233,234,
      235,236,237,242,243,244,245,246,
      247,248,249,250,251,252,253,265,
      266,267,268,288,289,290,291,309,
      310,311,312,313,314,315,321,322,
      323,333,334,335,336,337,338,339,
      340,341,342,343,344,347,348,349,
      350,351,352,353,354,355,362,363,
      364,365,366,367,379,380,381,382,
      383,384,385,395,396,406,407,408,
      409,410,411,446,447,448,449,450,
      451,452,453,503,504,505,506,507,
      508,509,510,511,517,518,519,520,
      521,522,523,524,603,604,613,614,
      615,632,633,634,635,636,637,695,
      696,748,749,750,811,812,813,843,
      844,845,846,847,848,854,876,916,
      917,918,919,920,927,928,929,930,
      931,932,939,940,945,946,947,948,
      949,950,951,965,966,978,979,
      1004,1005,1012,1013,1014,1015,
      1016,1017,1018,1019,1020,1021,
      1022,1023],
      [],
      [0,1,2,3,4,5,6,7,8,9,10,11,18,27,
      28,29,30,31,32,33,34,35,43,44,
      45,46,47,48,49,50,51,52,53,54,
      55,56,57,58,59,60,61,62,63,64,
      65,66,67,68,74,75,76,77,78,79,
      80,81,82,83,88,89,90,91,95,96,
      97,98,99,100,101,102,115,116,
      117,118,119,120,121,122,123,124,
      125,126,127,128,129,130,131,132,
      133,134,135,136,137,138,139,140,
      141,144,145,146,147,148,149,157,
      158,159,160,161,162,163,164,165,
      166,167,168,169,170,171,172,173,
      174,175,176,177,183,184,185,186,
      187,188,189,190,191,192,193,194,
      195,198,199,200,201,202,203,204,
      205,206,207,208,209,210,211,212,
      213,214,215,216,217,238,239,240,
      241,254,255,256,257,258,259,260,
      261,262,263,264,269,270,271,272,
      273,274,275,276,277,278,279,280,
      281,282,283,284,285,286,287,292,
      293,294,295,296,297,298,299,300,
      301,302,303,304,305,306,307,308,
      316,317,318,319,320,324,325,326,
      327,328,329,330,331,332,345,346,
      356,357,358,359,360,361,368,369,
      370,371,372,373,374,375,376,377,
      378,386,387,388,389,390,391,392,
      393,394,397,398,399,400,401,402,
      403,404,405,412,413,414,415,416,
      417,418,419,420,421,422,423,424,
      425,426,427,428,429,430,431,432,
      433,434,435,436,437,438,439,440,
      441,442,443,444,445,454,455,456,
      457,458,459,460,461,462,463,464,
      465,466,467,468,469,470,471,472,
      473,474,475,476,477,478,479,480,
      481,482,483,484,485,486,487,488,
      489,490,491,492,493,494,495,496,
      497,498,499,500,501,502,512,513,
      514,515,516,525,526,527,528,529,
      530,531,532,533,534,535,536,537,
      538,539,540,541,542,543,544,545,
      546,547,548,549,550,551,552,553,
      554,555,556,557,558,559,560,561,
      562,563,564,565,566,567,568,569,
      570,571,572,573,574,575,576,577,
      578,579,580,581,582,583,584,585,
      586,587,588,589,590,591,592,593,
      594,595,596,597,598,599,600,601,
      602,605,606,607,608,609,610,611,
      612,616,617,618,619,620,621,622,
      623,624,625,626,627,628,629,630,
      631,638,639,640,641,642,643,644,
      645,646,647,648,649,650,651,652,
      653,654,655,656,657,658,659,660,
      661,662,663,664,665,666,667,668,
      669,670,671,672,673,674,675,676,
      677,678,679,680,681,682,683,684,
      685,686,687,688,689,690,691,692,
      693,694,697,698,699,700,701,702,
      703,704,705,706,707,708,709,710,
      711,712,713,714,715,716,717,718,
      719,720,721,722,723,724,725,726,
      727,728,729,730,731,732,733,734,
      735,736,737,738,739,740,741,742,
      743,744,745,746,747,751,752,753,
      754,755,756,757,758,759,760,761,
      762,763,764,765,766,767,768,769,
      770,771,772,773,774,775,776,777,
      778,779,780,781,782,783,784,785,
      786,787,788,789,790,791,792,793,
      794,795,796,797,798,799,800,801,
      802,803,804,805,806,807,808,809,
      810,814,815,816,817,818,819,820,
      821,822,823,824,825,826,827,828,
      829,830,831,832,833,834,835,836,
      837,838,839,840,841,842,849,850,
      851,852,853,855,856,857,858,859,
      860,861,862,863,864,865,866,867,
      868,869,870,871,872,873,874,875,
      877,878,879,880,881,882,883,884,
      885,886,887,888,889,890,891,892,
      893,894,895,896,897,898,899,900,
      901,902,903,904,905,906,907,908,
      909,910,911,912,913,914,915,921,
      922,923,924,925,926,933,934,935,
      936,937,938,941,942,943,944,952,
      953,954,955,956,957,958,959,960,
      961,962,963,964,967,968,969,970,
      971,972,973,974,975,976,977,980,
      981,982,983,984,985,986,987,988,
      989,990,991,992,993,994,995,996,
      997,998,999,1000,1001,1002,1003,
      1006,1007,1008,1009,1010,1011]])
      [ns_server:info,2012-08-30T13:56:20.867,ns_1@10.3.121.16:ns_port_memcached:ns_port_server:log:169]memcached<0.674.0>: Thu Aug 30 20:56:20.666298 3: TAP (Consumer) eq_tapq:anon_56 - Failed to reset a vbucket 18. Force disconnect
      memcached<0.674.0>: Thu Aug 30 20:56:20.666374 3: TAP (Consumer) eq_tapq:anon_56 - disconnected

      Link to diags of all nodes https://s3.amazonaws.com/packages.couchbase/diag-logs/large_cluster_2_0/11ndoes-1645-reb-failed-close-confirmation-error-enotconn-20120830.tgz

      Link to atop file of all nodes https://s3.amazonaws.com/packages.couchbase/atop-files/2.0.0/atop-11ndoes-1645-reb-failed-close-confirmation-error-enotconn-20120830.tgz

      Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          People

            chiyoung Chiyoung Seo (Inactive)
            thuan Thuan Nguyen
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty