Couchbase Server / MB-60372

Intermittent rebalance failures due to mover_crashed, timeout


Details

    • Bug
    • Resolution: Not a Bug
    • Major
    • 7.6.0
    • 7.6.0
    • ns_server
    •  7.6.0-2005-enterprise

    Description

      Steps:
      1. Create a 3-node cluster:

      +---------------+---------+----------+
      | Nodes         | Zone    | Services |
      +---------------+---------+----------+
      | 172.23.216.77 | Group 1 | kv       |
      | 172.23.216.74 | None    | kv       |  
      | 172.23.216.79 | None    | kv       |
      +---------------+---------+----------+

      2. Create 3 buckets with some data loaded:

      +---------+-------------------+----------+
      | Bucket  | Type / Storage    | Replicas | 
      +---------+-------------------+----------+
      | bucket1 | couchbase / magma | 3        |
      | bucket2 | ephemeral / -     | 3        | 
      | default | couchbase / magma | 3        | 
      +---------+-------------------+----------+

      3. Rebalance in 2 nodes:
      172.23.122.239, 172.23.109.36
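
      Step 3 could be driven through the cluster REST API roughly as sketched below. This is an illustration only, not the script used for this report: it assumes the standard `/controller/addNode` and `/controller/rebalance` endpoints, assumes node names follow the `ns_1@<ip>` pattern visible in the logs, and uses placeholder credentials. The function only builds the requests; it does not send anything.

```python
from urllib.parse import urlencode

# Node addresses taken from the steps in this report.
EXISTING = ["172.23.216.77", "172.23.216.74", "172.23.216.79"]
INCOMING = ["172.23.122.239", "172.23.109.36"]


def otp_node(ip):
    """Assumed otpNode naming, matching the 'ns_1@<ip>' names in the logs."""
    return f"ns_1@{ip}"


def rebalance_in_requests(existing, incoming, user, password):
    """Return (method, path, form_body) tuples for a rebalance-in.

    Nothing is sent; a caller would POST each body to the cluster's
    admin port with its own authentication.
    """
    requests = []
    # One addNode call per incoming node, data (kv) service only.
    for ip in incoming:
        body = urlencode(
            {"hostname": ip, "user": user, "password": password, "services": "kv"}
        )
        requests.append(("POST", "/controller/addNode", body))
    # A single rebalance call listing every node that should remain in the cluster.
    known = ",".join(otp_node(ip) for ip in existing + incoming)
    body = urlencode({"knownNodes": known, "ejectedNodes": ""})
    requests.append(("POST", "/controller/rebalance", body))
    return requests


if __name__ == "__main__":
    # "Administrator" / "password" are placeholders, not values from this report.
    for method, path, body in rebalance_in_requests(
        EXISTING, INCOMING, "Administrator", "password"
    ):
        print(method, path, body)
```

      Keeping request construction separate from sending makes it easy to review exactly what would be posted before pointing it at a real cluster.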

      Observation:
      The rebalance failed.

      Rebalance Operation Id = 0f28c7b06d15f209dc8a25392b9d8106
      [ns_server:debug,2024-01-13T09:30:18.195+13:00,ns_1@172.23.216.77:<0.18334.0>:auto_rebalance:retry_rebalance:58]Retry rebalance is not enabled. Failed Rebalance with Id 0f28c7b06d15f209dc8a25392b9d8106 will not be retried.
      [ns_server:debug,2024-01-13T09:30:18.195+13:00,ns_1@172.23.216.77:janitor_agent-bucket1<0.21325.0>:dcp_sup:nuke:110]Nuking DCP replicators for bucket "bucket1":
      [{'ns_1@172.23.109.36',<0.7932.2>},
       {'ns_1@172.23.122.239',<0.7822.2>},
       {'ns_1@172.23.216.79',<0.21744.0>},
       {'ns_1@172.23.216.74',<0.21740.0>}]
      [ns_server:debug,2024-01-13T09:30:18.196+13:00,ns_1@172.23.216.77:<0.7933.2>:dcp_consumer_conn:handle_call:222]Shutting the connection. Partitions to close:
      [0,26,65,82,125,141,327,334,528,554,583,612,627,651,655,663,711,739,752,793,795,804,826,989,1000]
      [ns_server:debug,2024-01-13T09:30:18.196+13:00,ns_1@172.23.216.77:<0.21741.0>:dcp_consumer_conn:handle_call:222]Shutting the connection. Partitions to close:
      [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,27,28,29,30,31,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,142,143,144,145,146,147,148,149,150,151,152,153,154,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,252,253,254,255,256,257,258,259,260,261,262,263,264,265,266,267,268,269,271,273,274,275,276,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,293,294,295,296,298,299,300,301,302,303,304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,324,325,326,328,329,330,331,332,333,335,336,337,338,339,340,341]
      [ns_server:debug,2024-01-13T09:30:18.196+13:00,ns_1@172.23.216.77:<0.7933.2>:dcp_commands:close_stream:99]Close stream for partition 0, opaque = 0x0
      [ns_server:debug,2024-01-13T09:30:18.196+13:00,ns_1@172.23.216.77:<0.7933.2>:dcp_commands:close_stream:99]Close stream for partition 26, opaque = 0x1A
      [ns_server:debug,2024-01-13T09:30:18.196+13:00,ns_1@172.23.216.77:<0.7823.2>:dcp_consumer_conn:handle_call:222]Shutting the connection. Partitions to close:
      [32,103,155,213,251,270,272,277,297,323,360,449,488,508,560,607,641,688,719,728,735,746,763,787,856,877,909,910,987,1017]
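
      When triaging excerpts like the one above, it can help to split each log header into fields. The sketch below is a hypothetical helper, not part of ns_server: the regular expression is inferred only from the lines quoted in this report, and the field names (`logger`, `level`, `process`, and so on) are labels chosen here for illustration. It also extracts the partition and opaque value from `close_stream` messages; in the two such lines quoted above, the opaque is the partition number in hexadecimal.

```python
import re

# Header layout inferred from the quoted lines:
# [logger:level,timestamp,node:process:module:function:line]message
LOG_HEADER = re.compile(
    r"^\[(?P<logger>\w+):(?P<level>\w+),"
    r"(?P<timestamp>[^,]+),"
    r"(?P<node>[^:]+):"
    r"(?P<process>[^:]*<[\d.]+>):"
    r"(?P<module>\w+):(?P<function>\w+):(?P<line>\d+)\]"
    r"(?P<message>.*)$"
)

CLOSE_STREAM = re.compile(
    r"Close stream for partition (\d+), opaque = (0x[0-9A-Fa-f]+)"
)


def parse_header(line):
    """Return a dict of header fields, or None for continuation lines."""
    match = LOG_HEADER.match(line.strip())
    return match.groupdict() if match else None


def close_stream_info(message):
    """Return (partition, opaque) as integers, or None if not a close_stream message."""
    match = CLOSE_STREAM.search(message)
    if not match:
        return None
    return int(match.group(1)), int(match.group(2), 16)


if __name__ == "__main__":
    sample = (
        "[ns_server:debug,2024-01-13T09:30:18.196+13:00,"
        "ns_1@172.23.216.77:<0.7933.2>:dcp_commands:close_stream:99]"
        "Close stream for partition 26, opaque = 0x1A"
    )
    fields = parse_header(sample)
    print(fields["module"], fields["function"], fields["line"])
    print(close_stream_info(fields["message"]))
```

      Lines that do not start with a bracketed header, such as the partition lists, return `None` from `parse_header` and can be attached to the preceding entry.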
      

      Later, I also saw an internal server error (HTTP 500) while deleting a collection:

      72.23.123.73 - - [13/Jan/2024:09:32:09 +1300] "DELETE /pools/default/buckets/bucket2/scopes/_default/collections/_default HTTP/1.1" 500 44 - "python-requests/2.24.0" 15019
      ::1 - Administrator [13/Jan/2024:09:32:43 +1300] "GET /pools HTTP/1.1" 200 971 - "couchbase-cli  7.6.0-2005" 16
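
      To pull server errors such as the 500 above out of a longer access log, a small filter along these lines may help. It is a sketch under assumptions: the pattern is derived only from the two lines quoted here, and the field names (`size`, `referer`, `trailing`) are guesses at what each position means, not a documented format.

```python
import re

# Layout inferred from the two quoted access-log lines.
ACCESS_LINE = re.compile(
    r'^(?P<client>\S+) - (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+) (?P<referer>\S+) '
    r'"(?P<agent>[^"]*)" (?P<trailing>\d+)$'
)


def server_errors(lines):
    """Return parsed entries whose HTTP status is 500 or higher."""
    errors = []
    for line in lines:
        match = ACCESS_LINE.match(line.strip())
        if match and int(match.group("status")) >= 500:
            errors.append(match.groupdict())
    return errors


if __name__ == "__main__":
    sample = [
        '72.23.123.73 - - [13/Jan/2024:09:32:09 +1300] "DELETE /pools/default/buckets/bucket2/scopes/_default/collections/_default HTTP/1.1" 500 44 - "python-requests/2.24.0" 15019',
        '::1 - Administrator [13/Jan/2024:09:32:43 +1300] "GET /pools HTTP/1.1" 200 971 - "couchbase-cli  7.6.0-2005" 16',
    ]
    for entry in server_errors(sample):
        print(entry["time"], entry["method"], entry["path"], entry["status"])
```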
      

       

       

          People

            pulkit.matta Pulkit Matta