  1. Couchbase Server
  2. MB-59964

Rebalance stuck when shard based rebalance flag is toggled

Details

    • Untriaged
    • 0
    • Unknown

    Description

      The test does the following:

      Create a 6-node cluster (1 KV + 5 GSI/Query).
      Disable shard-based rebalance.
      Create buckets, scopes, collections, and indexes.
      Rebalance out two nodes, one after another (2 rebalances), leaving a 4-node cluster (1 KV + 3 GSI/Query).
      Enable shard-based rebalance.
      Rebalance one index node after another.
      The first rebalance succeeds. The second appears stuck according to the test log: progress stalls at 28% from 10:27:58 until the test marks it failed at 10:32:03, about 5 minutes after the rebalance started.

      [2023-12-04 10:27:11,127] - [on_prem_rest_client:1931] INFO - rebalance operation started
      [2023-12-04 10:27:12,072] - [on_prem_rest_client:2095] INFO - rebalance percentage : 16.00 %
      [2023-12-04 10:27:12,072] - [gsi_file_based_rebalance:1188] INFO - Rebalance has started running.
      [2023-12-04 10:27:21,163] - [on_prem_rest_client:2095] INFO - rebalance percentage : 25.00 %
      [2023-12-04 10:27:21,163] - [task:899] INFO - Rebalance - status: running, progress: 25.00%
      [2023-12-04 10:27:22,110] - [on_prem_rest_client:2095] INFO - rebalance percentage : 25.00 %
      [2023-12-04 10:27:25,148] - [on_prem_rest_client:2095] INFO - rebalance percentage : 25.00 %
      [2023-12-04 10:27:28,176] - [on_prem_rest_client:2095] INFO - rebalance percentage : 25.00 %
      [2023-12-04 10:27:31,204] - [on_prem_rest_client:2095] INFO - rebalance percentage : 25.00 %
      [2023-12-04 10:27:34,232] - [on_prem_rest_client:2095] INFO - rebalance percentage : 25.00 %
      [2023-12-04 10:27:37,265] - [on_prem_rest_client:2095] INFO - rebalance percentage : 25.00 %
      [2023-12-04 10:27:40,319] - [on_prem_rest_client:2095] INFO - rebalance percentage : 25.00 %
      [2023-12-04 10:27:41,239] - [on_prem_rest_client:2095] INFO - rebalance percentage : 25.00 %
      [2023-12-04 10:27:41,239] - [task:899] INFO - Rebalance - status: running, progress: 25.00%
      [2023-12-04 10:27:43,349] - [on_prem_rest_client:2095] INFO - rebalance percentage : 25.00 %
      [2023-12-04 10:27:46,404] - [on_prem_rest_client:2095] INFO - rebalance percentage : 25.00 %
      [2023-12-04 10:27:49,432] - [on_prem_rest_client:2095] INFO - rebalance percentage : 25.00 %
      [2023-12-04 10:27:52,459] - [on_prem_rest_client:2095] INFO - rebalance percentage : 75.00 %
      [2023-12-04 10:27:55,486] - [on_prem_rest_client:2095] INFO - rebalance percentage : 25.00 %
      [2023-12-04 10:27:58,513] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:28:01,315] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:28:01,315] - [task:899] INFO - Rebalance - status: running, progress: 28.00%
      [2023-12-04 10:28:01,541] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:28:04,567] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:28:07,595] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:28:10,622] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:28:13,652] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:28:16,679] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:28:19,706] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:28:21,392] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:28:21,392] - [task:899] INFO - Rebalance - status: running, progress: 28.00%
      [2023-12-04 10:28:22,733] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:28:25,761] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:28:28,788] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:28:31,815] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:28:34,842] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:28:37,870] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:28:40,901] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:28:41,467] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:28:41,467] - [task:899] INFO - Rebalance - status: running, progress: 28.00%
      [2023-12-04 10:28:43,929] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:28:46,956] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:28:49,982] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:28:53,009] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:28:56,035] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:28:59,063] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:29:01,541] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:29:01,541] - [task:899] INFO - Rebalance - status: running, progress: 28.00%
      [2023-12-04 10:29:02,088] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:29:05,115] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:29:08,142] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:29:11,171] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:29:14,198] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:29:17,225] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:29:20,248] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:29:21,615] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:29:21,615] - [task:899] INFO - Rebalance - status: running, progress: 28.00%
      [2023-12-04 10:29:23,278] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:29:26,307] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:29:29,336] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:29:32,364] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:29:35,392] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:29:38,420] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:29:41,448] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:29:41,689] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:29:41,690] - [task:899] INFO - Rebalance - status: running, progress: 28.00%
      [2023-12-04 10:29:44,480] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:29:47,504] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:29:50,532] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:29:53,561] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:29:56,589] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:29:59,624] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:30:01,760] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:30:01,760] - [task:899] INFO - Rebalance - status: running, progress: 28.00%
      [2023-12-04 10:30:02,651] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:30:05,684] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:30:08,715] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:30:11,747] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:30:14,774] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:30:17,801] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:30:20,829] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:30:21,839] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:30:21,839] - [task:899] INFO - Rebalance - status: running, progress: 28.00%
      [2023-12-04 10:30:23,860] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:30:26,890] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:30:29,918] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:30:32,948] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:30:35,978] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:30:39,006] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:30:41,913] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:30:41,914] - [task:899] INFO - Rebalance - status: running, progress: 28.00%
      [2023-12-04 10:30:42,034] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:30:45,070] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:30:48,098] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:30:51,125] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:30:54,151] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:30:57,180] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:31:00,206] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:31:01,987] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:31:01,988] - [task:899] INFO - Rebalance - status: running, progress: 28.00%
      [2023-12-04 10:31:03,232] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:31:06,262] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:31:09,294] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:31:12,333] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:31:15,360] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:31:18,388] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:31:21,417] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:31:22,062] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:31:22,062] - [task:899] INFO - Rebalance - status: running, progress: 28.00%
      [2023-12-04 10:31:24,445] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:31:27,472] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:31:30,500] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:31:33,528] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:31:36,553] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:31:39,580] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:31:42,132] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:31:42,133] - [task:899] INFO - Rebalance - status: running, progress: 28.00%
      [2023-12-04 10:31:42,611] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:31:45,638] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:31:48,669] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:31:51,697] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:31:54,726] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:31:57,751] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:32:00,778] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:32:02,203] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %
      [2023-12-04 10:32:02,203] - [task:899] INFO - Rebalance - status: running, progress: 28.00%
      [2023-12-04 10:32:03,781] - [on_prem_rest_client:120] ERROR - rebalance stuck on 28.0%
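
The stall window in a log like the one above can be measured rather than read off by eye. The following is a minimal sketch, not part of the test framework: the function name `longest_stall` and the regular expression are assumptions made for illustration, and the sample lines are copied from the log above.

```python
import re
from datetime import datetime

# Matches the "rebalance percentage" lines in the format shown in the log above.
# The pattern is an assumption based on those lines, not taken from the harness.
LINE_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\].*"
    r"rebalance percentage : (?P<pct>\d+(?:\.\d+)?) %"
)

def longest_stall(lines):
    """Return (percentage, seconds) for the longest run of unchanged progress."""
    best = (None, 0.0)
    run_pct, run_start = None, None
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # skip task/status lines and anything else
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f")
        pct = float(m["pct"])
        if pct != run_pct:
            # Progress changed, so a new run starts at this timestamp.
            run_pct, run_start = pct, ts
        duration = (ts - run_start).total_seconds()
        if duration > best[1]:
            best = (run_pct, duration)
    return best

# A few lines copied from the log above.
sample = [
    "[2023-12-04 10:27:55,486] - [on_prem_rest_client:2095] INFO - rebalance percentage : 25.00 %",
    "[2023-12-04 10:27:58,513] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %",
    "[2023-12-04 10:28:01,315] - [task:899] INFO - Rebalance - status: running, progress: 28.00%",
    "[2023-12-04 10:32:02,203] - [on_prem_rest_client:2095] INFO - rebalance percentage : 28.00 %",
]
pct, seconds = longest_stall(sample)
print(f"longest stall: {pct}% for {seconds:.1f}s")
```

Run against the full test log, the same function would report how long progress stayed at each value, which helps separate a genuinely stuck rebalance from one that is only slow between polls.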
      

      Rebalance report in question:

      {
        "stageInfo": {
          "index": {
            "totalProgress": 20.625,
            "perNodeProgress": {
              "ns_1@172.23.106.11": 0.20625,
              "ns_1@172.23.216.162": 0.20625,
              "ns_1@172.23.106.15": 0.20625,
              "ns_1@172.23.216.86": 0.20625,
              "ns_1@172.23.106.106": 0.20625
            },
            "startTime": "2023-12-04T18:27:11.820Z",
            "completedTime": false,
            "timeTaken": 347477
          },
          "data": {
            "totalProgress": 100,
            "perNodeProgress": {
              "ns_1@172.23.106.18": 1
            },
            "startTime": "2023-12-04T18:27:11.109Z",
            "completedTime": "2023-12-04T18:27:11.805Z",
            "timeTaken": 696
          },
          "query": {
            "totalProgress": 100,
            "perNodeProgress": {
              "ns_1@172.23.216.162": 1,
              "ns_1@172.23.106.15": 1,
              "ns_1@172.23.216.86": 1
            },
            "startTime": "2023-12-04T18:27:11.805Z",
            "completedTime": "2023-12-04T18:27:11.820Z",
            "timeTaken": 14
          }
        },
        "rebalanceId": "1f32259dbb2f16db66365d7d2f73d2c1",
        "nodesInfo": {
          "active_nodes": [
            "ns_1@172.23.106.106",
            "ns_1@172.23.106.11",
            "ns_1@172.23.106.15",
            "ns_1@172.23.106.18",
            "ns_1@172.23.216.162",
            "ns_1@172.23.216.86"
          ],
          "keep_nodes": [
            "ns_1@172.23.106.106",
            "ns_1@172.23.106.11",
            "ns_1@172.23.106.15",
            "ns_1@172.23.106.18",
            "ns_1@172.23.216.162",
            "ns_1@172.23.216.86"
          ],
          "eject_nodes": [],
          "delta_nodes": [],
          "failed_nodes": []
        },
        "masterNode": "ns_1@172.23.106.106",
        "startTime": "2023-12-04T18:27:11.075Z",
        "completedTime": "2023-12-04T18:32:59.297Z",
        "timeTaken": 348221,
        "completionMessage": "Rebalance stopped by user."
      }
      

      The report's completionMessage is "Rebalance stopped by user." because the test waits 5 minutes and then exits its polling loop. The cbcollect logs are attached.
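
To see at a glance which service stage did not finish, the report can be summarized programmatically. This is a minimal sketch under stated assumptions: the helper names `unfinished_services` and `elapsed_seconds` are hypothetical, and the JSON below is an excerpt of the report above, trimmed to the fields the sketch reads.

```python
import json
from datetime import datetime

# Excerpt of the rebalance report above, trimmed to the fields used here.
REPORT_JSON = """
{"stageInfo": {
   "index": {"totalProgress": 20.625, "completedTime": false},
   "data":  {"totalProgress": 100, "completedTime": "2023-12-04T18:27:11.805Z"},
   "query": {"totalProgress": 100, "completedTime": "2023-12-04T18:27:11.820Z"}},
 "startTime": "2023-12-04T18:27:11.075Z",
 "completedTime": "2023-12-04T18:32:59.297Z",
 "completionMessage": "Rebalance stopped by user."}
"""

# Timestamp layout as it appears in the report (UTC, millisecond precision).
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"

def unfinished_services(report):
    """Return the services whose stage has completedTime set to false."""
    return sorted(
        name
        for name, stage in report["stageInfo"].items()
        if stage["completedTime"] is False
    )

def elapsed_seconds(report):
    """Wall-clock duration between the report's start and completion times."""
    start = datetime.strptime(report["startTime"], TS_FORMAT)
    end = datetime.strptime(report["completedTime"], TS_FORMAT)
    return (end - start).total_seconds()

report = json.loads(REPORT_JSON)
stalled = unfinished_services(report)
elapsed = elapsed_seconds(report)
print("unfinished stages:", stalled)
print("elapsed seconds:", elapsed)
print("completion message:", report["completionMessage"])
```

For this report the only unfinished stage is the index service, which is consistent with the data and query stages completing within the first second while the index stage never records a completion time.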

            People

              pavan.pb Pavan PB