  Couchbase Server / MB-10243

Rebalance stuck at progress 0 after cbrecovery is stopped (rarely occurs)


Details

    • Bug
    • Resolution: Cannot Reproduce
    • Major
    • 3.0
    • 2.5.0
    • ns_server
    • Security Level: Public
    • None
    • Untriaged
    • Ubuntu 64-bit

    Description

      http://qa.hq.northscale.net/job/ubuntu_x64--38_01--cbrecovery-P1/5/console
      ./testrunner -i /tmp/ubuntu-64-2.0-cbrecovery-P1.ini get-logs=True -t cbRecoverytests.cbrecovery.restart_cbrecover_multiple_failover_swapout_reb_routine,items=50000,rdirection=unidirection,ctopology=chain,failover=source,fail_count=2,add_count=2,max_verify=10000,when_step=recovery_when_rebalance
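      The -t value packs the test path and its parameters into one comma-separated string. A minimal sketch of splitting it, assuming the key=value convention visible in the command above; parse_test_spec is a hypothetical helper, not testrunner's own parser:

```python
def parse_test_spec(spec: str) -> tuple[str, dict[str, str]]:
    """Split a testrunner -t value into the test path and its key=value params."""
    test_path, *pairs = spec.split(",")
    params = {}
    for pair in pairs:
        # Values stay strings; the test itself is assumed to convert them.
        key, _, value = pair.partition("=")
        params[key] = value
    return test_path, params


spec = ("cbRecoverytests.cbrecovery.restart_cbrecover_multiple_failover_swapout_reb_routine,"
        "items=50000,rdirection=unidirection,ctopology=chain,failover=source,"
        "fail_count=2,add_count=2,max_verify=10000,when_step=recovery_when_rebalance")
test_path, params = parse_test_spec(spec)
print(test_path)
print(params["when_step"])  # recovery_when_rebalance
```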

      Steps:
      1) Two clusters, three nodes each:
      rebalance params : password=password&ejectedNodes=&user=Administrator&knownNodes=ns_1%4010.1.4.31%2Cns_1%4010.3.3.141%2Cns_1%4010.1.3.71

      rebalance params : password=password&ejectedNodes=&user=Administrator&knownNodes=ns_1%4010.3.3.233%2Cns_1%4010.1.3.72%2Cns_1%4010.3.4.25
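      Both rebalance bodies are URL-encoded (%40 is '@', %2C is ','). A sketch that decodes the node lists with the standard library; known_nodes is a hypothetical helper name, and "source" refers to the first cluster, which contains the nodes failed over in step 2:

```python
from urllib.parse import parse_qs


def known_nodes(params):
    """Return the node names in the knownNodes field of a rebalance body."""
    # parse_qs percent-decodes the value; keep_blank_values retains 'ejectedNodes='.
    fields = parse_qs(params, keep_blank_values=True)
    return fields["knownNodes"][0].split(",")


source = ("password=password&ejectedNodes=&user=Administrator"
          "&knownNodes=ns_1%4010.1.4.31%2Cns_1%4010.3.3.141%2Cns_1%4010.1.3.71")
other = ("password=password&ejectedNodes=&user=Administrator"
         "&knownNodes=ns_1%4010.3.3.233%2Cns_1%4010.1.3.72%2Cns_1%4010.3.4.25")
print(known_nodes(source))  # ['ns_1@10.1.4.31', 'ns_1@10.3.3.141', 'ns_1@10.1.3.71']
print(known_nodes(other))   # ['ns_1@10.3.3.233', 'ns_1@10.1.3.72', 'ns_1@10.3.4.25']
```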

      2) Fail over 2 nodes on the source cluster:
      [2014-02-18 00:16:23,913] - [cbRecoverytests:267] INFO - Failing over 2 nodes on source ..
      [2014-02-18 00:16:24,943] - [task:2210] INFO - Failing over 10.1.4.31:8091
      [2014-02-18 00:16:25,855] - [rest_client:941] INFO - fail_over successful
      [2014-02-18 00:16:25,892] - [task:2210] INFO - Failing over 10.1.3.71:8091
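      A sketch of an equivalent failover call, assuming Couchbase's POST /controller/failOver endpoint with an otpNode form field, sent to a node that stays in the source cluster (10.3.3.141 is an assumption; the log does not say which node received the request). failover_request is hypothetical, and the request is only built, never sent:

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def failover_request(host, otp_node, user, password):
    """Build, but do not send, a failover request for a single node."""
    body = urlencode({"otpNode": otp_node}).encode()  # '@' is encoded as %40
    req = Request(f"http://{host}:8091/controller/failOver", data=body, method="POST")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req


# One request per failed-over node, addressed to a surviving source node.
for node in ("ns_1@10.1.4.31", "ns_1@10.1.3.71"):
    req = failover_request("10.3.3.141", node, "Administrator", "password")
    print(req.get_method(), req.get_full_url(), req.data)
```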

      3) Start cbrecovery, then stop it:

      [2014-02-18 00:17:04,991] - [task:2843] INFO - command was executed: '/opt/couchbase/bin/cbrecovery http://10.3.3.233:8091 http://10.3.3.141:8091 -b default -B default -u Administrator -p password -U Administrator -P password '
      [2014-02-18 00:17:25,120] - [task:2857] INFO - cbrecovery strarted with progress: {u'code': u'ok', u'uuid': u'30fd210510b2bc244fd856fc9ffa7df1', u'recoveryMap': [

      {u'node': u'ns_1@172.23.106.105', u'vbuckets': [39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512]}

      ]}
      [2014-02-18 00:17:25,121] - [task:2858] INFO - will not wait for the end of the cbrecovery
      [2014-02-18 00:17:25,127] - [task:2874] INFO - cbrecovery progress: {u'code': u'ok', u'uuid': u'30fd210510b2bc244fd856fc9ffa7df1', u'recoveryMap': [

      {u'node': u'ns_1@172.23.106.105', u'vbuckets': [39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512]}

      ]}
      [2014-02-18 00:17:25,138] - [rest_client:860] INFO - recovery stopped by http://10.3.3.141:8091//pools/default/buckets/default/controller/stopRecovery?recovery_uuid=30fd210510b2bc244fd856fc9ffa7df1
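      The recoveryMap above is easier to read as ranges. A sketch that collapses the vbucket list for ns_1@172.23.106.105 (rebuilt here with range() for brevity) and rebuilds the stopRecovery URL from the recovery uuid. vbucket_ranges and stop_recovery_url are hypothetical helpers; the URL path is copied from the log, without the doubled slash after :8091:

```python
from itertools import groupby


def vbucket_ranges(vbuckets):
    """Collapse a sorted list of vbucket ids into inclusive (first, last) runs."""
    runs = []
    # Within a run of consecutive ids, (vbucket - position) stays constant.
    for _, group in groupby(enumerate(vbuckets), key=lambda p: p[1] - p[0]):
        ids = [vb for _, vb in group]
        runs.append((ids[0], ids[-1]))
    return runs


def stop_recovery_url(host, bucket, recovery_uuid):
    """Rebuild the stopRecovery URL recorded in the log for a given recovery uuid."""
    return (f"http://{host}:8091/pools/default/buckets/{bucket}"
            f"/controller/stopRecovery?recovery_uuid={recovery_uuid}")


vbuckets = list(range(39, 171)) + list(range(342, 513))
print(vbucket_ranges(vbuckets), len(vbuckets))
# [(39, 170), (342, 512)] 303
print(stop_recovery_url("10.3.3.141", "default", "30fd210510b2bc244fd856fc9ffa7df1"))
```

      For this run the map covers 303 vbuckets in two runs, 39-170 and 342-512.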

      4) Start a rebalance, then stop it:

      [2014-02-18 00:17:25,239] - [rest_client:969] INFO - rebalance params : password=password&ejectedNodes=&user=Administrator&knownNodes=ns_1%40172.23.106.106%2Cns_1%40172.23.106.105%2Cns_1%4010.1.4.31%2Cns_1%4010.3.3.141%2Cns_1%4010.1.3.71
      [2014-02-18 00:17:25,258] - [rest_client:973] INFO - rebalance operation started
      [2014-02-18 00:17:25,266] - [rest_client:1075] INFO - rebalance percentage : 0 %
      [2014-02-18 00:17:27,291] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:17:29,297] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:17:31,314] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:17:33,321] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:17:35,329] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:17:37,339] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:17:39,355] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:17:41,360] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:17:43,374] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:17:45,382] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:17:47,390] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:17:49,399] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:17:51,411] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:17:53,416] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:17:55,427] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:17:57,432] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:17:59,455] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:18:01,469] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:18:03,497] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:18:05,512] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:18:07,518] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:18:09,523] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:18:11,537] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:18:13,547] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:18:15,553] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:18:17,558] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:18:19,569] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:18:21,575] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:18:23,588] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:18:25,600] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:18:27,608] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:18:29,618] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:18:31,639] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:18:33,647] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:18:35,653] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:18:37,660] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:18:39,670] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:18:41,677] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:18:43,682] - [rest_client:1075] INFO - rebalance percentage : 0.0 %
      [2014-02-18 00:18:45,685] - [rest_client:72] ERROR - rebalance progress code : 0.0
      ERROR
      FAIL: restart_cbrecover_multiple_failover_swapout_reb_routine (cbRecoverytests.cbrecovery)
      ----------------------------------------------------------------------
      Traceback (most recent call last):
      File "pytests/cbRecoverytests.py", line 290, in restart_cbrecover_multiple_failover_swapout_reb_routine
      self.trigger_rebalance(rest, 15)
      File "pytests/cbRecoverytests.py", line 97, in trigger_rebalance
      self.assertTrue(reached, "rebalance failed or did not completed")
      AssertionError: rebalance failed or did not completed
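      Re-encoding the step 4 node list reproduces the logged knownNodes value, which makes it easy to confirm that all five nodes, including the two failed-over ones (10.1.4.31 and 10.1.3.71), were passed as known and none as ejected. A sketch assuming the ejectedNodes/knownNodes form fields of Couchbase's POST /controller/rebalance; rebalance_body is hypothetical and omits the user/password fields that the test client also sends:

```python
from urllib.parse import parse_qs, urlencode


def rebalance_body(known, ejected):
    """Form-encode the node lists for a rebalance request (credentials omitted)."""
    return urlencode({"ejectedNodes": ",".join(ejected), "knownNodes": ",".join(known)})


known = ["ns_1@172.23.106.106", "ns_1@172.23.106.105", "ns_1@10.1.4.31",
         "ns_1@10.3.3.141", "ns_1@10.1.3.71"]
body = rebalance_body(known, ejected=[])
print(body)

# Round trip: decoding returns the same five nodes and an empty ejected list.
fields = parse_qs(body, keep_blank_values=True)
print(fields["ejectedNodes"], len(fields["knownNodes"][0].split(",")))
```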

      Rebalance progress stayed at 0 for more than a minute, even though the bucket is almost empty.
      Alk, do you think we should increase the timeout for progress changes here?
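      One way to frame the timeout question is to measure how long the reported percentage stayed flat. A sketch that parses rest_client progress lines like the ones above; the regex and the helpers progress_samples, flat_seconds and is_stuck are hypothetical, not testrunner code:

```python
import re
from datetime import datetime

# Matches e.g. "[2014-02-18 00:17:25,266] - [rest_client:1075] INFO - rebalance percentage : 0 %"
PROGRESS = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\].*rebalance percentage : ([\d.]+) %"
)


def progress_samples(lines):
    """Return (timestamp, percent) pairs for every progress line."""
    points = []
    for line in lines:
        m = PROGRESS.search(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
            points.append((ts, float(m.group(2))))
    return points


def flat_seconds(points):
    """Longest span, in seconds, during which the percentage did not change."""
    if not points:
        return 0.0
    longest = 0.0
    run_start, run_pct = points[0]
    for ts, pct in points[1:]:
        if pct != run_pct:  # progress moved: start a new flat run here
            run_start, run_pct = ts, pct
        longest = max(longest, (ts - run_start).total_seconds())
    return longest


def is_stuck(points, timeout_s):
    """True if progress stayed flat for longer than timeout_s seconds."""
    return flat_seconds(points) > timeout_s


excerpt = [
    "[2014-02-18 00:17:25,266] - [rest_client:1075] INFO - rebalance percentage : 0 %",
    "[2014-02-18 00:17:27,291] - [rest_client:1075] INFO - rebalance percentage : 0.0 %",
    "[2014-02-18 00:18:43,682] - [rest_client:1075] INFO - rebalance percentage : 0.0 %",
]
points = progress_samples(excerpt)
print(flat_seconds(points))  # 78.416
print(is_stuck(points, 60))  # True
```

      On this run, progress sat at 0 from 00:17:25,266 to 00:18:43,682, about 78.4 s, so any stall threshold below that fails here; the log does not show whether the rebalance would have moved given more time.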

          People

            andreibaranouski Andrei Baranouski