Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-7169

missed items after rebalance out 2 failover nodes with replica=2 in 5 nodes cluster

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • 2.0
    • 2.0
    • couchbase-bucket, ns_server
    • Security Level: Public
    • None

    Description

      version=2.0.0-1949-rel
      http://qa.hq.northscale.net/job/centos-64-2.0-failover-tests/448/consoleFull
      testrunner -i resources/jenkins/centos-64-5node-failover.ini get-logs=True,GROUP=BAT -t failovertests.FailoverTests.test_failover_firewall,replica=2,keys_count=20000,GROUP=BAT

      steps:
      1. 5 nodes in cluster with default bucket replica=2,keys_count=20000
      3. failover 2 nodes

      [2012-11-13 02:31:42,760] - [rest_client:849] INFO - fail_over successful
      [2012-11-13 02:31:42,761] - [failovertests:278] INFO - failed over node : ns_1@10.1.3.117
      [2012-11-13 02:31:43,328] - [rest_client:849] INFO - fail_over successful
      [2012-11-13 02:31:43,328] - [failovertests:278] INFO - failed over node : ns_1@10.1.3.115

      2012-11-12 11:14:51,928 - root - INFO - node 10.1.3.117:8091 is 'unhealthy' as expected
      2012-11-12 11:15:22,229 - root - INFO - fail_over successful
      2012-11-12 11:15:22,229 - root - INFO - failed over node : ns_1@10.1.3.117

      2012-11-12 11:15:26,041 - root - INFO - node 10.1.3.115:8091 is 'unhealthy' as expected
      2012-11-12 11:15:31,055 - root - INFO - fail_over successful
      2012-11-12 11:15:31,055 - root - INFO - failed over node : ns_1@10.1.3.115

      3. rebalance out failover nodes

      [2012-11-13 02:31:43,328] - [failovertests:292] INFO - 10 seconds sleep after failover before invoking rebalance...
      [2012-11-13 02:31:53,328] - [rest_client:883] INFO - rebalance params : password=password&ejectedNodes=ns_1%4010.1.3.117%2Cns_1%4010.1.3.115&user=Administrator&knownNodes=ns_1%4010.1.3.114%2Cns_1%4010.1.3.117%2Cns_1%4010.1.3.115%2Cns_1%4010.1.3.118%2Cns_1%4010.1.3.116
      result:
      rebalance passed successful but after progress 59 % it completed instantly

      [2012-11-13 02:31:53,337] - [rest_client:890] INFO - rebalance operation started
      [2012-11-13 02:31:53,347] - [rest_client:986] INFO - rebalance percentage : 0 %
      [2012-11-13 02:31:55,376] - [rest_client:986] INFO - rebalance percentage : 0.415885290979 %
      [2012-11-13 02:31:57,385] - [rest_client:986] INFO - rebalance percentage : 0.966587084853 %
      [2012-11-13 02:31:59,394] - [rest_client:986] INFO - rebalance percentage : 1.57911268089 %
      [2012-11-13 02:32:01,401] - [rest_client:986] INFO - rebalance percentage : 2.05668833217 %
      [2012-11-13 02:32:03,417] - [rest_client:986] INFO - rebalance percentage : 2.6692139282 %
      [2012-11-13 02:32:05,421] - [rest_client:986] INFO - rebalance percentage : 3.08496577732 %
      [2012-11-13 02:32:07,437] - [rest_client:986] INFO - rebalance percentage : 3.41655602713 %.......
      ............
      [2012-11-13 02:39:22,147] - [rest_client:986] INFO - rebalance percentage : 58.7635239567 %
      [2012-11-13 02:39:24,159] - [rest_client:986] INFO - rebalance percentage : 59.0726429675 %
      [2012-11-13 02:39:26,205] - [rest_client:986] INFO - rebalance percentage : 59.3199381762 %
      [2012-11-13 02:39:28,212] - [rest_client:986] INFO - rebalance percentage : 59.5672333849 %
      [2012-11-13 02:39:30,226] - [rest_client:986] INFO - rebalance percentage : 59.8145285935 %
      [2012-11-13 02:39:34,237] - [rest_client:943] INFO - rebalance progress took 460.899084091 seconds
      [2012-11-13 02:39:34,237] - [rest_client:944] INFO - sleep for 10 seconds after rebalance...

      then verification part:

      [2012-11-13 02:39:44,236] - [failovertests:304] INFO - Begin VERIFICATION ...
      [2012-11-13 02:39:45,146] - [data_helper:289] INFO - creating direct client 10.1.3.114:11210 default
      [2012-11-13 02:39:45,591] - [data_helper:289] INFO - creating direct client 10.1.3.116:11210 default
      [2012-11-13 02:39:46,002] - [data_helper:289] INFO - creating direct client 10.1.3.118:11210 default
      [2012-11-13 02:39:46,420] - [task:329] WARNING - Not Ready: curr_items 18017 == 20000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:39:46,446] - [data_helper:289] INFO - creating direct client 10.1.3.114:11210 default
      [2012-11-13 02:39:46,969] - [data_helper:289] INFO - creating direct client 10.1.3.116:11210 default
      [2012-11-13 02:39:47,409] - [data_helper:289] INFO - creating direct client 10.1.3.118:11210 default
      [2012-11-13 02:39:47,793] - [task:329] WARNING - Not Ready: vb_active_curr_items 18017 == 20000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:39:47,820] - [data_helper:289] INFO - creating direct client 10.1.3.114:11210 default
      [2012-11-13 02:39:48,255] - [data_helper:289] INFO - creating direct client 10.1.3.116:11210 default
      [2012-11-13 02:39:48,676] - [data_helper:289] INFO - creating direct client 10.1.3.118:11210 default
      [2012-11-13 02:39:49,084] - [task:329] WARNING - Not Ready: vb_replica_curr_items 36034 == 40000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:39:49,108] - [data_helper:289] INFO - creating direct client 10.1.3.114:11210 default
      [2012-11-13 02:39:49,520] - [data_helper:289] INFO - creating direct client 10.1.3.116:11210 default
      [2012-11-13 02:39:49,939] - [data_helper:289] INFO - creating direct client 10.1.3.118:11210 default
      [2012-11-13 02:39:50,364] - [task:329] WARNING - Not Ready: curr_items_tot 54051 == 60000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:39:52,404] - [task:329] WARNING - Not Ready: curr_items 18017 == 20000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:39:53,433] - [task:329] WARNING - Not Ready: vb_active_curr_items 18017 == 20000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:39:54,461] - [task:329] WARNING - Not Ready: vb_replica_curr_items 36034 == 40000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:39:55,489] - [task:329] WARNING - Not Ready: curr_items_tot 54051 == 60000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:39:57,518] - [task:329] WARNING - Not Ready: curr_items 18017 == 20000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:39:58,549] - [task:329] WARNING - Not Ready: vb_active_curr_items 18017 == 20000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:39:59,581] - [task:329] WARNING - Not Ready: vb_replica_curr_items 36034 == 40000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:40:00,610] - [task:329] WARNING - Not Ready: curr_items_tot 54051 == 60000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:40:02,638] - [task:329] WARNING - Not Ready: curr_items 18017 == 20000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:40:03,670] - [task:329] WARNING - Not Ready: vb_active_curr_items 18017 == 20000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:40:04,702] - [task:329] WARNING - Not Ready: vb_replica_curr_items 36034 == 40000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:40:05,732] - [task:329] WARNING - Not Ready: curr_items_tot 54051 == 60000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:40:07,760] - [task:329] WARNING - Not Ready: curr_items 18017 == 20000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:40:08,789] - [task:329] WARNING - Not Ready: vb_active_curr_items 18017 == 20000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:40:09,818] - [task:329] WARNING - Not Ready: vb_replica_curr_items 36034 == 40000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:40:10,851] - [task:329] WARNING - Not Ready: curr_items_tot 54051 == 60000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:40:12,880] - [task:329] WARNING - Not Ready: curr_items 18017 == 20000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:40:13,907] - [task:329] WARNING - Not Ready: vb_active_curr_items 18017 == 20000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:40:14,937] - [task:329] WARNING - Not Ready: vb_replica_curr_items 36034 == 40000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:40:15,968] - [task:329] WARNING - Not Ready: curr_items_tot 54051 == 60000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:40:17,998] - [task:329] WARNING - Not Ready: curr_items 18017 == 20000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:40:19,032] - [task:329] WARNING - Not Ready: vb_active_curr_items 18017 == 20000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:40:20,062] - [task:329] WARNING - Not Ready: vb_replica_curr_items 36034 == 40000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:40:21,096] - [task:329] WARNING - Not Ready: curr_items_tot 54051 == 60000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:40:23,124] - [task:329] WARNING - Not Ready: curr_items 18017 == 20000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:40:24,154] - [task:329] WARNING - Not Ready: vb_active_curr_items 18017 == 20000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:40:25,186] - [task:329] WARNING - Not Ready: vb_replica_curr_items 36034 == 40000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:40:26,215] - [task:329] WARNING - Not Ready: curr_items_tot 54051 == 60000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:40:28,243] - [task:329] WARNING - Not Ready: curr_items 18017 == 20000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:40:29,272] - [task:329] WARNING - Not Ready: vb_active_curr_items 18017 == 20000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:40:30,301] - [task:329] WARNING - Not Ready: vb_replica_curr_items 36034 == 40000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:40:31,329] - [task:329] WARNING - Not Ready: curr_items_tot 54051 == 60000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:40:33,359] - [task:329] WARNING - Not Ready: curr_items 18017 == 20000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:40:34,389] - [task:329] WARNING - Not Ready: vb_active_curr_items 18017 == 20000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:40:35,419] - [task:329] WARNING - Not Ready: vb_replica_curr_items 36034 == 40000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:40:36,446] - [task:329] WARNING - Not Ready: curr_items_tot 54051 == 60000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:40:38,474] - [task:329] WARNING - Not Ready: curr_items 18017 == 20000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:40:39,502] - [task:329] WARNING - Not Ready: vb_active_curr_items 18017 == 20000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:40:40,531] - [task:329] WARNING - Not Ready: vb_replica_curr_items 36034 == 40000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:40:41,564] - [task:329] WARNING - Not Ready: curr_items_tot 54051 == 60000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      [2012-11-13 02:40:43,594] - [task:329] WARNING - Not Ready: curr_items 18017 == 20000 expected on '10.1.3.114:8091''10.1.3.116:8091''10.1.3.118:8091'
      this job didn't get cbcollect_info, will add it

      Alk, my question is:
      is it possible that the rebalance is over at the end so fast?

      If you do not see any problem here, assign ticket back to me or Abhinav that we could provide cbcollect_info next time for bucket-engine team

      Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          People

            andreibaranouski Andrei Baranouski
            andreibaranouski Andrei Baranouski
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty