Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-6987

curr_items_tot does not correspond to sum of curr_items from nodes after swap rebalance of 2 nodes with data load

    XMLWordPrintable

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Major
    • 2.0.1
    • None
    • ns_server, storage-engine
    • Security Level: Public
    • None

    Description

      version=2.0.0-1884-rel
      http://qa.hq.northscale.net/job/centos-64-2.0-rebalance-regressions/90/consoleFull
      ./testrunner -i /tmp/rebalance_regression.ini get-logs=True -t swaprebalance.SwapRebalanceFailedTests.test_failed_swap_rebalance,replica=2,num-buckets=1,num-swap=2,swap-orchestrator=False

      2012-10-23 01:44:52,802 - root - INFO - rebalance params : password=password&ejectedNodes=ns_1%4010.3.121.94%2Cns_1%4010.3.121.93&user=Administrator&knownNodes=ns_1%4010.3.121.93%2Cns_1%4010.3.121.94%2Cns_1%4010.3.121.92%2Cns_1%4010.3.121.95%2Cns_1%4010.3.121.96
      ...

      2012-10-23 01:57:41,129 - root - INFO - curr_items from 10.3.121.95:8091 : 32540
      2012-10-23 01:57:41,130 - root - INFO - vb_pending_num from 10.3.121.95:8091 : 0
      2012-10-23 01:57:41,131 - root - INFO - vb_active_num from 10.3.121.95:8091 : 341
      2012-10-23 01:57:41,131 - root - INFO - vb_replica_num from 10.3.121.95:8091 : 683
      2012-10-23 01:57:41,131 - root - INFO - curr_items from 10.3.121.92:8091 : 32490
      2012-10-23 01:57:41,132 - root - INFO - vb_pending_num from 10.3.121.92:8091 : 0
      2012-10-23 01:57:41,132 - root - INFO - vb_active_num from 10.3.121.92:8091 : 342
      2012-10-23 01:57:41,133 - root - INFO - vb_replica_num from 10.3.121.92:8091 : 682
      2012-10-23 01:57:41,134 - root - INFO - curr_items from 10.3.121.96:8091 : 32704
      2012-10-23 01:57:41,134 - root - INFO - vb_pending_num from 10.3.121.96:8091 : 0
      2012-10-23 01:57:41,134 - root - INFO - vb_active_num from 10.3.121.96:8091 : 341
      2012-10-23 01:57:41,135 - root - INFO - vb_replica_num from 10.3.121.96:8091 : 683
      2012-10-23 01:57:41,135 - root - INFO - summation of vb_active_num : 1024 vb_pending_num : 0 vb_replica_num : 2048
      2012-10-23 01:57:41,136 - root - INFO - sum : 97734 and sum * (replica_factor + 1) (3) : 293202
      2012-10-23 01:57:41,483 - root - INFO - master_stats : 293215
      2012-10-23 01:57:41,484 - root - INFO - delta : 13 missing_percentage : 4.43380331648e-05 replica_factor : 2
      2012-10-23 01:57:43,570 - root - INFO - creating direct client 10.3.121.95:11210 default
      2012-10-23 01:57:43,975 - root - INFO - getting tap stats.. for 10.3.121.95
      2012-10-23 01:57:43,983 - root - INFO - TAP ns_1@10.3.121.95 :eq_tapq:replication_ns_1@10.3.121.92:ack_window_full false
      2012-10-23 01:57:43,984 - root - INFO - TAP ns_1@10.3.121.95 :eq_tapq:replication_ns_1@10.3.121.92:has_queued_item false
      2012-10-23 01:57:43,984 - root - INFO - TAP ns_1@10.3.121.95 :eq_tapq:replication_ns_1@10.3.121.92:ack_log_size 0
      2012-10-23 01:57:43,984 - root - INFO - TAP ns_1@10.3.121.95 :eq_tapq:replication_ns_1@10.3.121.96:pending_backfill false
      2012-10-23 01:57:43,984 - root - INFO - TAP ns_1@10.3.121.95 :eq_tapq:replication_ns_1@10.3.121.96:ack_seqno 52343
      2012-10-23 01:57:43,985 - root - INFO - TAP ns_1@10.3.121.95 :eq_tapq:replication_ns_1@10.3.121.92:paused 1
      2012-10-23 01:57:43,985 - root - INFO - TAP ns_1@10.3.121.95 :eq_tapq:replication_ns_1@10.3.121.92:idle true
      2012-10-23 01:57:43,986 - root - INFO - TAP ns_1@10.3.121.95 :eq_tapq:replication_ns_1@10.3.121.92:ack_seqno 41264
      2012-10-23 01:57:43,986 - root - INFO - TAP ns_1@10.3.121.95 :eq_tapq:replication_ns_1@10.3.121.96:ack_log_size 0
      2012-10-23 01:57:43,986 - root - INFO - TAP ns_1@10.3.121.95 :eq_tapq:replication_ns_1@10.3.121.96:has_queued_item false
      2012-10-23 01:57:43,987 - root - INFO - TAP ns_1@10.3.121.95 :eq_tapq:replication_ns_1@10.3.121.96:idle true
      2012-10-23 01:57:43,987 - root - INFO - TAP ns_1@10.3.121.95 :eq_tapq:replication_ns_1@10.3.121.96:pending_disk_backfill false
      2012-10-23 01:57:43,988 - root - INFO - TAP ns_1@10.3.121.95 :eq_tapq:replication_ns_1@10.3.121.92:pending_disk_backfill false
      2012-10-23 01:57:43,988 - root - INFO - TAP ns_1@10.3.121.95 :eq_tapq:replication_ns_1@10.3.121.96:ack_window_full false
      2012-10-23 01:57:43,988 - root - INFO - TAP ns_1@10.3.121.95 :eq_tapq:replication_ns_1@10.3.121.96:recv_ack_seqno 52342
      2012-10-23 01:57:43,989 - root - INFO - TAP ns_1@10.3.121.95 :eq_tapq:replication_ns_1@10.3.121.92:recv_ack_seqno 41263
      2012-10-23 01:57:43,989 - root - INFO - TAP ns_1@10.3.121.95 :eq_tapq:replication_ns_1@10.3.121.96:paused 1
      2012-10-23 01:57:43,989 - root - INFO - TAP ns_1@10.3.121.95 :eq_tapq:replication_ns_1@10.3.121.92:pending_backfill false
      2012-10-23 01:57:44,018 - root - INFO - creating direct client 10.3.121.96:11210 default
      2012-10-23 01:57:44,430 - root - INFO - getting tap stats.. for 10.3.121.96
      2012-10-23 01:57:44,438 - root - INFO - TAP ns_1@10.3.121.96 :eq_tapq:replication_ns_1@10.3.121.92:ack_window_full false
      2012-10-23 01:57:44,438 - root - INFO - TAP ns_1@10.3.121.96 :eq_tapq:replication_ns_1@10.3.121.92:has_queued_item false
      2012-10-23 01:57:44,439 - root - INFO - TAP ns_1@10.3.121.96 :eq_tapq:replication_ns_1@10.3.121.95:ack_log_size 0
      2012-10-23 01:57:44,439 - root - INFO - TAP ns_1@10.3.121.96 :eq_tapq:replication_ns_1@10.3.121.92:ack_log_size 0
      2012-10-23 01:57:44,440 - root - INFO - TAP ns_1@10.3.121.96 :eq_tapq:replication_ns_1@10.3.121.92:paused 1
      2012-10-23 01:57:44,440 - root - INFO - TAP ns_1@10.3.121.96 :eq_tapq:replication_ns_1@10.3.121.95:ack_window_full false
      2012-10-23 01:57:44,440 - root - INFO - TAP ns_1@10.3.121.96 :eq_tapq:replication_ns_1@10.3.121.92:idle true
      2012-10-23 01:57:44,441 - root - INFO - TAP ns_1@10.3.121.96 :eq_tapq:replication_ns_1@10.3.121.92:ack_seqno 40878
      2012-10-23 01:57:44,441 - root - INFO - TAP ns_1@10.3.121.96 :eq_tapq:replication_ns_1@10.3.121.95:paused 1
      2012-10-23 01:57:44,442 - root - INFO - TAP ns_1@10.3.121.96 :eq_tapq:replication_ns_1@10.3.121.92:pending_disk_backfill false
      2012-10-23 01:57:44,442 - root - INFO - TAP ns_1@10.3.121.96 :eq_tapq:replication_ns_1@10.3.121.95:ack_seqno 82461
      2012-10-23 01:57:44,442 - root - INFO - TAP ns_1@10.3.121.96 :eq_tapq:replication_ns_1@10.3.121.95:has_queued_item false
      2012-10-23 01:57:44,443 - root - INFO - TAP ns_1@10.3.121.96 :eq_tapq:replication_ns_1@10.3.121.95:idle true
      2012-10-23 01:57:44,443 - root - INFO - TAP ns_1@10.3.121.96 :eq_tapq:replication_ns_1@10.3.121.95:pending_backfill false
      2012-10-23 01:57:44,444 - root - INFO - TAP ns_1@10.3.121.96 :eq_tapq:replication_ns_1@10.3.121.92:recv_ack_seqno 40877
      2012-10-23 01:57:44,444 - root - INFO - TAP ns_1@10.3.121.96 :eq_tapq:replication_ns_1@10.3.121.95:pending_disk_backfill false
      2012-10-23 01:57:44,445 - root - INFO - TAP ns_1@10.3.121.96 :eq_tapq:replication_ns_1@10.3.121.92:pending_backfill false
      2012-10-23 01:57:44,445 - root - INFO - TAP ns_1@10.3.121.96 :eq_tapq:replication_ns_1@10.3.121.95:recv_ack_seqno 82460
      2012-10-23 01:57:44,475 - root - INFO - creating direct client 10.3.121.92:11210 default
      2012-10-23 01:57:44,886 - root - INFO - getting tap stats.. for 10.3.121.92
      2012-10-23 01:57:44,894 - root - INFO - TAP ns_1@10.3.121.92 :eq_tapq:replication_ns_1@10.3.121.95:ack_log_size 0
      2012-10-23 01:57:44,894 - root - INFO - TAP ns_1@10.3.121.92 :eq_tapq:replication_ns_1@10.3.121.96:pending_backfill false
      2012-10-23 01:57:44,895 - root - INFO - TAP ns_1@10.3.121.92 :eq_tapq:replication_ns_1@10.3.121.95:pending_backfill false
      2012-10-23 01:57:44,895 - root - INFO - TAP ns_1@10.3.121.92 :eq_tapq:replication_ns_1@10.3.121.96:ack_seqno 59391
      2012-10-23 01:57:44,896 - root - INFO - TAP ns_1@10.3.121.92 :eq_tapq:replication_ns_1@10.3.121.95:ack_window_full false
      2012-10-23 01:57:44,897 - root - INFO - TAP ns_1@10.3.121.92 :eq_tapq:replication_ns_1@10.3.121.96:ack_log_size 0
      2012-10-23 01:57:44,898 - root - INFO - TAP ns_1@10.3.121.92 :eq_tapq:replication_ns_1@10.3.121.96:has_queued_item false
      2012-10-23 01:57:44,898 - root - INFO - TAP ns_1@10.3.121.92 :eq_tapq:replication_ns_1@10.3.121.96:idle true
      2012-10-23 01:57:44,899 - root - INFO - TAP ns_1@10.3.121.92 :eq_tapq:replication_ns_1@10.3.121.95:paused 1
      2012-10-23 01:57:44,899 - root - INFO - TAP ns_1@10.3.121.92 :eq_tapq:replication_ns_1@10.3.121.96:pending_disk_backfill false
      2012-10-23 01:57:44,899 - root - INFO - TAP ns_1@10.3.121.92 :eq_tapq:replication_ns_1@10.3.121.95:ack_seqno 59944
      2012-10-23 01:57:44,900 - root - INFO - TAP ns_1@10.3.121.92 :eq_tapq:replication_ns_1@10.3.121.95:has_queued_item false
      2012-10-23 01:57:44,900 - root - INFO - TAP ns_1@10.3.121.92 :eq_tapq:replication_ns_1@10.3.121.95:idle true
      2012-10-23 01:57:44,900 - root - INFO - TAP ns_1@10.3.121.92 :eq_tapq:replication_ns_1@10.3.121.96:ack_window_full false
      2012-10-23 01:57:44,901 - root - INFO - TAP ns_1@10.3.121.92 :eq_tapq:replication_ns_1@10.3.121.96:recv_ack_seqno 59390
      2012-10-23 01:57:44,901 - root - INFO - TAP ns_1@10.3.121.92 :eq_tapq:replication_ns_1@10.3.121.95:recv_ack_seqno 59943
      2012-10-23 01:57:44,902 - root - INFO - TAP ns_1@10.3.121.92 :eq_tapq:replication_ns_1@10.3.121.96:paused 1
      2012-10-23 01:57:44,902 - root - INFO - TAP ns_1@10.3.121.92 :eq_tapq:replication_ns_1@10.3.121.95:pending_disk_backfill false
      FAIL
      ======================================================================
      FAIL: test_failed_swap_rebalance (swaprebalance.SwapRebalanceFailedTests)
      ----------------------------------------------------------------------
      Traceback (most recent call last):
      File "pytests/swaprebalance.py", line 597, in test_failed_swap_rebalance
      SwapRebalanceBase._common_test_body_failed_swap_rebalance(self)
      File "pytests/swaprebalance.py", line 394, in _common_test_body_failed_swap_rebalance
      SwapRebalanceBase.items_verification(master, self)
      File "pytests/swaprebalance.py", line 148, in items_verification
      test.assertTrue(verified, "Lost items!!.. failing test in

      {0}

      secs".format(timeout))
      AssertionError: Lost items!!.. failing test in 600 secs

      [error_logger:error,2012-10-23T1:46:34.206,ns_1@10.3.121.92:error_logger:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: ebucketmigrator_srv:init/1
      pid: <0.15957.3>
      registered_name: []
      exception exit: downstream_closed
      in function gen_server:terminate/6
      ancestors: ['ns_vbm_new_sup-default','single_bucket_sup-default',
      <0.780.0>]
      messages: []
      links: Port<0.61066>,<0.806.0>,<0.15990.3>,#Port<0.61062>
      dictionary: []
      trap_exit: true
      status: running
      heap_size: 4181
      stack_size: 24
      reductions: 129203
      neighbours:

      [error_logger:error,2012-10-23T1:46:34.207,ns_1@10.3.121.92:error_logger:ale_error_logger_handler:log_msg:76]** Generic server <0.15980.3> terminating

        • Last message in was {tcp_closed,#Port<0.61086>}
        • When Server state == {state,#Port<0.61079>,#Port<0.61060>,#Port<0.61086>,
          #Port<0.61065>,<0.15991.3>,<<>>,<<>>,
          {set,77,16,16,8,80,48,
          {[],[],[],[],[],[],[],[],[],[],[],[],[], [],[],[]}

          ,
          {{[670,654,638,622,606],
          [673,657,641,625,609],
          [676,660,644,628,612],
          [679,663,647,631,615],
          [682,666,650,634,618],
          [669,653,637,621],
          [672,656,640,624,608],
          [675,659,643,627,611],
          [678,662,646,630,614],
          [681,665,649,633,617],
          [668,652,636,620],
          [671,655,639,623,607],
          [674,658,642,626,610],
          [677,661,645,629,613],
          [680,664,648,632,616],
          [667,651,635,619]}}},
          -1,false,false,0,

          {1350,981993,591548}

          ,
          not_started,undefined,
          <<"replication_ns_1@10.3.121.92">>,<0.15980.3>,

          {had_backfill,false,undefined,[]}

          ,
          false}

        • Reason for termination ==
        • downstream_closed

      [error_logger:error,2012-10-23T1:46:34.213,ns_1@10.3.121.92:error_logger:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: ebucketmigrator_srv:init/1
      pid: <0.15980.3>
      registered_name: []
      exception exit: downstream_closed
      in function gen_server:terminate/6
      ancestors: ['ns_vbm_new_sup-default','single_bucket_sup-default',
      <0.780.0>]
      messages: []
      links: Port<0.61065>,<0.806.0>,<0.15991.3>,#Port<0.61060>
      dictionary: []
      trap_exit: true
      status: running
      heap_size: 4181
      stack_size: 24
      reductions: 38360
      neighbours:

      [error_logger:error,2012-10-23T1:46:34.213,ns_1@10.3.121.92:error_logger:ale_error_logger_handler:log_report:72]
      =========================SUPERVISOR REPORT=========================
      Supervisor:

      {local,'ns_vbm_new_sup-default'}
      Context: child_terminated
      Reason: downstream_closed
      Offender: [{pid,<0.15957.3>},
      {name,{new_child_id,[513,514,515,516,517,518,519,520,521,
      522,523,524,525,526,527,528,529,530,
      531,532,533,534,535,536,537,538,539,
      540,541,542,543,544,545,546,547,548,
      549,550,551,552,553,554,555,556,557,
      558,559,560,561,562,563,564,565,566,
      567,568,569,570,571,572,573,574,575,
      576,577,578,579,580,581,582,583,584,
      585,586,587,588,589,590,591,592,593,
      594,595,596,597,598,599,600,601,602,
      603,604,683,684,685,686,687,688,689,
      690,691,692,693,694,695,696,697,698,
      699,700,701,702,703,704,705,706,707,
      708,709,710,711,712,713,714,715,716,
      717,718,719,720,721,722,723,724,725,
      726,727,728,729,730,731,732,733,734,
      735,736,737,738,739,740,741,742,743,
      744,745,746,747,748,749,750,751,752,
      753,754,755,756,757,758,759,760,761,
      762,763,764,765,766,767,768,769,770,
      771,772,773,774,775,776,777,778,779,
      780,781,782,783,784,785,786,787,788,
      789,790,791,792,793,794,795,796,797,
      798,799,800,801,802,803,804,805,806,
      807,808,809,810,811,812,813,814,815,
      816,817,818,819,820,821,822,823,824,
      825,826,827,828,829,830,831,832,833,
      834,835,836,837,838,839,840,841,842,
      843,844,845,846,847,848,849,850,851,
      852,853],
      'ns_1@10.3.121.95'}},
      {mfargs,{ebucketmigrator_srv,start_link,undefined}},
      {restart_type,temporary},
      {shutdown,60000},
      {child_type,worker}]


      [ns_server:info,2012-10-23T1:46:34.219,ns_1@10.3.121.92:<0.15981.3>:ns_replicas_builder_utils:kill_a_bunch_of_tap_names:59]Killed the following tap names on 'ns_1@10.3.121.93': [<<"replication_building_605_'ns_1@10.3.121.95'">>,
      <<"replication_building_605_'ns_1@10.3.121.92'">>]
      [rebalance:info,2012-10-23T1:46:34.222,ns_1@10.3.121.92:<0.23130.2>:janitor_agent:bulk_set_vbucket_state:386]Doing bulk vbucket 605 state change
      [{'ns_1@10.3.121.94',replica,undefined,undefined},
      {'ns_1@10.3.121.95',replica,undefined,'ns_1@10.3.121.96'},
      {'ns_1@10.3.121.92',replica,undefined,'ns_1@10.3.121.95'}]
      [error_logger:error,2012-10-23T1:46:34.222,ns_1@10.3.121.92:error_logger:ale_error_logger_handler:log_report:72]
      =========================SUPERVISOR REPORT=========================
      Supervisor: {local,'ns_vbm_new_sup-default'}

      Context: child_terminated
      Reason: downstream_closed
      Offender: [

      {pid,<0.15980.3>}

      ,
      {name,{new_child_id,[606,607,608,609,610,611,612,613,614,
      615,616,617,618,619,620,621,622,623,
      624,625,626,627,628,629,630,631,632,
      633,634,635,636,637,638,639,640,641,
      642,643,644,645,646,647,648,649,650,
      651,652,653,654,655,656,657,658,659,
      660,661,662,663,664,665,666,667,668,
      669,670,671,672,673,674,675,676,677,
      678,679,680,681,682],
      'ns_1@10.3.121.94'}},
      {mfargs,{ebucketmigrator_srv,start_link,undefined}},

      {restart_type,temporary}

      ,

      {shutdown,60000}

      ,

      {child_type,worker}

      ]

      Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          People

            andreibaranouski Andrei Baranouski
            andreibaranouski Andrei Baranouski
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty