  1. Couchbase Server
  2. MB-11602

KV+XDCR System test : Rebalance gets temporarily stuck but eventually proceeds to completion


Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • 3.0
    • 3.0
    • couchbase-bucket
    • Security Level: Public
    • None
    • CentOS
    • Untriaged
    • Unknown
    • June 30 - July 18

    Description

      Build
      --------
      3.0.0-900 (XDCR on UPR, internal replication on UPR)

      Steps
      --------
      1. Load on both clusters till vb_active_resident_items_ratio < 50.
      2. Set up bi-xdcr on "standardbucket", uni-xdcr on "standardbucket1".
      3. Access phase with 50% gets, 50% deletes for 3 hours.
      4. Rebalance-out one node (.47) at C1.
      5. Rebalance-in same node at C1.

      Problem
      -------------
      During rebalance-in, progress stalled right after reaching 41.9%: the REST call that reports rebalance progress showed no increase for a little more than 5 minutes. As a result, the test timed out, as shown in the log below. This never happened in previous runs of the same test against 2.2.0, 2.5.0, or 2.5.1.

      [2014-06-30 14:17:56,782: ERROR/MainProcess] Running Phase: rebalance_in_one_source (Rebalance-in-1)
      [2014-06-30 14:18:01,901: ERROR/MainProcess] Started workload workload_37e0ed9
      [2014-06-30 14:18:01,930: ERROR/MainProcess] kill task workload_19da4d6
      [2014-06-30 14:18:01,931: ERROR/MainProcess]

      {'update_perc': 22, 'indexed_keys': [], 'del_perc': 3, 'postcondition_handler': None, 'create_perc': 3, 'bucket': 'standardbucket', 'exp_perc': 2, 'miss_queue': None, 'ops_per_sec': 3000, 'consume_queue': None, 'postconditions': None, 'template': 'default', 'ttl': 3000, 'cc_queues': ['std1ph5keys'], 'preconditions': None, 'password': '', 'get_perc': 70, 'miss_perc': 5, 'wait': None}

      [2014-06-30 14:18:02,005: ERROR/MainProcess] start task sent to 1 consumers
      [2014-06-30 14:18:03,909: ERROR/MainProcess] Started workload workload_a2f6b8a
      [2014-06-30 14:18:03,930: ERROR/MainProcess] kill task workload_f42122d
      [2014-06-30 14:18:03,931: ERROR/MainProcess]

      {'update_perc': 22, 'indexed_keys': [], 'del_perc': 3, 'postcondition_handler': None, 'create_perc': 3, 'bucket': 'standardbucket1', 'exp_perc': 2, 'miss_queue': None, 'ops_per_sec': 3000, 'consume_queue': None, 'postconditions': None, 'template': 'default', 'ttl': 3000, 'cc_queues': ['std2ph5keys'], 'preconditions': None, 'password': '', 'get_perc': 70, 'miss_perc': 5, 'wait': None}

      [2014-06-30 14:18:03,996: ERROR/MainProcess] start task sent to 1 consumers
      [2014-06-30 14:18:05,917: ERROR/MainProcess] Started workload workload_28f856f
      [2014-06-30 14:18:05,949: ERROR/MainProcess] kill task workload_640273e
      [2014-06-30 14:18:05,950: ERROR/MainProcess]

      {'update_perc': 22, 'indexed_keys': [], 'del_perc': 3, 'postcondition_handler': None, 'create_perc': 3, 'bucket': 'saslbucket', 'exp_perc': 2, 'miss_queue': None, 'ops_per_sec': 3000, 'consume_queue': None, 'postconditions': None, 'template': 'default', 'ttl': 3000, 'cc_queues': ['saslph5keys'], 'preconditions': None, 'password': 'password', 'get_perc': 70, 'miss_perc': 5, 'wait': None}

      [2014-06-30 14:18:06,039: ERROR/MainProcess] start task sent to 1 consumers
      [2014-06-30 14:27:50,156: ERROR/MainProcess] apparently rebalance progress code in infinite loop: 41.942552351
      [2014-06-30 14:27:52,158: ERROR/MainProcess] Stopping workload workload_37e0ed9
      [2014-06-30 14:27:52,184: ERROR/MainProcess] kill task workload_37e0ed9
      [2014-06-30 14:27:52,189: ERROR/MainProcess] Stopping workload workload_28f856f
      [2014-06-30 14:27:52,226: ERROR/MainProcess] kill task workload_28f856f
      [2014-06-30 14:27:54,229: ERROR/MainProcess] Stopping workload workload_a2f6b8a
      [2014-06-30 14:27:54,260: ERROR/MainProcess] kill task workload_a2f6b8a
      [2014-06-30 14:28:03,270: ERROR/MainProcess]
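
For reference, the wall-clock gap between the start of the rebalance-in phase and the stall message can be read straight off the two log lines above. The snippet below is only a sketch for doing that arithmetic; `log_time` is a hypothetical helper written for this note, not part of the test framework.

```python
from datetime import datetime


def log_time(line):
    """Parse the leading '[YYYY-MM-DD HH:MM:SS,mmm:' timestamp of a log line."""
    # The timestamp ends at the first ": " (colon followed by a space).
    stamp = line[1:line.index(": ")]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")


# The two lines are copied verbatim from the log above.
started = ("[2014-06-30 14:17:56,782: ERROR/MainProcess] "
           "Running Phase: rebalance_in_one_source (Rebalance-in-1)")
stalled = ("[2014-06-30 14:27:50,156: ERROR/MainProcess] "
           "apparently rebalance progress code in infinite loop: 41.942552351")

elapsed = (log_time(stalled) - log_time(started)).total_seconds()
print("seconds from phase start to stall message: %.3f" % elapsed)
```

This works out to 593.374 seconds, i.e. just under 10 minutes from the start of the phase to the point where the test gave up.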

      To continue testing, I'm increasing the timeout value to 15 mins. Please check if this has to do with rebalance performance.
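
A minimal sketch of the kind of stall check described here, assuming the test polls a progress percentage and gives up when the value has not increased within a timeout. All names (`wait_for_rebalance`, `RebalanceStalled`, `get_progress`) are hypothetical and do not come from the actual test framework; `get_progress` stands in for whatever REST call the test uses to read rebalance progress.

```python
import time


class RebalanceStalled(Exception):
    """Raised when reported progress stops increasing for too long."""


def wait_for_rebalance(get_progress, stall_timeout=900.0, poll_interval=2.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll get_progress() until it reports 100, or raise if it stalls.

    get_progress is a caller-supplied (hypothetical) callable returning the
    rebalance progress as a percentage. stall_timeout is in seconds; the
    default of 900 corresponds to the 15-minute value mentioned above.
    """
    best = get_progress()
    last_advance = clock()
    while best < 100.0:
        sleep(poll_interval)
        current = get_progress()
        now = clock()
        if current > best:
            # Progress moved forward: restart the stall timer.
            best, last_advance = current, now
        elif now - last_advance >= stall_timeout:
            raise RebalanceStalled(
                "no progress past %.2f%% for %.0f s" % (best, now - last_advance))
    return best
```

Measuring the time since the last increase, rather than the total duration, means a slow but advancing rebalance does not trip the check; only a flat progress value does.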

      Attaching cbcollect info.

      Attachments

        1. masterEvents
          2.19 MB
        2. masterEvents.txt
          19.63 MB


            People

              apiravi Aruna Piravi (Inactive)

