Details
- Bug
- Resolution: Duplicate
- Critical
- 3.0
- Security Level: Public
- None
- CentOS
- Untriaged
- Unknown
- June 30 - July 18
Description
Build
--------
3.0.0-900 (XDCR on UPR, internal replication on UPR)
Steps
--------
1. Load data on both clusters until vb_active_resident_items_ratio < 50.
2. Set up bi-xdcr on "standardbucket" and uni-xdcr on "standardbucket1".
3. Access phase with 50% gets and 50% deletes for 3 hours.
4. Rebalance-out one node (.47) at C1.
5. Rebalance-in same node at C1.
Problem
-------------
During rebalance-in, progress stalled right after 41.9%: the REST call that reports rebalance progress showed no increase for a little more than 5 minutes. As a result, the test timed out, as shown below. This never happened in previous runs of the same test against 2.2.0, 2.5.0, or 2.5.1.
[2014-06-30 14:17:56,782: ERROR/MainProcess] Running Phase: rebalance_in_one_source (Rebalance-in-1)
[2014-06-30 14:18:01,901: ERROR/MainProcess] Started workload workload_37e0ed9
[2014-06-30 14:18:01,930: ERROR/MainProcess] kill task workload_19da4d6
[2014-06-30 14:18:01,931: ERROR/MainProcess]
[2014-06-30 14:18:02,005: ERROR/MainProcess] start task sent to 1 consumers
[2014-06-30 14:18:03,909: ERROR/MainProcess] Started workload workload_a2f6b8a
[2014-06-30 14:18:03,930: ERROR/MainProcess] kill task workload_f42122d
[2014-06-30 14:18:03,931: ERROR/MainProcess]
[2014-06-30 14:18:03,996: ERROR/MainProcess] start task sent to 1 consumers
[2014-06-30 14:18:05,917: ERROR/MainProcess] Started workload workload_28f856f
[2014-06-30 14:18:05,949: ERROR/MainProcess] kill task workload_640273e
[2014-06-30 14:18:05,950: ERROR/MainProcess]
[2014-06-30 14:18:06,039: ERROR/MainProcess] start task sent to 1 consumers
[2014-06-30 14:27:50,156: ERROR/MainProcess] apparently rebalance progress code in infinite loop: 41.942552351
[2014-06-30 14:27:52,158: ERROR/MainProcess] Stopping workload workload_37e0ed9
[2014-06-30 14:27:52,184: ERROR/MainProcess] kill task workload_37e0ed9
[2014-06-30 14:27:52,189: ERROR/MainProcess] Stopping workload workload_28f856f
[2014-06-30 14:27:52,226: ERROR/MainProcess] kill task workload_28f856f
[2014-06-30 14:27:54,229: ERROR/MainProcess] Stopping workload workload_a2f6b8a
[2014-06-30 14:27:54,260: ERROR/MainProcess] kill task workload_a2f6b8a
[2014-06-30 14:28:03,270: ERROR/MainProcess]
To continue testing, I'm increasing the timeout value to 15 mins. Please check if this has to do with rebalance performance.
Attaching cbcollect info.
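For reference, the stall check that trips in the log above can be sketched roughly as follows. This is a minimal illustration, not the test framework's actual code: `wait_for_rebalance`, `RebalanceStalled`, and the `get_progress` callable are hypothetical names, and `get_progress` is assumed to wrap the REST call that reports rebalance progress and return a percentage between 0 and 100. The default `stall_timeout` of 900 seconds corresponds to the increased 15-minute timeout.

```python
import time


class RebalanceStalled(Exception):
    """Raised when rebalance progress stays flat for longer than the stall timeout."""


def wait_for_rebalance(get_progress, stall_timeout=900.0, poll_interval=2.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll get_progress() until it reaches 100.0 or stops increasing.

    get_progress  -- hypothetical callable returning progress in percent (0-100)
    stall_timeout -- seconds progress may stay flat before giving up (900 s = 15 min)
    clock, sleep  -- injectable so the check can be exercised without real waiting
    """
    last_progress = get_progress()
    last_change = clock()
    while last_progress < 100.0:
        sleep(poll_interval)
        progress = get_progress()
        now = clock()
        if progress > last_progress:
            # Progress moved forward: remember it and restart the stall timer.
            last_progress = progress
            last_change = now
        elif now - last_change >= stall_timeout:
            raise RebalanceStalled(
                "rebalance stuck at %.2f%% for %.0f s"
                % (last_progress, now - last_change))
    return last_progress
```

Because the timer restarts on every increase, a slow but steadily advancing rebalance would not trip this check; only a flat reading for the full timeout would.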
Attachments
Issue Links
- is duplicated by
- MB-11720 Backfilling the entire vbucket can starve other streams that also need to backfill
- Closed