Details

Type: Bug
Resolution: Duplicate
Priority: Critical
Fix Version/s: 3.0
Affects Version/s: 3.0
Component/s: ns_server
Security Level: Public
Labels:
None
Environment:
Build 3.0.0-662-rel

Triage:
Triaged
Operating System:
CentOS 64-bit
Link to Log File, atop/blg, CBCollectInfo, Core dump:

[Source]

10.3.121.62 : https://s3.amazonaws.com/bugdb/jira/MB-11088/07c78a6d/10.3.121.62-592014-1436-diag.zip
10.3.2.204 : https://s3.amazonaws.com/bugdb/jira/MB-11088/683ac3ef/10.3.2.204-592014-1438-diag.zip
10.3.3.208 : https://s3.amazonaws.com/bugdb/jira/MB-11088/234d25a6/10.3.3.208-592014-1437-diag.zip
10.3.4.177 : https://s3.amazonaws.com/bugdb/jira/MB-11088/a585403b/10.3.4.177-592014-1433-diag.zip

[Destination]
10.3.121.65 : https://s3.amazonaws.com/bugdb/jira/MB-11088/958b154c/10.3.121.65-592014-1429-diag.zip (master node)
10.3.3.210 : https://s3.amazonaws.com/bugdb/jira/MB-11088/e54c3f2d/10.3.3.210-592014-1434-diag.zip (node that was failed over and added back)
10.3.3.207 : https://s3.amazonaws.com/bugdb/jira/MB-11088/2d35da8f/10.3.3.207-592014-1432-diag.zip
10.3.3.209 : https://s3.amazonaws.com/bugdb/jira/MB-11088/4e6f78a0/10.3.3.209-592014-1430-diag.zip

Is this a Regression?:
Unknown

Description

Rebalance hangs after adding back a failed-over node. The test fails consistently.

http://qa.hq.northscale.net/job/centos_x64--31_01--uniXDCR-P1/1/consoleFull

[Test Case]
./testrunner -i centos_x64--31_01--uniXDCR-P1.ini GROUP=CHAIN,get-cbcollect-info=True,get-logs=False,stop-on-failure=False -t xdcr.uniXDCR.unidirectional.load_with_failover_then_add_back,items=100000,rdirection=unidirection,ctopology=chain,doc-ops=update-delete,failover=destination,GROUP=CHAIN;P1

[2014-05-09 14:13:27,247] - [uniXDCR:189] INFO - Failing over Destination Non-Master Node 10.3.3.210:8091
[2014-05-09 14:13:28,544] - [task:2229] INFO - Failing over 10.3.3.210:8091
[2014-05-09 14:13:28,971] - [rest_client:1029] INFO - fail_over node ns_1@10.3.3.210 successful
[2014-05-09 14:13:28,973] - [task:2209] INFO - 20 seconds sleep after failover, for nodes to go pending....
[2014-05-09 14:13:48,994] - [uniXDCR:192] INFO - Add back Destination Non-Master Node 10.3.3.210:8091
[2014-05-09 14:13:49,386] - [rest_client:1062] INFO - add_back_node ns_1@10.3.3.210 successful
[2014-05-09 14:13:50,563] - [rest_client:1076] INFO - rebalance params : password=password&ejectedNodes=&user=Administrator&knownNodes=ns_1%4010.3.121.65%2Cns_1%4010.3.3.210%2Cns_1%4010.3.3.209%2Cns_1%4010.3.3.207
[2014-05-09 14:13:50,691] - [rest_client:1080] INFO - rebalance operation started
[2014-05-09 14:13:50,969] - [rest_client:1181] INFO - rebalance percentage : 0 %
[2014-05-09 14:14:01,213] - [rest_client:1181] INFO - rebalance percentage : 14.4747759205 %
[2014-05-09 14:14:11,365] - [rest_client:1181] INFO - rebalance percentage : 14.4747759205 %
[2014-05-09 14:14:21,751] - [rest_client:1181] INFO - rebalance percentage : 14.4747759205 %
[2014-05-09 14:14:32,462] - [rest_client:1181] INFO - rebalance percentage : 14.4747759205 %
[2014-05-09 14:14:42,777] - [rest_client:1181] INFO - rebalance percentage : 14.4747759205 %
[2014-05-09 14:14:53,214] - [rest_client:1181] INFO - rebalance percentage : 14.4747759205 %
[2014-05-09 14:15:04,233] - [rest_client:1181] INFO - rebalance percentage : 14.4747759205 %
[2014-05-09 14:15:15,050] - [rest_client:1181] INFO - rebalance percentage : 14.4747759205 %
[2014-05-09 14:15:25,177] - [rest_client:1181] INFO - rebalance percentage : 14.4747759205 %
[2014-05-09 14:15:35,811] - [rest_client:1181] INFO - rebalance percentage : 14.4747759205 %
[2014-05-09 14:15:46,200] - [rest_client:1181] INFO - rebalance percentage : 14.4747759205 %
[2014-05-09 14:15:56,661] - [rest_client:1181] INFO - rebalance percentage : 14.4747759205 %
[2014-05-09 14:16:07,376] - [rest_client:1181] INFO - rebalance percentage : 14.4747759205 %
[2014-05-09 14:16:17,740] - [rest_client:1181] INFO - rebalance percentage : 14.4747759205 %
[2014-05-09 14:16:28,131] - [rest_client:1181] INFO - rebalance percentage : 14.4747759205 %
[2014-05-09 14:16:38,672] - [rest_client:1181] INFO - rebalance percentage : 14.4747759205 %
[2014-05-09 14:16:48,878] - [rest_client:1181] INFO - rebalance percentage : 14.4747759205 %
[2014-05-09 14:16:59,706] - [rest_client:1181] INFO - rebalance percentage : 14.4747759205 %
[2014-05-09 14:17:10,547] - [rest_client:1181] INFO - rebalance percentage : 14.4747759205 %
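The log shows the reported percentage frozen at 14.4747759205 % from 14:14:01 to 14:17:10, i.e. more than three minutes with no progress. A minimal Python sketch of how a harness could flag this kind of stall from the rest_client progress lines, assuming the log format shown above; the function names and the 120-second threshold are hypothetical and not part of testrunner:

```python
import re
from datetime import datetime

# Matches progress lines in the format logged above, e.g.
# [2014-05-09 14:14:01,213] - [rest_client:1181] INFO - rebalance percentage : 14.4747759205 %
PROGRESS_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\].*"
    r"rebalance percentage : (?P<pct>[\d.]+) %"
)


def parse_progress(lines):
    """Return (timestamp, percent) pairs for every progress line; other lines are skipped."""
    samples = []
    for line in lines:
        m = PROGRESS_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
            samples.append((ts, float(m.group("pct"))))
    return samples


def stalled_seconds(samples):
    """Seconds the latest percentage has stayed unchanged (0.0 if there are no samples)."""
    if not samples:
        return 0.0
    last_ts, last_pct = samples[-1]
    since = last_ts
    # Walk backwards to the earliest sample of the current unchanged run.
    for ts, pct in reversed(samples):
        if pct != last_pct:
            break
        since = ts
    return (last_ts - since).total_seconds()


def is_stalled(samples, threshold_s=120.0):
    """Hypothetical check: below 100 % and flat for at least threshold_s seconds."""
    return bool(samples) and samples[-1][1] < 100.0 and stalled_seconds(samples) >= threshold_s
```

Applied to the progress lines above, stalled_seconds() returns about 189 seconds, so is_stalled() with the default threshold would report the rebalance as stuck.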

1. Set up 4-node source and destination clusters.
2. Load 1M items on the source cluster.
3. Fail over a non-master node on the destination cluster.
4. Add the node back.
5. Rebalance. Result: rebalance gets stuck. The issue is always reproducible with build 662.
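Steps 3 to 5 map onto three ns_server REST calls. A Python sketch that only builds the request paths and form bodies (nothing is sent), assuming the /controller/failOver, /controller/reAddNode and /controller/rebalance endpoints and the ns_1@<ip> node naming seen in the log above; the helper names are hypothetical and the endpoint names should be checked against the REST API documentation for build 662. It also decodes the logged rebalance params, which show all four destination nodes in knownNodes and an empty ejectedNodes list:

```python
from urllib.parse import parse_qs, urlencode

# Assumed ns_server REST endpoints (verify against the docs for your build).
FAILOVER_PATH = "/controller/failOver"
READD_PATH = "/controller/reAddNode"
REBALANCE_PATH = "/controller/rebalance"


def otp_node(ip):
    """Node name in the form used by the log above, e.g. ns_1@10.3.3.210."""
    return "ns_1@" + ip


def failover_addback_rebalance(cluster_ips, target_ip):
    """Return (path, form-encoded body) pairs for steps 3-5; no HTTP request is made."""
    target = otp_node(target_ip)
    known = ",".join(otp_node(ip) for ip in cluster_ips)
    return [
        (FAILOVER_PATH, urlencode({"otpNode": target})),
        (READD_PATH, urlencode({"otpNode": target})),
        # ejectedNodes is empty: the failed-over node is re-added, not removed.
        (REBALANCE_PATH, urlencode({"knownNodes": known, "ejectedNodes": ""})),
    ]


def decode_rebalance_params(query):
    """Split a logged 'rebalance params' string into known and ejected node lists."""
    fields = parse_qs(query, keep_blank_values=True)

    def nodes(key):
        raw = fields.get(key, [""])[0]
        return [n for n in raw.split(",") if n]

    return {"knownNodes": nodes("knownNodes"), "ejectedNodes": nodes("ejectedNodes")}
```

Building the bodies separately from sending them keeps the sketch testable offline; the knownNodes string it produces is byte-for-byte the one in the logged rebalance params.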

[Note]
1. XDCR was not using UPR in this case; only intra-cluster replication used UPR.
2. The issue occurs with a large number of items (1M); the test passes with fewer items (e.g. 1K or 10K).

Attachments

Activity

People

Assignee: Mike Wiederhold (Inactive)

Reporter: Sangharsh Agarwal

Votes: 0

Watchers: 3

Dates

Created: 11/May/14 9:20 AM

Updated: 19/Aug/14 3:52 PM

Resolved: 12/May/14 3:18 PM

Gerrit Reviews

There are no open Gerrit changes

[UPR] Rebalance hang after add back node
