Details
- Type: Bug
- Resolution: Duplicate
- Priority: Blocker
- Affects Version: 4.0.0
- Security Level: Public
- Fix Version: None
- Triage: Untriaged
- Is this a Regression?: Yes
Description
Build: 4.0.0-1913
Testcase
--------
./testrunner -i INI_FILE.ini -t xdcr.rebalanceXDCR.Rebalance.async_rebalance_in,items=100000,rdirection=bidirection,ctopology=chain,update=C1-C2,delete=C1-C2,rebalance=C2,num_rebalance=1,GROUP=P1
Steps
1. C1[.157,.158] <--> C2[.159,.160]
2. Load 100K keys on both clusters
3. Rebalance in .164 on C2.
4. Perform updates and deletes on both C1 and C2.
5. Verify keys on both clusters.
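The key verification in step 5 amounts to polling item-count stats until they match the expected value, which is what produces the "Not Ready"/"Saw" lines below. A minimal sketch of that polling loop (illustrative only, not the actual testrunner code; `get_stat` is a hypothetical callable, e.g. a wrapper around cbstats for `vb_active_curr_items`):

```python
import time

def wait_for_stat(get_stat, expected, timeout=300, interval=5):
    """Poll a stat until it reaches the expected value or the timeout expires.

    `get_stat` is any zero-argument callable returning the current value.
    Returns True once the stat matches, False on timeout.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        current = get_stat()
        if current == expected:
            print("Saw: %s == %s expected" % (current, expected))
            return True
        # Mirrors the "Not Ready" warnings in the log below.
        print("Not Ready: %s == %s expected" % (current, expected))
        time.sleep(interval)
    return False
```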
[2015-04-21 14:35:02,011] - [xdcrnewbasetests:1696] INFO - Starting rebalance-in nodes:[ip:172.23.121.164 port:8091 ssh_username:root] at C2 cluster 172.23.121.159
[2015-04-21 14:35:02,431] - [task:340] INFO - adding node 172.23.121.164:8091 to cluster
:
[2015-04-21 14:35:46,558] - [task:456] INFO - rebalancing was completed with progress: 100% in 40.0777509212 sec
C1:
[2015-04-21 14:44:03,512] - [xdcrnewbasetests:1996] WARNING - Not Ready: vb_active_curr_items 149925 == 140000 expected on C1, default bucket
:
:
[2015-04-21 14:50:05,035] - [xdcrnewbasetests:2029] WARNING - Not Ready: vb_replica_curr_items 149925 == 140000 expected on C1, default bucket
C2:
[2015-04-21 14:50:11,840] - [xdcrnewbasetests:2007] INFO - Saw: vb_active_curr_items 140000 == 140000 expected on C2, default bucket
[2015-04-21 14:50:12,100] - [xdcrnewbasetests:2041] INFO - Saw: vb_replica_curr_items 140000 == 140000 expected on C2, default bucket
Some keys are missing updates on C1:
[2015-04-21 14:50:16,381] - [task:1397] ERROR - ===== Verifying rev_ids failed for key: C2-key-22535 =====
[2015-04-21 14:50:16,382] - [task:1398] ERROR - seqno mismatch: Source seqno:1, Destination seqno:2, Error Count:1
[2015-04-21 14:50:16,382] - [task:1398] ERROR - cas mismatch: Source cas:1429652041061826560, Destination cas:1429652176849993728, Error Count:2
[2015-04-21 14:50:16,382] - [task:1399] ERROR - Source meta data:
[2015-04-21 14:50:16,382] - [task:1400] ERROR - Dest meta data:
{'deleted': 0, 'seqno': 2, 'cas': 1429652176849993728, 'flags': 0, 'expiration': 0}
[2015-04-21 14:50:16,390] - [task:1397] ERROR - ===== Verifying rev_ids failed for key: C2-key-6254 =====
[2015-04-21 14:50:16,390] - [task:1398] ERROR - seqno mismatch: Source seqno:1, Destination seqno:2, Error Count:3
[2015-04-21 14:50:16,390] - [task:1398] ERROR - cas mismatch: Source cas:1429652030632755200, Destination cas:1429652168016199680, Error Count:4
[2015-04-21 14:50:16,390] - [task:1399] ERROR - Source meta data:
[2015-04-21 14:50:16,391] - [task:1400] ERROR - Dest meta data:
{'deleted': 0, 'seqno': 2, 'cas': 1429652168016199680, 'flags': 0, 'expiration': 0}
[2015-04-21 14:50:16,397] - [task:1397] ERROR - ===== Verifying rev_ids failed for key: C2-key-12290 =====
[2015-04-21 14:50:16,397] - [task:1398] ERROR - seqno mismatch: Source seqno:1, Destination seqno:2, Error Count:5
[2015-04-21 14:50:16,398] - [task:1398] ERROR - cas mismatch: Source cas:1429652034558558208, Destination cas:1429652171367972864, Error Count:6
[2015-04-21 14:50:16,398] - [task:1399] ERROR - Source meta data:
[2015-04-21 14:50:16,398] - [task:1400] ERROR - Dest meta data:
{'deleted': 0, 'seqno': 2, 'cas': 1429652171367972864, 'flags': 0, 'expiration': 0}
[2015-04-21 14:50:16,400] - [task:1397] ERROR - ===== Verifying rev_ids failed for key: C2-key-17515 =====
[2015-04-21 14:50:16,400] - [task:1398] ERROR - seqno mismatch: Source seqno:1, Destination seqno:2, Error Count:7
[2015-04-21 14:50:16,400] - [task:1398] ERROR - cas mismatch: Source cas:1429652037954109440, Destination cas:1429652174030569472, Error Count:8
[2015-04-21 14:50:16,400] - [task:1399] ERROR - Source meta data:
[2015-04-21 14:50:16,401] - [task:1400] ERROR - Dest meta data:
{'deleted': 0, 'seqno': 2, 'cas': 1429652174030569472, 'flags': 0, 'expiration': 0}
[2015-04-21 14:50:16,407] - [task:1397] ERROR - ===== Verifying rev_ids failed for key: C2-key-28580 =====
[2015-04-21 14:50:16,407] - [task:1398] ERROR - seqno mismatch: Source seqno:1, Destination seqno:2, Error Count:9
[2015-04-21 14:50:16,407] - [task:1398] ERROR - cas mismatch: Source cas:1429652044689113088, Destination cas:1429652180090945536, Error Count:10
[2015-04-21 14:50:16,407] - [task:1399] ERROR - Source meta data:
[2015-04-21 14:50:16,408] - [task:1400] ERROR - Dest meta data:
{'deleted': 0, 'seqno': 2, 'cas': 1429652180090945536, 'flags': 0, 'expiration': 0}
[2015-04-21 14:50:16,432] - [task:1397] ERROR - ===== Verifying rev_ids failed for key: C2-key-24921 =====
[2015-04-21 14:50:16,432] - [task:1398] ERROR - seqno mismatch: Source seqno:1, Destination seqno:2, Error Count:11
[2015-04-21 14:50:16,432] - [task:1398] ERROR - cas mismatch: Source cas:1429652042301898752, Destination cas:1429652177913643008, Error Count:12
[2015-04-21 14:50:16,432] - [task:1399] ERROR - Source meta data:
[2015-04-21 14:50:16,433] - [task:1400] ERROR - Dest meta data: {'deleted': 0, 'seqno': 2, 'cas': 1429652177913643008, 'flags': 0, 'expiration': 0}
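The failed check above compares revision metadata for each key across the two clusters. A minimal sketch of that comparison, assuming metadata dicts shaped like the log output (hypothetical helper, not the actual task.py code):

```python
def verify_rev_id(key, src_meta, dest_meta):
    """Compare revision metadata for one key across source and destination.

    Returns a list of mismatch descriptions; an empty list means the
    key is in sync. The dict layout mirrors the meta data printed above.
    """
    errors = []
    for field in ("seqno", "cas", "deleted", "flags", "expiration"):
        if src_meta.get(field) != dest_meta.get(field):
            errors.append("%s mismatch: Source %s:%s, Destination %s:%s" % (
                field, field, src_meta.get(field),
                field, dest_meta.get(field)))
    return errors
```

For the keys above the source is one revision behind (`seqno:1` vs `seqno:2`), so the check reports both a seqno and a cas mismatch per key.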
On .164, the pipeline was reconstructed 9 times in 6 minutes.
PipelineManager 2015-04-21T14:44:04.381-07:00 [INFO] Replication Status = map[cb588faa571640cdaabaef8bb26bcba5/default/default:name=
{cb588faa571640cdaabaef8bb26bcba5/default/default}, status=
{Replicating}, errors={[
{"time":"2015-04-21T14:41:42.796022807-07:00","errMsg":"map[xmem_cb588faa571640cdaabaef8bb26bcba5/default/default_172.23.121.158:11210_0:Failed to resend document C2-key-98628, has tried to resend it 11, maximum retry 10 reached]"},
{"time":"2015-04-21T14:39:13.851041388-07:00","errMsg":"map[xmem_cb588faa571640cdaabaef8bb26bcba5/default/default_172.23.121.157:11210_1:Xmem is stuck]"},
{"time":"2015-04-21T14:37:58.013209682-07:00","errMsg":"map[xmem_cb588faa571640cdaabaef8bb26bcba5/default/default_172.23.121.157:11210_0:Xmem is stuck]"},
{"time":"2015-04-21T14:36:42.185226089-07:00","errMsg":"map[xmem_cb588faa571640cdaabaef8bb26bcba5/default/default_172.23.121.157:11210_0:Xmem is stuck]"},
{"time":"2015-04-21T14:35:30.585135654-07:00","errMsg":"Pipeline cb588faa571640cdaabaef8bb26bcba5/default/default failed to start"},
{"time":"2015-04-21T14:35:30.579172071-07:00","errMsg":"map[dcp_cb588faa571640cdaabaef8bb26bcba5/default/default_172.23.121.160:11210_1:NOT_MY_VBUCKET]"},
{"time":"2015-04-21T14:35:21.615506162-07:00","errMsg":"map[dcp_cb588faa571640cdaabaef8bb26bcba5/default/default_172.23.121.160:11210_1:NOT_MY_VBUCKET]"},
{"time":"2015-04-21T14:35:21.603615085-07:00","errMsg":"Pipeline cb588faa571640cdaabaef8bb26bcba5/default/default failed to start, err=map[dcp_cb588faa571640cdaabaef8bb26bcba5/default/default_172.23.121.160:11210_1:Can't move to state 2 - dcp_cb588faa571640cdaabaef8bb26bcba5/default/default_172.23.121.160:11210_1's current state is Error, can only move to state [Stopping]]\n"},
{"time":"2015-04-21T14:35:05.434284222-07:00","errMsg":"map[dcp_cb588faa571640cdaabaef8bb26bcba5/default/default_172.23.121.160:11210_1:dcp stream for vb=853 is closed by producer]"}]}, progress=
{Pipeline is running}
The last pipeline error I see on .164 is around 14:41:42, but the last time C2 ever replicated to C1 is shown below. No more docs were processed after these timestamps.
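The "maximum retry 10 reached" error above comes from xmem giving up on a document after a bounded number of resends, which then surfaces as a pipeline error and triggers reconstruction. A hedged sketch of that bounded-retry pattern (illustrative only; goxdcr itself is written in Go, and this mirrors just the behaviour visible in the log):

```python
def resend_with_limit(send, key, max_retry=10):
    """Attempt to send a document, resending on failure up to max_retry times.

    `send` returns True on success. Raises RuntimeError once the retry
    budget is exhausted, which in the real pipeline surfaces as
    "maximum retry N reached" and leads to pipeline reconstruction.
    """
    attempts = 0
    while True:
        if send(key):
            return attempts
        attempts += 1
        if attempts > max_retry:
            raise RuntimeError(
                "Failed to resend document %s, has tried to resend it %d, "
                "maximum retry %d reached" % (key, attempts, max_retry))
```

Note the log reports 11 resend attempts against a maximum of 10: the error fires on the attempt that exceeds the budget.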
On .164:
StatisticsManager 2015-04-21T14:37:17.179-07:00 [INFO] total_doc=584, docs_processed=584, changes_left=0
On .160:
StatisticsManager 2015-04-21T14:42:06.180-07:00 [INFO] total_doc=106606, docs_processed=106606, changes_left=0
On .159:
StatisticsManager 2015-04-21T14:41:01.448-07:00 [INFO] total_doc=106884, docs_processed=106884, changes_left=0
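The StatisticsManager lines report per-node replication progress; a node looks drained when `changes_left` is zero and every mutation has been processed. A small sketch of that check (field names taken from the log lines above; the function itself is hypothetical):

```python
def is_drained(stats):
    """Return True when a node reports no outstanding XDCR work.

    `stats` mirrors the StatisticsManager fields logged above:
    total_doc, docs_processed and changes_left.
    """
    return stats["changes_left"] == 0 and stats["docs_processed"] == stats["total_doc"]
```

Note that a "drained" report is only as good as the pipeline feeding it: .164 reports `changes_left=0` at just 584 docs while its peers report ~106K, consistent with its pipeline having stalled after 14:37 rather than having finished replicating.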
Issue Links
- duplicates MB-14514: GoXDCR: Pipeline gets reconstructed frequently after rebalance-in, delaying replication (Closed)