Details

Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: 3.0
Affects Version/s: 3.0
Component/s: XDCR
Security Level: Public
Labels:
None

Triage:
Untriaged
Is this a Regression?:
No

Description

Scenario
-------------

Unidirectional XDCR between two single-node clusters. Load an item into vb 449. One checkpoint is recorded, as follows:

{"commitopaque":[158975596682994,1],"start_time":"Thu, 08 May 2014 21:06:36 GMT","end_time":"Thu, 08 May 2014 22:17:40 GMT","failover_uuid":0,"failover_seq":0,"seqno":0,"upr_snapshot_seqno":0,"total_docs_checked":0,"total_docs_written":0,"total_data_replicated":0}

The next record, written after the same item is updated:

{"commitopaque":[158975596682994,2],"start_time":"Thu, 08 May 2014 21:06:36 GMT","end_time":"Thu, 08 May 2014 22:51:37 GMT","failover_uuid":137909158430775,"failover_seq":0,"seqno":1,"upr_snapshot_seqno":1,"total_docs_checked":1,"total_docs_written":1,"total_data_replicated":10}
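To see exactly which fields move between the two records above, here is a small Python sketch that diffs them field by field. The record strings are copied verbatim from this report; `diff_records` is a hypothetical helper written for this ticket, not part of XDCR.

```python
import json

# The two checkpoint records for vb 449, copied verbatim from above.
FIRST = ('{"commitopaque":[158975596682994,1],"start_time":"Thu, 08 May 2014 21:06:36 GMT",'
         '"end_time":"Thu, 08 May 2014 22:17:40 GMT","failover_uuid":0,"failover_seq":0,'
         '"seqno":0,"upr_snapshot_seqno":0,"total_docs_checked":0,"total_docs_written":0,'
         '"total_data_replicated":0}')
SECOND = ('{"commitopaque":[158975596682994,2],"start_time":"Thu, 08 May 2014 21:06:36 GMT",'
          '"end_time":"Thu, 08 May 2014 22:51:37 GMT","failover_uuid":137909158430775,'
          '"failover_seq":0,"seqno":1,"upr_snapshot_seqno":1,"total_docs_checked":1,'
          '"total_docs_written":1,"total_data_replicated":10}')


def diff_records(old: dict, new: dict) -> dict:
    """Map each field whose value differs to an (old, new) pair."""
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    }


if __name__ == "__main__":
    for field, (before, after) in diff_records(json.loads(FIRST), json.loads(SECOND)).items():
        print(f"{field}: {before} -> {after}")
```

Run against the two records, it shows start_time and failover_seq unchanged, failover_uuid going from 0 to 137909158430775, and seqno, upr_snapshot_seqno and the two document counters each advancing by one.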

Observations
--------------------
1. The failover_uuid in the first checkpoint record is always 0. Should it not hold the local vb_uuid? Subsequent checkpoint records do contain the expected value.

2. commitopaque shows that the high seqno on the remote end is 1, i.e. both source and destination acknowledge that 1 mutation has been replicated. Yet the same record shows "total_docs_checked":0,"total_docs_written":0,"total_data_replicated":0. These values are always off by one mutation. Why is this so? For example:

{"commitopaque":[158975596682994,6],"start_time":"Thu, 08 May 2014 21:06:36 GMT","end_time":"Thu, 08 May 2014 22:56:25 GMT","failover_uuid":137909158430775,"failover_seq":0,"seqno":5,"upr_snapshot_seqno":5,"total_docs_checked":5,"total_docs_written":5,"total_data_replicated":50}

3. Why is "seqno" 0 in the first record? What exactly should this field point to?
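The symptoms in observations 1 and 2 can be checked mechanically on each record. This Python sketch assumes, following the reporter's reading in observation 2, that the second element of commitopaque is the high seqno acknowledged by the remote end; that interpretation is not confirmed by the XDCR code here, and `check_record` is a hypothetical helper.

```python
import json

# Checkpoint records for vb 449, copied verbatim from this report.
RECORDS = [
    ('{"commitopaque":[158975596682994,1],"start_time":"Thu, 08 May 2014 21:06:36 GMT",'
     '"end_time":"Thu, 08 May 2014 22:17:40 GMT","failover_uuid":0,"failover_seq":0,'
     '"seqno":0,"upr_snapshot_seqno":0,"total_docs_checked":0,"total_docs_written":0,'
     '"total_data_replicated":0}'),
    ('{"commitopaque":[158975596682994,2],"start_time":"Thu, 08 May 2014 21:06:36 GMT",'
     '"end_time":"Thu, 08 May 2014 22:51:37 GMT","failover_uuid":137909158430775,'
     '"failover_seq":0,"seqno":1,"upr_snapshot_seqno":1,"total_docs_checked":1,'
     '"total_docs_written":1,"total_data_replicated":10}'),
    ('{"commitopaque":[158975596682994,6],"start_time":"Thu, 08 May 2014 21:06:36 GMT",'
     '"end_time":"Thu, 08 May 2014 22:56:25 GMT","failover_uuid":137909158430775,'
     '"failover_seq":0,"seqno":5,"upr_snapshot_seqno":5,"total_docs_checked":5,'
     '"total_docs_written":5,"total_data_replicated":50}'),
]


def check_record(raw: str) -> dict:
    """Flag the symptoms from observations 1 and 2 for one checkpoint record.

    Assumption (reporter's reading): commitopaque[1] is the remote high seqno.
    """
    rec = json.loads(raw)
    remote_seqno = rec["commitopaque"][1]
    return {
        # Observation 1: failover_uuid left at 0 instead of the local vb_uuid.
        "failover_uuid_is_zero": rec["failover_uuid"] == 0,
        # Observation 2: how far each locally recorded value lags the remote seqno.
        "seqno_lag": remote_seqno - rec["seqno"],
        "docs_checked_lag": remote_seqno - rec["total_docs_checked"],
        "docs_written_lag": remote_seqno - rec["total_docs_written"],
    }


if __name__ == "__main__":
    for raw in RECORDS:
        print(check_record(raw))
```

For all three records every lag comes out as 1, matching the "always off by one" observation, and only the first record has failover_uuid 0.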

Logs (these may explain what we see above)
--------

xdcr.1-[xdcr:info,2014-05-08T15:17:40.578,ns_1@127.0.0.1:<0.9272.0>:xdc_vbucket_rep:start_replication:937]Replication `<<"670a9d8fe1d2c38e369630abeb147862/default/default">>` is using:
xdcr.1- 4 worker processes
xdcr.1- a worker batch size of 500
xdcr.1- a worker batch size (KiB) 2048
xdcr.1- 20 HTTP connections
xdcr.1- a connection timeout of 180 seconds
xdcr.1- 2 retries per request
xdcr.1- socket options are: [{keepalive,true},{nodelay,false}]
xdcr.1:[xdcr:info,2014-05-08T15:17:40.578,ns_1@127.0.0.1:<0.9272.0>:xdc_vbucket_rep_ckpt:do_checkpoint_new:90]checkpointing for vb: 449 at 0 <==== Why are we checkpointing when there are no mutations?

xdcr.1-[xdcr:debug,2014-05-08T15:51:37.472,ns_1@127.0.0.1:<0.9272.0>:xdc_vbucket_rep:handle_info:193]get start-replication token for vb 449 from throttle (pid: <0.8436.0>)
xdcr.1-[xdcr:info,2014-05-08T15:51:37.514,ns_1@127.0.0.1:<0.9272.0>:xdc_vbucket_rep:start_replication:937]Replication `<<"670a9d8fe1d2c38e369630abeb147862/default/default">>` is using:
xdcr.1- 4 worker processes
xdcr.1- a worker batch size of 500
xdcr.1- a worker batch size (KiB) 2048
xdcr.1- 20 HTTP connections
xdcr.1- a connection timeout of 180 seconds
xdcr.1- 2 retries per request
xdcr.1- socket options are: [{keepalive,true},{nodelay,false}]
xdcr.1- source start sequence 1
xdcr.1:[xdcr:info,2014-05-08T15:51:37.514,ns_1@127.0.0.1:<0.9272.0>:xdc_vbucket_rep_ckpt:do_checkpoint_new:90]checkpointing for vb: 449 at 1 <==== Here, however, we checkpoint after replicating
--
xdcr.1-[xdcr:debug,2014-05-08T15:54:07.524,ns_1@127.0.0.1:<0.9272.0>:xdc_vbucket_rep:handle_info:193]get start-replication token for vb 449 from throttle (pid: <0.8436.0>)
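To line up the checkpoint events with the replication restarts across a longer xdcr.1 log, a small Python sketch can extract every `do_checkpoint_new` line as (timestamp, vb, seqno). The regex is an assumption fitted only to the log lines quoted in this report; other builds may format these lines differently.

```python
import re

# Fitted to the do_checkpoint_new info lines quoted above, e.g.
# "...[xdcr:info,2014-05-08T15:17:40.578,...:do_checkpoint_new:90]checkpointing for vb: 449 at 0"
CHECKPOINT_RE = re.compile(
    r"\[xdcr:\w+,(?P<ts>[\d\-T:.]+),[^\]]*do_checkpoint_new:\d+\]"
    r"checkpointing for vb: (?P<vb>\d+) at (?P<seqno>\d+)"
)

# Sample lines copied from the logs above (reporter annotations dropped).
LOG_LINES = [
    "xdcr.1:[xdcr:info,2014-05-08T15:17:40.578,ns_1@127.0.0.1:<0.9272.0>:"
    "xdc_vbucket_rep_ckpt:do_checkpoint_new:90]checkpointing for vb: 449 at 0",
    "xdcr.1-[xdcr:debug,2014-05-08T15:51:37.472,ns_1@127.0.0.1:<0.9272.0>:"
    "xdc_vbucket_rep:handle_info:193]get start-replication token for vb 449 from throttle (pid: <0.8436.0>)",
    "xdcr.1:[xdcr:info,2014-05-08T15:51:37.514,ns_1@127.0.0.1:<0.9272.0>:"
    "xdc_vbucket_rep_ckpt:do_checkpoint_new:90]checkpointing for vb: 449 at 1",
]


def checkpoint_events(lines):
    """Return (timestamp, vb, seqno) for every checkpoint line, in log order."""
    events = []
    for line in lines:
        match = CHECKPOINT_RE.search(line)
        if match:
            events.append((match["ts"], int(match["vb"]), int(match["seqno"])))
    return events


if __name__ == "__main__":
    for ts, vb, seqno in checkpoint_events(LOG_LINES):
        print(f"{ts} vb {vb} checkpoint at seqno {seqno}")
```

Lines that are not checkpoint lines, such as the throttle-token debug line, are simply skipped, so the output is the sequence of checkpoint seqnos per vbucket.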

Attachments


- 10.3.4.186-582014-1757-diag.zip (1.59 MB, 08/May/14 6:04 PM)
- 10.3.4.188-582014-180-diag.zip (1.21 MB, 08/May/14 6:04 PM)

People

Assignee:: Aleksey Kondratenko (Inactive)

Reporter:: Aruna Piravi (Inactive)

Votes:: 0

Watchers:: 3

Dates

Created:: 08/May/14 4:05 PM

Updated:: 05/Jun/14 7:26 PM

Resolved:: 05/Jun/14 7:23 PM

Gerrit Reviews

There are no open Gerrit changes

XDCR checkpointing : Errors in checkpoint record
