  1. Couchbase Server
  2. MB-11075

XDCR checkpointing : Errors in checkpoint record


Details

    • Bug
    • Resolution: Done
    • Major
    • 3.0
    • 3.0
    • XDCR
    • Security Level: Public
    • None
    • Untriaged
    • No

    Description

      Scenario
      -------------

      • Uni-directional XDCR between two one-node clusters. Load an item onto vb449. One checkpoint is recorded, as follows:
      {"commitopaque":[158975596682994,1],"start_time":"Thu, 08 May 2014 21:06:36 GMT","end_time":"Thu, 08 May 2014 22:17:40 GMT","failover_uuid":0,"failover_seq":0,"seqno":0,"upr_snapshot_seqno":0,"total_docs_checked":0,"total_docs_written":0,"total_data_replicated":0}

      The next record, written after updating the same item:

      {"commitopaque":[158975596682994,2],"start_time":"Thu, 08 May 2014 21:06:36 GMT","end_time":"Thu, 08 May 2014 22:51:37 GMT","failover_uuid":137909158430775,"failover_seq":0,"seqno":1,"upr_snapshot_seqno":1,"total_docs_checked":1,"total_docs_written":1,"total_data_replicated":10}

      Observations
      --------------------
      1. The failover_uuid in the first checkpoint record is always 0. Shouldn't it point to the local vb_uuid? Subsequent checkpoint records do contain the expected value.

      2. commitopaque shows that the high seqno on the remote end is 1, i.e., both source and destination acknowledge that one mutation has been replicated. Yet we see "total_docs_checked":0,"total_docs_written":0,"total_data_replicated":0. These values are always off by one mutation. Why is this so? For example:

      {"commitopaque":[158975596682994,6],"start_time":"Thu, 08 May 2014 21:06:36 GMT","end_time":"Thu, 08 May 2014 22:56:25 GMT","failover_uuid":137909158430775,"failover_seq":0,"seqno":5,"upr_snapshot_seqno":5,"total_docs_checked":5,"total_docs_written":5,"total_data_replicated":50}

      3. Why is "seqno":0? What exactly should it point to?
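
      The off-by-one in observation 2 can be checked mechanically against the three records quoted above. The following is a minimal sketch, not part of XDCR itself: it assumes, as observation 2 does, that the second element of commitopaque is the remote high seqno, and the helper name summarize is made up for illustration.

```python
import json

# The three checkpoint records quoted in this report, copied verbatim.
RECORDS = [
    '{"commitopaque":[158975596682994,1],"start_time":"Thu, 08 May 2014 21:06:36 GMT","end_time":"Thu, 08 May 2014 22:17:40 GMT","failover_uuid":0,"failover_seq":0,"seqno":0,"upr_snapshot_seqno":0,"total_docs_checked":0,"total_docs_written":0,"total_data_replicated":0}',
    '{"commitopaque":[158975596682994,2],"start_time":"Thu, 08 May 2014 21:06:36 GMT","end_time":"Thu, 08 May 2014 22:51:37 GMT","failover_uuid":137909158430775,"failover_seq":0,"seqno":1,"upr_snapshot_seqno":1,"total_docs_checked":1,"total_docs_written":1,"total_data_replicated":10}',
    '{"commitopaque":[158975596682994,6],"start_time":"Thu, 08 May 2014 21:06:36 GMT","end_time":"Thu, 08 May 2014 22:56:25 GMT","failover_uuid":137909158430775,"failover_seq":0,"seqno":5,"upr_snapshot_seqno":5,"total_docs_checked":5,"total_docs_written":5,"total_data_replicated":50}',
]


def summarize(raw):
    """Hypothetical helper: pull out the fields the observations compare."""
    rec = json.loads(raw)
    # Assumption (from observation 2): commitopaque[1] is the remote high seqno.
    remote_high_seqno = rec["commitopaque"][1]
    return {
        "remote_high_seqno": remote_high_seqno,
        "seqno": rec["seqno"],
        "failover_uuid": rec["failover_uuid"],
        "docs_written": rec["total_docs_written"],
        # Difference between what the remote side reports and what the
        # checkpoint counters claim was written.
        "lag": remote_high_seqno - rec["total_docs_written"],
    }


summaries = [summarize(r) for r in RECORDS]
for s in summaries:
    print(s)
```

      Under that assumption, every record shows a lag of exactly one mutation, and only the first record has failover_uuid set to 0.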

      Logs (these may explain what we are seeing above)
      --------

      xdcr.1-[xdcr:info,2014-05-08T15:17:40.578,ns_1@127.0.0.1:<0.9272.0>:xdc_vbucket_rep:start_replication:937]Replication `<<"670a9d8fe1d2c38e369630abeb147862/default/default">>` is using:
      xdcr.1- 4 worker processes
      xdcr.1- a worker batch size of 500
      xdcr.1- a worker batch size (KiB) 2048
      xdcr.1- 20 HTTP connections
      xdcr.1- a connection timeout of 180 seconds
      xdcr.1- 2 retries per request
      xdcr.1- socket options are: [{keepalive,true},{nodelay,false}]
      xdcr.1:[xdcr:info,2014-05-08T15:17:40.578,ns_1@127.0.0.1:<0.9272.0>:xdc_vbucket_rep_ckpt:do_checkpoint_new:90]checkpointing for vb: 449 at 0 <==== Why are we checkpointing when there are no mutations?



      xdcr.1-[xdcr:debug,2014-05-08T15:51:37.472,ns_1@127.0.0.1:<0.9272.0>:xdc_vbucket_rep:handle_info:193]get start-replication token for vb 449 from throttle (pid: <0.8436.0>)
      xdcr.1-[xdcr:info,2014-05-08T15:51:37.514,ns_1@127.0.0.1:<0.9272.0>:xdc_vbucket_rep:start_replication:937]Replication `<<"670a9d8fe1d2c38e369630abeb147862/default/default">>` is using:
      xdcr.1- 4 worker processes
      xdcr.1- a worker batch size of 500
      xdcr.1- a worker batch size (KiB) 2048
      xdcr.1- 20 HTTP connections
      xdcr.1- a connection timeout of 180 seconds
      xdcr.1- 2 retries per request
      xdcr.1- socket options are: [{keepalive,true},{nodelay,false}]
      xdcr.1- source start sequence 1
      xdcr.1:[xdcr:info,2014-05-08T15:51:37.514,ns_1@127.0.0.1:<0.9272.0>:xdc_vbucket_rep_ckpt:do_checkpoint_new:90]checkpointing for vb: 449 at 1 <==== However here we checkpoint after we replicate

      xdcr.1-[xdcr:debug,2014-05-08T15:54:07.524,ns_1@127.0.0.1:<0.9272.0>:xdc_vbucket_rep:handle_info:193]get start-replication token for vb 449 from throttle (pid: <0.8436.0>)
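
      To line up checkpoint events with the records above, the "checkpointing for vb" lines can be pulled out of the log. This is a rough sketch that assumes the line layout seen in this excerpt only; the regular expression and the function name checkpoints are illustrative and not taken from any Couchbase tool.

```python
import re

# Log lines copied from the excerpt above, with the reporter's
# "<====" annotations stripped.
LOG_LINES = [
    "xdcr.1:[xdcr:info,2014-05-08T15:17:40.578,ns_1@127.0.0.1:<0.9272.0>:xdc_vbucket_rep_ckpt:do_checkpoint_new:90]checkpointing for vb: 449 at 0",
    "xdcr.1-[xdcr:debug,2014-05-08T15:51:37.472,ns_1@127.0.0.1:<0.9272.0>:xdc_vbucket_rep:handle_info:193]get start-replication token for vb 449 from throttle (pid: <0.8436.0>)",
    "xdcr.1:[xdcr:info,2014-05-08T15:51:37.514,ns_1@127.0.0.1:<0.9272.0>:xdc_vbucket_rep_ckpt:do_checkpoint_new:90]checkpointing for vb: 449 at 1",
]

# Assumed layout: [xdcr:<level>,<timestamp>,<node>:<pid>:<module>:<fun>:<line>]message
CKPT_RE = re.compile(
    r"\[xdcr:\w+,(?P<ts>[^,]+),.*\]checkpointing for vb: (?P<vb>\d+) at (?P<seq>\d+)"
)


def checkpoints(lines):
    """Return (timestamp, vbucket, seqno) for every checkpoint line found."""
    found = []
    for line in lines:
        match = CKPT_RE.search(line)
        if match:
            found.append(
                (match.group("ts"), int(match.group("vb")), int(match.group("seq")))
            )
    return found


for ts, vb, seq in checkpoints(LOG_LINES):
    print(f"{ts} vb={vb} checkpoint at seqno {seq}")
```

      Note that the log timestamps are in local time while the checkpoint records use GMT, so the two can only be matched after accounting for the offset.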

          People

            alkondratenko Aleksey Kondratenko (Inactive)
            apiravi Aruna Piravi (Inactive)