  1. Couchbase Server
  2. MB-10792

checkpoint commit failure at start of replication

Details

    • Bug
    • Resolution: Incomplete
    • Critical
    • 3.0
    • 3.0
    • XDCR
    • Security Level: Public
    • None
    • build 547
    • Untriaged
    • Centos 32-bit
    • Unknown

    Description

      http://qa.hq.northscale.net/job/ubuntu_x64--37_02--biXDCR-P1/15/consoleFull -> Test case 9.

      ./testrunner -i /tmp/ubuntu-64-2.0-biXDCR-all.ini get-cbcollect-info=True -t xdcr.biXDCR.bidirectional.load_with_failover,replicas=1,items=10000,ctopology=chain,rdirection=bidirection,standard_buckets=1,expires=60,doc-ops=create-update-delete,doc-ops-dest=create-update,failover=destination,replication_type=xmem,GROUP=P0;xmem

      The test failed with an item-count mismatch on the server:
      [2014-04-07 05:51:08,257] - [task:420] WARNING - Not Ready: vb_active_curr_items 10997 == 11000 expected on '10.3.121.56:8091''10.3.121.57:8091''10.3.4.244:8091', default bucket
      [2014-04-07 05:51:13,291] - [task:420] WARNING - Not Ready: curr_items 10997 == 11000 expected on '10.3.121.56:8091''10.3.121.57:8091''10.3.4.244:8091', default bucket
      [2014-04-07 05:51:13,306] - [task:420] WARNING - Not Ready: vb_active_curr_items 10997 == 11000 expected on '10.3.121.56:8091''10.3.121.57:8091''10.3.4.244:8091', default bucket
      [2014-04-07 05:51:18,356] - [task:420] WARNING - Not Ready: curr_items 10997 == 11000 expected on '10.3.121.56:8091''10.3.121.57:8091''10.3.4.244:8091', default bucket
      [2014-04-07 05:51:18,391] - [task:420] WARNING - Not Ready: vb_active_curr_items 10997 == 11000 expected on '10.3.121.56:8091''10.3.121.57:8091''10.3.4.244:8091', default bucket
      [2014-04-07 05:51:23,439] - [task:420] WARNING - Not Ready: curr_items 10997 == 11000 expected on '10.3.121.56:8091''10.3.121.57:8091''10.3.4.244:8091', default bucket
      [2014-04-07 05:51:23,498] - [task:420] WARNING - Not Ready: vb_active_curr_items 10997 == 11000 expected on '10.3.121.56:8091''10.3.121.57:8091''10.3.4.244:8091', default bucket
      [2014-04-07 05:51:28,556] - [task:420] WARNING - Not Ready: curr_items 10997 == 11000 expected on '10.3.121.56:8091''10.3.121.57:8091''10.3.4.244:8091', default bucket
      [2014-04-07 05:51:28,642] - [task:420] WARNING - Not Ready: vb_active_curr_items 10997 == 11000 expected on '10.3.121.56:8091''10.3.121.57:8091''10.3.4.244:8091', default bucket
      [2014-04-07 05:51:33,686] - [task:420] WARNING - Not Ready: curr_items 10997 == 11000 expected on '10.3.121.56:8091''10.3.121.57:8091''10.3.4.244:8091', default bucket
      [2014-04-07 05:51:33,730] - [task:420] WARNING - Not Ready: vb_active_curr_items 10997 == 11000 expected on '10.3.121.56:8091''10.3.121.57:8091''10.3.4.244:8091', default bucket
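
      To quantify the mismatch, the "Not Ready" warnings above can be parsed mechanically. The following is a minimal sketch, assuming the line layout is exactly as quoted in this report; the pattern and the helper name parse_not_ready are hypothetical and not part of testrunner.

```python
import re

# Hypothetical pattern, derived only from the warning lines quoted in this report.
NOT_READY = re.compile(
    r"\[(?P<ts>[^\]]+)\] - \[task:\d+\] WARNING - Not Ready: "
    r"(?P<stat>\w+) (?P<actual>\d+) == (?P<expected>\d+) expected on "
    r"(?P<nodes>(?:'[^']+')+), (?P<bucket>\S+) bucket"
)


def parse_not_ready(line):
    """Return the fields of one 'Not Ready' warning, or None if the line does not match."""
    m = NOT_READY.search(line)
    if m is None:
        return None
    actual = int(m.group("actual"))
    expected = int(m.group("expected"))
    return {
        "timestamp": m.group("ts"),
        "stat": m.group("stat"),
        "actual": actual,
        "expected": expected,
        "missing": expected - actual,  # number of items short of the expected count
        "nodes": re.findall(r"'([^']+)'", m.group("nodes")),
        "bucket": m.group("bucket"),
    }


if __name__ == "__main__":
    sample = (
        "[2014-04-07 05:51:08,257] - [task:420] WARNING - Not Ready: "
        "vb_active_curr_items 10997 == 11000 expected on "
        "'10.3.121.56:8091''10.3.121.57:8091''10.3.4.244:8091', default bucket"
    )
    print(parse_not_ready(sample))
```

      For the lines quoted here, every warning reports 10997 of 11000 items on the default bucket, so the shortfall stays at 3 items for both curr_items and vb_active_curr_items.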

      [Test Steps]
      1. Create SRC and DEST clusters with 3 nodes each.
      2. Set up bi-directional xmem TAP-based XDCR on default and standard_bucket_1. The checkpoint interval is set to 120 seconds.
      3. Load 10000 items into both buckets on both the SRC and DEST clusters.
      4. Perform failover/rebalance-out of one node on the destination side.
      5. Perform 30% update and delete on the SRC side. During the update, set the expiration time to 60 seconds.
      6. Perform 30% update on the destination side. During the update, set the expiration time to 60 seconds.
      7. Expect 11000 items on each side.

      The test failed at step 7: the default bucket on the source cluster (10.3.121.56) has only 10997 items.

      I can see many XDCR errors on cluster 10.3.121.59:

      [xdcr:error,2014-04-07T5:49:53.853,ns_1@10.3.121.59:<0.27070.18>:xdc_vbucket_rep:start_replication:1000]checkpoint commit failure at start of replication for vb 813
      [xdcr:error,2014-04-07T5:49:53.854,ns_1@10.3.121.59:<0.27070.18>:xdc_vbucket_rep:terminate:534]Replication (XMem mode) `3f3e8f7fe887b7288e0e31ee0098cc72/default/default` (`default/813` -> `http://*****@10.3.4.244:8092/default%2f813%3bf806f153aba876ebc86ca21ceaceb8ce`) failed.Please see ns_server debug log for complete state dump
      [xdcr:error,2014-04-07T5:49:54.655,ns_1@10.3.121.59:<0.26952.18>:xdc_vbucket_rep_ckpt:do_checkpoint_old:220]Checkpointing failed unexpectedly (or could be network problem):

      {local_vbuuid_mismatch, <<"189581637071222">>, <<"83846604416697">>}

      [xdcr:error,2014-04-07T5:49:54.661,ns_1@10.3.121.59:<0.26952.18>:xdc_vbucket_rep:start_replication:1000]checkpoint commit failure at start of replication for vb 833
      [xdcr:error,2014-04-07T5:49:54.661,ns_1@10.3.121.59:<0.26952.18>:xdc_vbucket_rep:terminate:534]Replication (XMem mode) `3f3e8f7fe887b7288e0e31ee0098cc72/default/default` (`default/833` -> `http://*****@10.3.4.244:8092/default%2f833%3bf806f153aba876ebc86ca21ceaceb8ce`) failed.Please see ns_server debug log for complete state dump
      [xdcr:error,2014-04-07T5:49:55.040,ns_1@10.3.121.59:<0.26991.18>:xdc_vbucket_rep_ckpt:do_checkpoint_old:220]Checkpointing failed unexpectedly (or could be network problem):

      {local_vbuuid_mismatch, <<"31693391851147">>, <<"272915716248749">>}
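
      When triaging a larger log, the affected vbuckets and the mismatched vbuuid pairs can be pulled out with a short script. This is a sketch under the assumption that the error text matches the excerpts above; summarize_xdcr_errors is a hypothetical helper, and the report does not say which side each of the two vbuuid values belongs to, so they are returned in log order without labels.

```python
import re

# Patterns derived only from the xdcr error excerpts quoted in this report.
VB_FAILURE = re.compile(
    r"checkpoint commit failure at start of replication for vb (\d+)"
)
VBUUID_MISMATCH = re.compile(
    r'\{local_vbuuid_mismatch,\s*<<"(\d+)">>,\s*<<"(\d+)">>\}'
)


def summarize_xdcr_errors(log_text):
    """Return (sorted vbucket ids with commit failures, list of vbuuid pairs)."""
    failed_vbs = sorted({int(vb) for vb in VB_FAILURE.findall(log_text)})
    # Each pair keeps the order in which the two values appear in the log;
    # their meaning (which vbuuid is which) is not stated in the report.
    mismatches = [(first, second) for first, second in VBUUID_MISMATCH.findall(log_text)]
    return failed_vbs, mismatches


if __name__ == "__main__":
    sample = (
        "checkpoint commit failure at start of replication for vb 813\n"
        '{local_vbuuid_mismatch, <<"189581637071222">>, <<"83846604416697">>}\n'
        "checkpoint commit failure at start of replication for vb 833\n"
        '{local_vbuuid_mismatch, <<"31693391851147">>, <<"272915716248749">>}\n'
    )
    print(summarize_xdcr_errors(sample))
```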


          People

            sangharsh Sangharsh Agarwal
            sangharsh Sangharsh Agarwal