Couchbase Server / MB-12129

XDCR : replication broken on build 3.0.0-1206


Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Blocker
    • Affects Version: 3.0
    • Fix Version: 3.0
    • Component/s: couchbase-bucket, XDCR
    • Security Level: Public
    • Labels: None
    • Environment: CentOS 6.x, build 1206
    • Triage: Untriaged
    • Is this a Regression?: Yes

    Description

      This is a regression from build 1205, where XDCR worked fine.

      Steps
      --------
      1. Create buckets on the .44 (4 nodes) and .54 (4 nodes) clusters
      2. Load until ~50% DGM on both sides
      3. Set up XDCR:
      standardbucket1 (.44) ----> standardbucket1 (.54)
      standardbucket (.44) <----> standardbucket (.54)
      4. 50% gets and 50% deletes on both sides for 15 mins
      5. Rebalance-out one node on .44
      6. Rebalance-in the same node on .44
      7. Failover and remove the same node on .44 (we fail over by killing beam and erlang, so warmup is involved)
      8. Failover and add back the same node on .44
      9. Rebalance-out one node on .54
      10. Rebalance-in the same node on .54
      11. Failover and remove the same node on .54
      12. Failover and add back the same node on .54
      13. Soft-restart all 3 nodes in .44
      14. Soft-restart all 3 nodes in .54
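      Step 3 above can be driven through Couchbase's REST API (POST /controller/createReplication). A minimal sketch of building the request body — the cluster reference name "remote-54" and the helper function are illustrative placeholders, not part of this test's tooling:

      ```python
      from urllib.parse import urlencode

      def create_replication_payload(from_bucket, to_cluster, to_bucket):
          """Form-encoded body for POST /controller/createReplication.

          `to_cluster` is the name of a remote cluster reference that must
          already exist on the source cluster.
          """
          return urlencode({
              "fromBucket": from_bucket,
              "toCluster": to_cluster,
              "toBucket": to_bucket,
              "replicationType": "continuous",
          })

      # Uni-directional replication from step 3 (source .44 -> destination .54)
      payload = create_replication_payload("standardbucket1", "remote-54", "standardbucket1")
      print(payload)
      # POST this to http://<.44 node>:8091/controller/createReplication with
      # admin credentials (e.g. via curl -d); the bi-directional pair needs a
      # second replication created in the opposite direction.
      ```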

      At the end of the test, no keys were found in standardbucket1 (uni-directional XDCR, with no load on the destination), and key counts did not match for the bi-directional XDCR buckets. I think replication never happened; these were the initially loaded keys.

      Cross-checked the couch files to rule out a stats issue (and indeed found no docs for standardbucket1 on .54):
      [root@guinep-s10501 standardbucket1]# /opt/couchbase/bin/couch_dbdump *.couch.1
      Dumping "0.couch.1":
      Dumping "100.couch.1":
      Dumping "101.couch.1":
      :
      Dumping "99.couch.1":
      Dumping "9.couch.1":
      Dumping "master.couch.1":

      Total docs: 0

      Some investigation
      --------------------------
      Could be a regression from MB-12100.

      Seeing "startReplication" messages like:

      "batchSizeItems":500,"numWorkers":4,"seq":23411,"snapshotStart":23411,"snapshotEnd":23411
      Please note that seq, snapshotStart and snapshotEnd are identical, as recorded in xdcr_trace for all startReplication events. I'm not completely sure that's the root cause of the issue, but for an initial XDCR, seq, snapshotStart and snapshotEnd being the same for a vbucket looks odd.

      [root@soursop-s11201 logs]# grep "startReplication" xdcr_trace.log

      {"pid":"<0.5674.1>","type":"startReplication","ts":1409849619.748034,"batchSizeItems":500,"numWorkers":4,"seq":23411,"snapshotStart":23411,"snapshotEnd":23411,"failoverUUUID":130175293122263,"supportsDatatype":false,"changesReader":"<0.25996.101>","changesQueue":"<0.24382.101>","changesManager":"<0.25839.101>","maxConns":20,"optRepThreshold":256,"workers":["<0.25913.101>","<0.21455.101>","<0.25712.101>","<0.22686.101>"],"loc":"xdc_vbucket_rep:start_replication:849"}
      {"pid":"<0.5613.1>","type":"startReplication","ts":1409849619.758986,"batchSizeItems":500,"numWorkers":4,"seq":23416,"snapshotStart":23416,"snapshotEnd":23416,"failoverUUUID":104945028016240,"supportsDatatype":false,"changesReader":"<0.25799.101>","changesQueue":"<0.25842.101>","changesManager":"<0.25908.101>","maxConns":20,"optRepThreshold":256,"workers":["<0.25104.101>","<0.22805.101>","<0.25126.101>","<0.25806.101>"],"loc":"xdc_vbucket_rep:start_replication:849"}
      {"pid":"<0.5631.1>","type":"startReplication","ts":1409849619.759695,"batchSizeItems":500,"numWorkers":4,"seq":23205,"snapshotStart":23205,"snapshotEnd":23205,"failoverUUUID":264920799474449,"supportsDatatype":false,"changesReader":"<0.26039.101>","changesQueue":"<0.25320.101>","changesManager":"<0.25997.101>","maxConns":20,"optRepThreshold":256,"workers":["<0.25573.101>","<0.25707.101>","<0.24003.101>","<0.24432.101>"],"loc":"xdc_vbucket_rep:start_replication:849"}
      {"pid":"<0.5674.1>","type":"startReplication","ts":1409849619.760088,"batchSizeItems":500,"numWorkers":4,"seq":23423,"snapshotStart":23423,"snapshotEnd":23423,"failoverUUUID":130175293122263,"supportsDatatype":false,"changesReader":"<0.26047.101>","changesQueue":"<0.25221.101>","changesManager":"<0.25790.101>","maxConns":20,"optRepThreshold":256,"workers":["<0.25803.101>","<0.15404.101>","<0.25974.101>","<0.15339.101>"],"loc":"xdc_vbucket_rep:start_replication:849"}
      {"pid":"<0.5749.1>","type":"startReplication","ts":1409849619.761978,"batchSizeItems":500,"numWorkers":4,"seq":22971,"snapshotStart":22971,"snapshotEnd":22971,"failoverUUUID":134507485417479,"supportsDatatype":false,"changesReader":"<0.25943.101>","changesQueue":"<0.26046.101>","changesManager":"<0.25674.101>","maxConns":20,"optRepThreshold":256,"workers":["<0.25792.101>","<0.25894.101>","<0.25977.101>","<0.25991.101>"],"loc":"xdc_vbucket_rep:start_replication:849"}
      {"pid":"<0.15915.85>","type":"startReplication","ts":1409849619.762293,"batchSizeItems":500,"numWorkers":4,"seq":23612,"snapshotStart":23612,"snapshotEnd":23612,"failoverUUUID":54183205161167,"supportsDatatype":false,"changesReader":"<0.26030.101>","changesQueue":"<0.24274.101>","changesManager":"<0.25898.101>","maxConns":20,"optRepThreshold":256,"workers":["<0.9700.99>","<0.16284.101>","<0.15155.101>","<0.11809.101>"],"loc":"xdc_vbucket_rep:start_replication:849"}
      {"pid":"<0.5719.1>","type":"startReplication","ts":1409849619.762483,"batchSizeItems":500,"numWorkers":4,"seq":22938,"snapshotStart":22938,"snapshotEnd":22938,"failoverUUUID":184213973604977,"supportsDatatype":false,"changesReader":"<0.15419.101>","changesQueue":"<0.25851.101>","changesManager":"<0.20302.101>","maxConns":20,"optRepThreshold":256,"workers":["<0.26474.99>","<0.20073.101>","<0.20391.101>","<0.25980.101>"],"loc":"xdc_vbucket_rep:start_replication:849"}

      :
      :
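      The trace entries above can be checked mechanically. A minimal sketch (this helper is hypothetical, not a Couchbase tool) that scans xdcr_trace output for startReplication events whose DCP snapshot collapses to a single sequence number — the suspicious pattern noted above; SAMPLE is a trimmed copy of the first entry:

      ```python
      import json
      import re

      # Trimmed first entry from xdcr_trace.log above, used as sample input.
      SAMPLE = (
          '{"pid":"<0.5674.1>","type":"startReplication","ts":1409849619.748034,'
          '"batchSizeItems":500,"numWorkers":4,"seq":23411,"snapshotStart":23411,'
          '"snapshotEnd":23411,"failoverUUUID":130175293122263}'
      )

      def degenerate_snapshots(text):
          """Return (seq, snapshotStart, snapshotEnd) for every startReplication
          event where all three are equal (snapshot collapsed to one point)."""
          hits = []
          # Several trace records may share one physical line, so match each
          # standalone JSON object rather than splitting on newlines.
          for m in re.finditer(r'\{[^{}]*"type":"startReplication"[^{}]*\}', text):
              ev = json.loads(m.group(0))
              if ev["seq"] == ev["snapshotStart"] == ev["snapshotEnd"]:
                  hits.append((ev["seq"], ev["snapshotStart"], ev["snapshotEnd"]))
          return hits

      print(degenerate_snapshots(SAMPLE))  # -> [(23411, 23411, 23411)]
      ```

      Note the regex assumes flat (non-nested) JSON objects, which holds for these trace records.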

      Collecting and attaching logs. Meanwhile, if you want to have a look at the cluster: http://172.23.105.44:8091/index.html#sec=buckets

      Attachments



          People

            Assignee: apiravi Aruna Piravi (Inactive)
            Reporter: apiravi Aruna Piravi (Inactive)
            Votes: 0
            Watchers: 4

