Couchbase Server / MB-6586

Replication rate may drop when the XDCR replication queue size becomes less than 500k items

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.0
    • Fix Version/s: 2.0-beta-2
    • Component/s: XDCR
    • Security Level: Public

      Description

      The replicator has logic that, if there are fewer than 500 items in the queue, waits a while (possibly too long) for more items to show up.
      This is why the replication rate drops when the queue holds about 600K items or fewer.

      When there are 1M or more items in the queue, the replication rate seems to stay at 5-10K even when there is load on the destination.

      We need to find a solution, because this logic means that the closer the destination gets to being in sync with the source, the lower the replication rate becomes (there are fewer items on the XDCR replication queue per vbucket).
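      To illustrate the mechanism described above, here is a toy throughput model, not the XDCR code: a hypothetical per-vbucket replicator that pauses for `wait_s` whenever fewer than `min_batch` items are queued. The parameter values (`min_batch=500`, `wait_s=0.5`, `per_item_s=0.0001`) are assumptions chosen only for illustration, as is the even spread of the queue over 1024 vbuckets (Couchbase's default vbucket count on most platforms). Under these assumptions, 500 items per vbucket corresponds to roughly 512K items in total, which may be why the issue refers to both 500 and 500K items.

```python
"""Toy throughput model of a replicator that waits for a minimum batch.

All parameters are hypothetical; they are not taken from the XDCR code.
"""

NUM_VBUCKETS = 1024  # assumption: default vbucket count, queue spread evenly


def turn_time(batch: int, min_batch: int, wait_s: float, per_item_s: float) -> float:
    """Seconds spent on one replication turn for a single vbucket."""
    # Below the threshold the replicator pauses, hoping more items arrive.
    pause = wait_s if batch < min_batch else 0.0
    return pause + batch * per_item_s


def replication_rate(total_queue: int, min_batch: int = 500,
                     wait_s: float = 0.5, per_item_s: float = 0.0001) -> float:
    """Items/sec for one vbucket when the total queue is spread evenly."""
    batch = total_queue // NUM_VBUCKETS
    if batch == 0:
        return 0.0
    return batch / turn_time(batch, min_batch, wait_s, per_item_s)


for total in (1_000_000, 600_000, 500_000, 100_000):
    per_vb = total // NUM_VBUCKETS
    print(f"{total:>9} queued -> {per_vb:>4}/vbucket -> "
          f"{replication_rate(total):8.0f} items/s per vbucket")
```

      In this model the per-vbucket rate stays flat while each vbucket has at least `min_batch` items and collapses once it falls below that, because the fixed pause then dominates each turn.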

      The good news, at least, is that the front-end load on the source does not have a major impact on the destination.

      Damien is looking at the code in question: I don't see where the batch size would cause this slowdown. I believe this is a problem in ep-engine, where it gets into a state in which waking the flusher doesn't work for some reason, so it must wait for the flusher to wake itself.

      I'm going to add some code to see how long we wait for full commits to happen, vs how long we spend doing the other replication work.
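      A minimal sketch of the kind of instrumentation described above, written in Python for illustration: a hypothetical `PhaseTimer` accumulates wall-clock time per named phase, so the share spent waiting for full commits can be compared with the rest of the replication work. The phase names and the `time.sleep` stand-ins are placeholders, not real XDCR or ep-engine calls.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates wall-clock time spent in named phases (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def phase(self, name):
        # Time the body of the `with` block and add it to the phase total.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start

    def share(self, name):
        """Fraction of all measured time spent in `name`."""
        total = sum(self.totals.values())
        return self.totals.get(name, 0.0) / total if total else 0.0


timer = PhaseTimer()
for _ in range(5):
    with timer.phase("full_commit_wait"):
        time.sleep(0.02)   # placeholder for waiting on a full commit
    with timer.phase("other_work"):
        time.sleep(0.002)  # placeholder for the remaining replication work

print(f"full commit wait: {timer.share('full_commit_wait'):.0%} of measured time")
```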

        Activity

        peter peter created issue -
        peter peter made changes -
        Field Original Value New Value
        Assignee Junyi Xie [ junyi ] Damien Katz [ damien ]
        peter peter made changes -
        Priority Major [ 3 ] Blocker [ 1 ]
        farshid Farshid Ghods (Inactive) made changes -
        Fix Version/s 2.0-beta [ 10113 ]
        peter peter made changes -
        Fix Version/s 2.0-beta [ 10113 ]
        thuan Thuan Nguyen added a comment -

        Integrated in github-ns-server-2-0 #465 (See http://qa.hq.northscale.net/job/github-ns-server-2-0/465/)
        MB-6586: concurent throttle state clean up (Revision 791b42f27bea97db653a894f6cda0369da981851)

        Result = SUCCESS
        pwansch :
        Files :

        • src/concurrency_throttle.erl
        peter peter made changes -
        Fix Version/s 2.0-beta-refresh [ 10385 ]
        Fix Version/s 2.0 [ 10114 ]
        Priority Blocker [ 1 ] Critical [ 2 ]
        peter peter made changes -
        Summary XDCR ops/sec is low or at 0 for long period of times on destination Replication rate is dropping when the queue size becomes small
        farshid Farshid Ghods (Inactive) made changes -
        Summary Replication rate is dropping when the queue size becomes small Replication rate is dropping when the queue size becomes less than 500 items
        farshid Farshid Ghods (Inactive) made changes -
        Labels 2.0-beta-release-notes
        dipti Dipti Borkar made changes -
        Summary Replication rate is dropping when the queue size becomes less than 500 items Replication rate may drop when the XDCR replication queue size becomes less than 500 items
        junyi Junyi Xie (Inactive) made changes -
        Assignee Damien Katz [ damien ] Junyi Xie [ junyi ]
        pavelpaulau Pavel Paulau added a comment -

        The issue description confused me. 500 items or 500K items?

        junyi Junyi Xie (Inactive) added a comment -

        When the number of items in the queue is small, checkpointing time dominates replication time (see screenshot). What we can probably do is:

        1. Increase the checkpoint interval and make it configurable.
        2. Remove the last_checkpoint at the end of each turn in the replicator.
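        The two proposals can be illustrated with a toy cost model, again not the XDCR code: each replication turn spends `per_item_s` per item, and each checkpoint costs a fixed `checkpoint_s`. The hypothetical flag `checkpoint_every_turn=True` models checkpointing at the end of every turn; `False` models proposal 2, where a checkpoint happens only once the configurable `checkpoint_interval_s` (proposal 1) has elapsed. All costs are assumed values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReplicatorModel:
    per_item_s: float             # assumed cost to replicate one item
    checkpoint_s: float           # assumed cost of one checkpoint
    checkpoint_interval_s: float  # proposal 1: configurable checkpoint interval
    checkpoint_every_turn: bool   # True = checkpoint at the end of each turn

    def run(self, turns: int, items_per_turn: int) -> float:
        """Return the simulated replication rate in items/sec."""
        clock = 0.0
        last_checkpoint_at = 0.0
        replicated = 0
        for _ in range(turns):
            clock += items_per_turn * self.per_item_s
            replicated += items_per_turn
            interval_due = clock - last_checkpoint_at >= self.checkpoint_interval_s
            if self.checkpoint_every_turn or interval_due:
                clock += self.checkpoint_s
                last_checkpoint_at = clock
        return replicated / clock if clock else 0.0


for batch in (50, 5_000):
    rates = [ReplicatorModel(0.0001, 0.05, 5.0, every).run(1_000, batch)
             for every in (True, False)]
    print(f"{batch:>5} items/turn: every turn {rates[0]:7.0f}/s, "
          f"interval only {rates[1]:7.0f}/s")
```

        With these assumed costs, a per-turn checkpoint costs ten times the replication work at 50 items per turn but only one tenth of it at 5,000 items per turn, which is consistent with checkpointing dominating only when the queue is small.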

        junyi Junyi Xie (Inactive) made changes -
        Summary Replication rate may drop when the XDCR replication queue size becomes less than 500 items Replication rate may drop when the XDCR replication queue size becomes less than 500k items
        Attachment Screen Shot 2012-09-19 at 12.09.13 PM.png [ 15108 ]
        junyi Junyi Xie (Inactive) added a comment -

        Commits on gerrit: http://review.couchbase.org/#/c/20972/ http://review.couchbase.org/#/c/20967/
        peter peter made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        peter peter made changes -
        Reporter Peter Wansch [ peter ] Ketaki Gangal [ ketaki ]
        farshid Farshid Ghods (Inactive) made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            junyi Junyi Xie (Inactive)
            Reporter:
            ketaki Ketaki Gangal
          • Votes:
            0
            Watchers:
            1


              Gerrit Reviews

              There are no open Gerrit changes