  1. Couchbase Server
  2. MB-7415

Faster doc lookups in source cluster for XDCR

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0, 2.0.1
    • Fix Version/s: 2.1.0
    • Component/s: XDCR
    • Security Level: Public
    • Labels:

      Description

      Currently the replicator worker is getting doc metadata from the changes feed (seq tree) and then getting the document body by doing a lookup in the id tree of the vbucket.

      This was inherited from the original CouchDB replicator I wrote, where it couldn't be avoided because the revision tree was only accessible in the id tree. In Couchbase, however, we don't have revision trees, and the values stored in the seq and id trees are the same.

      This change avoids unnecessary btree lookups in the source cluster's vbucket databases.
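
      As a rough illustration of the idea, here is a toy Python model, not the actual replicator or couchstore code. It assumes each tree entry carries a pointer to the document body; the names DocInfo, CountingTree, body_pointer and both replicate_* functions are hypothetical and exist only for this sketch. Because both trees store the same value, the entry read from the seq tree is already enough to fetch the body, so the per-doc id tree lookup can be skipped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocInfo:
    """Toy stand-in for the value stored in both trees (hypothetical fields)."""
    doc_id: str
    seq: int
    rev: int
    body_pointer: int  # assumed: location of the document body in the file


class CountingTree:
    """Dict-backed stand-in for a btree that counts point lookups."""

    def __init__(self):
        self._items = {}
        self.lookups = 0

    def insert(self, key, value):
        self._items[key] = value

    def lookup(self, key):
        self.lookups += 1
        return self._items[key]

    def scan(self):
        # Ordered traversal, the way a changes feed walks the seq tree.
        for key in sorted(self._items):
            yield self._items[key]


def build_vbucket(n):
    """Create a toy vbucket with n docs indexed by both seq and id."""
    seq_tree, id_tree, bodies = CountingTree(), CountingTree(), {}
    for seq in range(1, n + 1):
        info = DocInfo(doc_id=f"doc-{seq}", seq=seq, rev=1, body_pointer=seq * 100)
        seq_tree.insert(seq, info)
        id_tree.insert(info.doc_id, info)  # same value as in the seq tree
        bodies[info.body_pointer] = f"body of {info.doc_id}"
    return seq_tree, id_tree, bodies


def replicate_with_id_lookup(seq_tree, id_tree, bodies):
    """Old approach: one extra id tree lookup per changed doc."""
    out = []
    for info in seq_tree.scan():
        full = id_tree.lookup(info.doc_id)
        out.append((full.doc_id, bodies[full.body_pointer]))
    return out


def replicate_from_seq_tree(seq_tree, bodies):
    """New approach: reuse the value already read from the seq tree."""
    out = []
    for info in seq_tree.scan():
        out.append((info.doc_id, bodies[info.body_pointer]))
    return out


if __name__ == "__main__":
    seq_tree, id_tree, bodies = build_vbucket(1000)
    new = replicate_from_seq_tree(seq_tree, bodies)
    old = replicate_with_id_lookup(seq_tree, id_tree, bodies)
    print("same result:", old == new)
    print("id tree lookups done by the old approach:", id_tree.lookups)
```

      In this model both approaches return the same documents, and the only difference is the number of id tree lookups, which grows with the number of docs replicated.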

      I started doing this for CouchDB over a year ago, but it was limited to the first version of a doc (where there's no need to get its rev tree), and I didn't touch it again (this was before we removed revision trees, changed file formats, etc). Gerrit change:

      http://review.couchbase.org/#/c/11326/


        Activity

        FilipeManana Filipe Manana (Inactive) added a comment -

        At the time, relative improvements for total replication time varied between 30% and 40% for spinning disks on Linux.

        dipti Dipti Borkar added a comment -

        how big is the improvement from a latency perspective? should we target this for 2.0.2?

        FilipeManana Filipe Manana (Inactive) added a comment -

        Hard to tell about latency. This was tested within the CouchDB context over a year ago (i.e., single vbucket).
        Total replication time, for datasets between 10M and 40M, decreased by 30% to 40%.

        But it's clear that not doing btree lookups (disk IO, reads) on a doc-by-doc basis is faster than doing them.
        We should profit from our (Couchbase) simplified database file format, which allows us to avoid these lookups.

        junyi Junyi Xie (Inactive) added a comment -

        fix merged to master

        http://review.couchbase.org/24392

        Thanks,

        Junyi

        junyi Junyi Xie (Inactive) added a comment -

        now ns_server only has master branch, no need to back port to 2.0.2

        maria Maria McDuff (Inactive) added a comment -

        no functional verification required. changes were very low-level (internals) and un-verifiable by QE. QE will be running small-load tests as part of system test regression.


          People

          • Assignee:
            junyi Junyi Xie (Inactive)
            Reporter:
            FilipeManana Filipe Manana (Inactive)