  1. Couchbase Server
  2. MB-7837

[RN 2.0.2 + Doc] continuously tunable optimistic XDCR

    Details

    • Type: Story
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.0.1
    • Fix Version/s: 2.1.0
    • Component/s: XDCR
    • Security Level: Public
    • Flagged:
      Release Note

      Description

      Hi Damien,

      What you said makes perfect sense. The reason we use revs_diff is to avoid sending a big doc that eventually fails conflict resolution and is discarded at the destination. For deletions there is no such benefit; on the contrary, we end up sending (key+meta) twice for every deletion replicated to the remote.

      We can introduce a threshold parameter on body size: any doc smaller than the threshold skips revs_diff and is sent directly to the remote side, while for any doc bigger than the threshold we still send revs_diff first.

      If the parameter is 0, we need to send revs_diff for all mutations other than deletions (whose body size is 0). This way, deletion is naturally encoded into the parameter, and we do not need to differentiate deletions from other mutations.

      If the parameter is infinity (or sufficiently large), that is equivalent to "optimistic XDCR", meaning we optimistically send all docs without revs_diff, regardless of their size. This would even let us retire the "optimistic XDCR" parameter.

      Only two questions:

      1. How do we get the doc size? Is it encoded in the doc_info? If not, we may want to add it; otherwise we have to pay for another lookup merely to get the doc size.

      2. What is a reasonable value for the threshold? I will probably start with something like 1-2K.

      Thanks!

      Junyi

      ===================================

      Hey Junyi, I just realized that we can perform an optimization in XDCR to make the replication of deletion records faster and more efficient.

      I might be wrong, but I believe that when XDCR encounters a deletion record, it sends it in the _revs_diff call and, if the target doesn't have it, we then send all the same information again in the bulk docs POST. This is unnecessarily inefficient.

      If we skipped the _revs_diff call for deletion records and instead sent them unconditionally in the bulk post, we'd send half as much information and do far fewer background fetches for unidirectional replication and the first half of bidirectional replication, and do the same amount of data sending work for the "bounce back" step of bidirectional replication. This is because deleted documents should never have a body, so the amount of information sent in the _revs_diff and in the bulk post is the same.

      If we also unconditionally sent very small bodies for non-deleted documents (say less than 100 bytes), we'd send less total information and perform much less work for unidirectional replication. For bidirectional replication, we'd send less information with less work on the first part and only slightly more information on the second part. It should also be a nice win for small documents.
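
The byte-count argument above can be sketched as a toy cost model. This is only an illustration, not XDCR's actual accounting: the function names and the 60-byte key+meta size are assumptions made up for the example, and background fetches are not modeled.

```python
def bytes_with_revs_diff(meta_size: int, body_size: int,
                         target_needs_doc: bool = True) -> int:
    """Bytes sent when _revs_diff runs first (hypothetical model).

    Key+meta always goes out in the _revs_diff call; key+meta plus body
    goes out again in the bulk post only if the target asks for the doc.
    """
    sent = meta_size
    if target_needs_doc:
        sent += meta_size + body_size
    return sent


def bytes_optimistic(meta_size: int, body_size: int) -> int:
    """Bytes sent when the doc goes straight into the bulk post."""
    return meta_size + body_size


META = 60  # hypothetical key+meta size in bytes, chosen for illustration

for label, body in [("deletion", 0), ("small doc", 100), ("large doc", 10_000)]:
    forward = bytes_with_revs_diff(META, body)
    bounce = bytes_with_revs_diff(META, body, target_needs_doc=False)
    direct = bytes_optimistic(META, body)
    print(f"{label}: revs_diff first={forward} B, "
          f"revs_diff bounce-back={bounce} B, optimistic={direct} B")
```

Under these assumptions a deletion costs half as much on the forward path and the same on the bounce-back path, while the extra bounce-back cost of a non-deleted doc grows with its body size, which is why a size cutoff is attractive.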

      This optimization can be made without sacrificing backwards compatibility or correctness. Anyway, wanted to just get this idea out there to you. Let me know if I didn't do a good job describing this, or you see something wrong in my reasoning. Thanks!

      -Damien


      Damien Katz


        Activity

        junyi Junyi Xie (Inactive) added a comment - http://review.couchbase.org/#/c/24934/
        junyi Junyi Xie (Inactive) added a comment -

        code merged to 2.0.2

        pending extensive performance and system test

        junyi Junyi Xie (Inactive) added a comment -

        The comment below is for 2.0.2 documentation purposes. @Karen Zeller

        In 2.0.2 we will introduce a new parameter, "xdcr_optimistic_replication_threshold", which is a non-negative integer in units of bytes. The default is 256 bytes.

        XDCR uses this parameter to split the docs list into two: a list of big docs, each bigger than the threshold, and a list of small docs, each no greater than it. For small docs, we skip all revs_diff operations and optimistically send them directly to the remote cluster. That eliminates the first getMeta operation from source to destination. If a doc from the source fails conflict resolution, it is discarded by the destination node, so the correctness of XDCR is not affected. This new behavior optimizes replication latency, especially for small documents, at the possible cost of wasted bandwidth.

        For big docs, whose body is bigger than "xdcr_optimistic_replication_threshold", we keep the current XDCR behavior: for each doc, we first send revs_diff and then send only those docs that survive conflict resolution at the remote node. This behavior optimizes bandwidth instead of latency, since we never send a doc that fails conflict resolution at the destination. However, it is not latency optimized, since each doc requires two sequential operations: a metadata operation to get the list of keys that actually need to be replicated, and then the transfer of those docs.

        With this parameter, users can continuously tune which docs are replicated optimistically. As a result, users are able to choose between latency-optimized and bandwidth-optimized replication in practice. At one extreme, if the parameter is set to 0, all docs are treated as "big docs" and sent to the remote conservatively to save bandwidth. At the other extreme, if the parameter is set to a sufficiently large value, all updates are considered "small docs" and are sent optimistically to the remote side in favor of latency.

        Please note that a deletion, however, is always treated as "a small doc" and sent optimistically, regardless of its doc size and the parameter, because there is no benefit in sending revs_diff for deletions at all.
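
The split described above can be sketched as follows. This is a minimal illustration rather than the XDCR code itself: `DocInfo`, its fields, and `split_by_threshold` are hypothetical names, and the comparison follows the "no greater than" wording of this comment.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class DocInfo:
    """Hypothetical per-document record; field names are assumptions."""
    key: str
    body_size: int        # body size in bytes
    deleted: bool = False


def split_by_threshold(docs: Iterable[DocInfo],
                       threshold: int) -> Tuple[List[DocInfo], List[DocInfo]]:
    """Return (small_docs, big_docs).

    Small docs would be sent optimistically, without revs_diff.
    Big docs would go through revs_diff first.
    """
    small: List[DocInfo] = []
    big: List[DocInfo] = []
    for doc in docs:
        # Deletions are always optimistic, whatever the threshold is.
        if doc.deleted or doc.body_size <= threshold:
            small.append(doc)
        else:
            big.append(doc)
    return small, big


docs = [
    DocInfo("a", 100),
    DocInfo("b", 256),
    DocInfo("c", 4096),
    DocInfo("d", 0, deleted=True),
]
small, big = split_by_threshold(docs, threshold=256)
print([d.key for d in small], [d.key for d in big])
```

With the default of 256 bytes, only the 4096-byte doc takes the revs_diff path in this example; with a threshold of 0, the deletion is still sent optimistically.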

        The corresponding environment variable is:

        "XDCR_OPTIMISTIC_REPLICATION_THRESHOLD"

        Users can always override the ns_server parameter by setting this environment variable.
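
The override order described above can be sketched like this. It is an illustration only: `effective_threshold` is a hypothetical helper, and how the server itself parses or rejects a malformed value is not stated in this ticket.

```python
import os
from typing import Mapping, Optional

ENV_NAME = "XDCR_OPTIMISTIC_REPLICATION_THRESHOLD"
DEFAULT_THRESHOLD = 256  # default stated in this ticket, in bytes


def effective_threshold(ns_server_value: int = DEFAULT_THRESHOLD,
                        environ: Optional[Mapping[str, str]] = None) -> int:
    """Return the threshold to use; the environment variable wins if set."""
    env = os.environ if environ is None else environ
    raw = env.get(ENV_NAME)
    if raw is None:
        return ns_server_value
    value = int(raw)  # raises ValueError on non-numeric input
    if value < 0:
        raise ValueError(f"{ENV_NAME} must be a non-negative integer")
    return value


print(effective_threshold(256, {}))
print(effective_threshold(256, {ENV_NAME: "1024"}))
```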

        kzeller kzeller added a comment -

        Add to 2.0.2 RN plus XDCR REST content.

        maria Maria McDuff (Inactive) added a comment -

        Abhinav,

        new feature to test for 2.0.2 — need to create new tests for this new parameter: XDCR_OPTIMISTIC_REPLICATION_THRESHOLD

        junyi Junyi Xie (Inactive) added a comment -

        Also, after introducing "xdcr_optimistic_replication_threshold", the old boolean "xdcr_optimistic_replication" parameter has been retired.

        kzeller kzeller added a comment -

        Hi Junyi- will this be available as part of the /internalsettings endpoints for XDCR?

        What are the possible ranges of values that will be allowed?

        junyi Junyi Xie (Inactive) added a comment -

        Karen,

        Yes, the ns_server team just merged some code to allow users to set this parameter via /internalsettings.

        The allowed range is any non-negative integer, i.e., from 0 to the maximum integer Erlang can support. On 32-bit architectures, that is around 134M. On 64-bit architectures, it is a huge number: 576460752303423488. Customers are unlikely to hit this bound.
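
A client-side range check before posting the setting might look like the sketch below. Several details are assumptions not confirmed in this ticket: the exact path casing (`/internalSettings`), the form field name `xdcrOptimisticReplicationThreshold`, and the host and port. Authentication is omitted, and the request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# 64-bit bound quoted in the comment above.
MAX_THRESHOLD_64BIT = 576460752303423488
# Assumed form field name; not confirmed in this ticket.
FIELD_NAME = "xdcrOptimisticReplicationThreshold"


def build_threshold_request(base_url: str, threshold: int) -> Request:
    """Validate the value and build (but do not send) a POST request."""
    if isinstance(threshold, bool) or not isinstance(threshold, int):
        raise TypeError("threshold must be an integer number of bytes")
    if not 0 <= threshold <= MAX_THRESHOLD_64BIT:
        raise ValueError("threshold is outside the allowed range")
    body = urlencode({FIELD_NAME: threshold}).encode("ascii")
    # Assumed endpoint path; the ticket spells it /internalsettings.
    return Request(f"{base_url}/internalSettings", data=body, method="POST")


req = build_threshold_request("http://localhost:8091", 1024)
print(req.get_method(), req.full_url, req.data)
```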


          People

          • Assignee:
            abhinav Abhinav Dangeti
            Reporter:
            junyi Junyi Xie (Inactive)
          • Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: