Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.1.0
    • Component/s: None
    • Security Level: Public
    • Labels:

      Description

      Hi,

      I'm ready to add this section to the Dev Guide 2.1.0 after Observe. Do let me know when we have Ruby and Java finalized and I will add them as examples.

      Thanks

      Karen

      From: Matt Ingenthron <matt@couchbase.com>
      Date: Monday, June 24, 2013 11:36 PM
      To: Karen Zeller <karen.zeller@couchbase.com>
      Subject: replica read docs for dev guide

      Hi Karen,

      The wiki has been updated with some details about replica read which you may be able to start from.
      http://www.couchbase.com/wiki/display/couchbase/Replica+Read

      Also, on C:
      https://github.com/couchbase/libcouchbase/blob/master/man/man3couchbase/lcb_get_replica.3couchbase.txt

      The Ruby implementation, including RDoc, will be up soon.

      The Java implementation is currently in code review, and the Javadoc there may help as well:
      http://review.couchbase.org/#/c/24750/

      Thanks,

      Matt


      Matt Ingenthron - Director, Developer Solutions
      Couchbase, Inc.

        Activity

        daschl Michael Nitschinger added a comment -

        Hi Karen,

        ad 1) In the current incarnation, we use a "fan out" approach and pick the first response that comes back. I guess this is the ":first" approach in Ruby. In future releases, we might extend support to different strategies, but this has not been decided upon.

        ad 2) Yes, there is, as with all the other operations, an asyncGetFromReplica method that returns a future and can be inspected like a regular asyncGet future response.
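
The "fan out and pick the first one that comes back" behaviour described in this comment can be sketched in Python. This is only an illustration under stated assumptions, not the Java SDK's implementation: `read_replica` is a hypothetical stand-in for a per-replica fetch, and the `delays` and `data` lists simulate network latency and replica contents.

```python
import concurrent.futures
import time


def read_replica(key, index, delays, data):
    """Hypothetical per-replica fetch: sleep to simulate latency, then answer."""
    time.sleep(delays[index])
    if data[index] is None:
        raise KeyError(f"{key!r} not found on replica {index}")
    return data[index]


def get_from_replica_first(key, delays, data):
    """Ask every replica at once and return the first successful answer."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(data)) as pool:
        futures = [pool.submit(read_replica, key, i, delays, data)
                   for i in range(len(data))]
        last_error = None
        # as_completed yields futures in the order in which they finish.
        for future in concurrent.futures.as_completed(futures):
            try:
                return future.result()
            except KeyError as exc:
                last_error = exc  # this replica failed; wait for the next one
        raise last_error


if __name__ == "__main__":
    # Replica 0 fails quickly, replica 1 answers next, replica 2 is slow.
    print(get_from_replica_first("foo", [0.0, 0.05, 0.3], [None, "v1", "v2"]))
```

Because the first successful response wins, the value returned depends on which replica answers fastest, not on which one holds the most recent data.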

        kzeller kzeller added a comment - - edited

        Review Input: Sergey:

        Hi Karen. Good work, but I have some comments:

        [FIXED] 1. Could you please make sure that the code snippets are well formatted?
        (I'm talking about indentation, e.g. on page 60.)

        [FIXED] 2. page 61

        The advantage of doing sequential read is that your client
        only sends a single get request over the network and will
        only need to store one response in memory. The disadvantage
        is that during rebalance, replicated data can move to
        another node, which means your client then has to reload
        cluster topology if it needs to reattempt the replica read.

        I'd rather replace "sends a single get request over the network" with
        "makes a single API call and lets the library handle failures".
        That is because what libcouchbase (and therefore the libraries derived
        from it) effectively does can be described with the following pseudocode:

        N = get_number_of_replicas_in_the_cluster()
        value = NULL
        idx = 0
        while idx < N do
            ret = get_replica(key, idx)
            if ret == OK
                value = get_value(ret)
                break
            else if ret == NOT_MY_VBUCKET
                /* configuration has changed: start over from the first replica */
                idx = 0
            else
                idx = idx + 1
            end
        end
        /* returns OK and the value, or the last error and NULL */
        return (ret, value)

        The code above illustrates sequential reading. It is clearly the
        handiest approach from the application's perspective, and also the
        most reliable. Its main disadvantage is that it can involve a lot of
        calls to the server. Consider the following log for a cluster with
        3 replicas:

        get_replica("foo", 0) --> NOT_FOUND
        get_replica("foo", 1) --> NOT_FOUND
        /* rebalance occurred and the replica has been moved */
        get_replica("foo", 2) --> NOT_MY_VBUCKET
        get_replica("foo", 0) --> OK

        You can see here that even though N = 3, the library may need to
        issue more than three requests to make sure that it has tried all
        replicas.
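
The request log above can be replayed with a small Python simulation. This is a sketch under stated assumptions, not libcouchbase code: `FakeCluster` is a hypothetical object that returns scripted responses, the scan restarts from replica 0 after NOT_MY_VBUCKET as the log shows, and the `max_requests` guard is an addition of this sketch to keep the loop from running forever.

```python
OK, NOT_FOUND, NOT_MY_VBUCKET = "OK", "NOT_FOUND", "NOT_MY_VBUCKET"


class FakeCluster:
    """Hypothetical cluster that replays a scripted list of responses."""

    def __init__(self, num_replicas, script):
        self.num_replicas = num_replicas
        self.script = list(script)
        self.log = []  # (replica index, status) for every request issued

    def get_replica(self, key, idx):
        status, value = self.script.pop(0)
        self.log.append((idx, status))
        return status, value


def sequential_replica_read(cluster, key, max_requests=16):
    """Try replicas in order; restart from replica 0 if the topology changed."""
    status, value = NOT_FOUND, None
    idx = 0
    requests = 0
    while idx < cluster.num_replicas and requests < max_requests:
        status, value = cluster.get_replica(key, idx)
        requests += 1
        if status == OK:
            break
        if status == NOT_MY_VBUCKET:
            idx = 0  # configuration has changed: start over
        else:
            idx += 1
    return status, value


if __name__ == "__main__":
    cluster = FakeCluster(3, [(NOT_FOUND, None), (NOT_FOUND, None),
                              (NOT_MY_VBUCKET, None), (OK, "bar")])
    print(sequential_replica_read(cluster, "foo"))
    print(len(cluster.log), "requests for", cluster.num_replicas, "replicas")
```

The caller makes one call, but the simulated library issues four requests for three replicas, which is the cost this review comment points out.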

        [FIXED. Changed to "Because this approach takes the first instance of replicated data it finds on a node, it may not be the most current version in the cluster."] 3. page 61

        Also, because this approach takes the first instance of
        replicated data it finds on a node, it can also mean that
        this instance is the most current instance in the cluster.

        I don't think we should recommend that people rely on the cluster
        replicating sequentially, because in the future replication could be
        parallelized, and you cannot say that the first replica is the most
        recent (when the third hasn't been updated yet).

        4. page 61

        [FIXED] Changed to "The advantage of this approach is that you can control the number of replica reads. For example, if you know there are three nodes with replica data, you can ask only the first two, and do so in parallel from your client. The disadvantage is that your code needs to check the return codes from each node and handle them."

        The advantage of this approach is that you can get a
        specific instance of the replicated data by specifying the
        node. Like a sequential replica read, your client only
        sends a single request and will only need to store a single
        response in memory. The disadvantage is also the same as
        sequential replica read; if the replicated item moves to
        another node during rebalance, your client must get the
        cluster topology again.

        The real advantage here is that the user can control the number of
        replica requests. For example, a user who knows that there are three
        replicas in the cluster can ask only the first two, and those requests
        can also be pipelined by the SDK. The disadvantage is that we must
        check the return codes and handle them. Among the three strategies,
        SELECT is the most basic: it could be used to implement all the others
        (like FIRST and ALL), and more besides, like "only one or two replicas".
        This strategy is about controlling latency when you are handling the
        exceptional situation in which the master node isn't able to serve
        your query.
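
The claim that SELECT is the basic building block can be sketched in Python. All names here are hypothetical and each replica is modelled as a plain dictionary; this is not an SDK API. The requests are issued one after another for simplicity, whereas a real client could pipeline them.

```python
OK, NOT_FOUND = "OK", "NOT_FOUND"


def select_replica(replicas, key, idx):
    """Hypothetical SELECT primitive: read the key from one replica by index."""
    store = replicas[idx]
    if key in store:
        return OK, store[key]
    return NOT_FOUND, None


def read_selected(replicas, key, indexes):
    """Ask only the chosen replicas; the caller must inspect each return code."""
    return [(idx, *select_replica(replicas, key, idx)) for idx in indexes]


def read_first(replicas, key):
    """FIRST built on SELECT: stop at the first replica that has the key."""
    for idx in range(len(replicas)):
        status, value = select_replica(replicas, key, idx)
        if status == OK:
            return status, value
    return NOT_FOUND, None


def read_all(replicas, key):
    """ALL built on SELECT: collect every replica's answer for the caller."""
    return read_selected(replicas, key, range(len(replicas)))


if __name__ == "__main__":
    replicas = [{}, {"foo": "bar-v1"}, {"foo": "bar-v2"}]
    print(read_selected(replicas, "foo", [0, 1]))  # only the first two replicas
    print(read_first(replicas, "foo"))
    print(read_all(replicas, "foo"))
```

With `read_selected` the application chooses how many replicas to ask, and in exchange takes on the job of checking each status itself.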

        5. page 62

        [FIXED] Replaced with "and you only need to perform a single API call for this request"

        The requests are all made as a single network roundtrip, and
        may require less round-trips than if you iterate through all
        possible nodes.

        This isn't exactly true: the replica read requests are scattered over
        the cluster to several nodes, and those packets are sent independently
        of each other. Again, a caller using the ALL strategy is controlling
        latency; in effect they are saying, "OK, I know it isn't as safe and
        secure as the FIRST strategy, but I just need to pull all replicas,
        and I will decide which one to use".


        Sergey Avseyev

        anil Anil Kumar added a comment -

        changing back the priority to High-Priority.

        kzeller kzeller added a comment -

        Reviews due date past. Incorporated Sergey input.


          People

          • Assignee:
            kzeller kzeller
            Reporter:
            kzeller kzeller
          • Votes:
            0
            Watchers:
            3
