Uploaded image for project: 'Couchbase C client library libcouchbase'
  1. Couchbase C client library libcouchbase
  2. CCBC-150

Observe (and possibly other) callbacks receiving NULL callback more than once when batched

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.0.7, 2.1.3, 2.2.0, 2.3.0-dp2
    • Fix Version/s: 2.3.0
    • Component/s: library
    • Security Level: Public
    • Labels:
      None

      Description

      See here http://paste.scsys.co.uk/219902 for an example.

      This primarily affects observe operation in 'multi' mode; where multiple observe requests are queued up. This fails when the request distribution is asymmetric.

      This also affects stats and version commands if coupled with other commands in an asynchronous mode.

      This will definitely affect node.js.

      In summary, the function lcb_lookup_server_with_packet tries to find 'associated' packets; but it does it wrongly. It assumes associated packets will always be at the tail (or pop-end) of the command log.

      However this will only be the case in a purely synchronous mode where there are no subsequent commands. The following scenario demonstrates this rather well:

      With three servers and two keys (and one replica), you have something like this for the vb/server map

      "K1" = servers[1], servers[2]
      "K2" = servers[0], servers[1]

      When the commands are sent out, the command look for the server looks something like:

      – order in the buffer
      BUFFERS:
      [ OLD .. NEW]

      0: "K2",
      1: "K1", "K2"
      2: "K1",

      LOGICAL PRECEDENCE:
      [ OLD .. NEW ]

      "K1", "K2"

      The idea to keep in mind is that even though we've scheduled the request for "K1" first, we may very well receive the response for "K2" first - this is especially true if server[0] is quicker than the other servers.

      So what happens when "K2" is received first? - the code checks to see the older commands on all the other servers (only the oldest command), and if not, it assumes this is the last command to be received and notifies the user that the command has finished.

      So now the buffers look like:

      0: "K3"
      1: "K1", "K2"
      2: "K1"

      Now, server[1] responds with "K1". Since the request for "K1" is still at the pop-end of server[2] nothing special happens here

      The buffers look like:

      0:
      1: "K2"
      2: "K1"

      Then server[1] responds again, this time with the next response for "K2". It delivers the callback again. In a typical use case a user maintains a cookie tied to the command, and when the command is done, the cookie is freed. Plainly speaking, this double-singalling (i.e. telling the user a command has finished twice) segfaults the app.

        Issue Links

        # Subject Project Status CR V
        For Gerrit Dashboard: &For+CCBC-150=message:CCBC-150

          Activity

          Hide
          mnunberg Mark Nunberg added a comment - - edited

          The issue of performance is indeed a corner case (as noted in my initial comment) - however the other two concerns - namely the tip-toeing needed and the whole lack of clarity when reading the code - and understanding how the commands are actually tied together, leads to a point of significant confusion (confusing enough that we're running into these issues now, which would have likely been avoided had the association been explicit). It's my opinion that there are too many implicit assumptions with little performance gain; I think we can agree that our performance (read: throughput) issues with observe will not be CPU-bound but rather I/O bound.

          Anyway my point isn't so much about performance but rather about clarity in respect to what happens in such a central code path. In terms of performance for the normal case, looking in the command log might be "faster" because there's no need to allocate an extra chunk of memory to contain the refcount (assuming that there is only a handful of commands in the pipeline); however I think it's safe to say that observe/version/stats aren't going to be commands bound by client-side CPU or memory.

          We'd also need to handle all these commands in the case of server failures (see patch/issue for CCBC-155) - which wouldn't be impossible; we would need to pass these functions the opaque value for the request/response.

          Edit: I see that VERSION already adds the boilerplate there. In any event, OBSERVE is still special - and in the future should be split out into three commands (on the client) - durability (what I'm working on right now), broadcast (our current lcb_observe implementation) and check (which checks the master to see if the key exists without actually fetching its contents)

          Show
          mnunberg Mark Nunberg added a comment - - edited The issue of performance is indeed a corner case (as noted in my initial comment) - however the other two concerns - namely the tip-toeing needed and the whole lack of clarity when reading the code - and understanding how the commands are actually tied together, leads to a point of significant confusion (confusing enough that we're running into these issues now, which would have likely been avoided had the association been explicit). It's my opinion that there are too many implicit assumptions with little performance gain; I think we can agree that our performance (read: throughput) issues with observe will not be CPU-bound but rather I/O bound. Anyway my point isn't so much about performance but rather about clarity in respect to what happens in such a central code path. In terms of performance for the normal case, looking in the command log might be "faster" because there's no need to allocate an extra chunk of memory to contain the refcount (assuming that there is only a handful of commands in the pipeline); however I think it's safe to say that observe/version/stats aren't going to be commands bound by client-side CPU or memory. We'd also need to handle all these commands in the case of server failures (see patch/issue for CCBC-155 ) - which wouldn't be impossible; we would need to pass these functions the opaque value for the request/response. Edit: I see that VERSION already adds the boilerplate there. In any event, OBSERVE is still special - and in the future should be split out into three commands (on the client) - durability (what I'm working on right now), broadcast (our current lcb_observe implementation) and check (which checks the master to see if the key exists without actually fetching its contents)
          Hide
          trond Trond Norbye added a comment -

          these commands should go in libcouchbase_util or something... It woud be nice with a discussion about such commands prior to implementations....

          Show
          trond Trond Norbye added a comment - these commands should go in libcouchbase_util or something... It woud be nice with a discussion about such commands prior to implementations....
          Hide
          trond Trond Norbye added a comment -

          The fix is already merged...

          Show
          trond Trond Norbye added a comment - The fix is already merged...
          Hide
          mnunberg Mark Nunberg added a comment -

          Re-Open for re-close

          Show
          mnunberg Mark Nunberg added a comment - Re-Open for re-close
          Show
          mnunberg Mark Nunberg added a comment - http://review.couchbase.org/23526

            People

            • Assignee:
              mnunberg Mark Nunberg
              Reporter:
              mnunberg Mark Nunberg
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Gerrit Reviews

                There are no open Gerrit changes