  1. Couchbase Java Client
  2. JCBC-114

Command Futures never receive results after rebalance-out (or other sorts of topology/network changes)

    Details

    • Type: Bug
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.0.3
    • Fix Version/s: .backlog1.x
    • Component/s: Documentation
    • Security Level: Public
    • Labels:
      None


        Activity

        mnunberg Mark Nunberg added a comment -

        This is a real blocker, and it seems to be related to a few vBuckets. It is preventing me from properly measuring command durations.

        farshid Farshid Ghods (Inactive) added a comment -

        Matt/Rags,

        This issue is a blocker for executing more integration tests on the Java SDK. Are there workarounds to avoid this use case, or is a fix on the way?
        Please assign this back to Mark if more information or logs are needed for this issue.

        ingenthr Matt Ingenthron added a comment -

        Please have a look at this.

        mnunberg Mark Nunberg added a comment -

        Michael,

        I would not try this test manually. The use case in more detail is as follows:

        • Single CouchbaseClient object
        • 20 user threads: 10 setting and 10 getting the same sorts of key-value pairs
        • Operations are done asynchronously. The resulting futures are submitted into a queue, which is then checked periodically for isDone/isCancelled.
        • 4-node cluster. Nodes are removed and connections are broken.

        The issue is that those polling methods never return true unless the results are retrieved synchronously (i.e. ft.get()), which is actually an accidental detail.

        ingenthr Matt Ingenthron added a comment -

        We looked at this pretty closely today. The issue here is that the client as designed relies on the get() from the caller to trigger the timeout. An operation will, somewhat correctly, never transition to isDone() or isCancelled() unless someone cares to use it.

        The scenario that was likely in play over the WAN here is that the request was in flight to the server while the new config was in flight down to the client. The request arrives at the server but is never responded to. Since get() is never called, the operation never times out and never transitions to the canceled state.
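
The mechanism described above can be modelled with a toy Future whose timeout is checked only inside get(). This is a sketch for illustration, not the client's actual code: the class name LazyTimeoutFuture, the complete() method, and the one-second default timeout are all assumptions.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Toy model (hypothetical): the timeout is enforced only by a caller of get(). */
final class LazyTimeoutFuture<T> implements Future<T> {
    private final Object lock = new Object();
    private boolean completed;
    private boolean cancelled;
    private T value;

    /** Would be called by an I/O thread when a response arrives (hypothetical hook). */
    void complete(T v) {
        synchronized (lock) {
            if (!completed && !cancelled) {
                value = v;
                completed = true;
                lock.notifyAll();
            }
        }
    }

    @Override
    public boolean cancel(boolean mayInterruptIfRunning) {
        synchronized (lock) {
            if (completed || cancelled) {
                return false;
            }
            cancelled = true;
            lock.notifyAll();
            return true;
        }
    }

    @Override
    public boolean isCancelled() {
        synchronized (lock) { return cancelled; }
    }

    // Pure state check: nothing here ever looks at a clock, so a lost
    // response leaves this false forever if nobody calls get().
    @Override
    public boolean isDone() {
        synchronized (lock) { return completed || cancelled; }
    }

    @Override
    public T get() throws InterruptedException, ExecutionException {
        try {
            return get(1, TimeUnit.SECONDS); // assumed default timeout
        } catch (TimeoutException e) {
            throw new ExecutionException(e);
        }
    }

    @Override
    public T get(long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException, TimeoutException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        synchronized (lock) {
            while (!completed && !cancelled) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    cancelled = true; // the state change happens only here
                    throw new TimeoutException("no response within timeout");
                }
                TimeUnit.NANOSECONDS.timedWait(lock, remaining);
            }
            if (cancelled) {
                throw new CancellationException();
            }
            return value;
        }
    }
}
```

In this model, a loop that only polls isDone() and isCancelled() on an unanswered operation sees false indefinitely, while a single get(timeout, unit) call moves it to the canceled state.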

        We recommend you change the test code to use the queue more like a queue and just get() each one. Iterating through the queue is a bit funny in the first place, but if you call get() on the Future objects, you still get asynchronous behavior, and much of the time get() returns right away because the data is already there.
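
A minimal sketch of that recommendation, written against the standard java.util.concurrent.Future interface rather than the client's own future classes. The names FutureDrain, Tally, and drain are hypothetical, and the timeout value is left to the caller.

```java
import java.util.Queue;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: consume pending operation futures in FIFO order. */
final class FutureDrain {

    /** Counts of how the drained operations ended. */
    static final class Tally {
        int succeeded;
        int failed;
    }

    /**
     * Takes each future off the head of the queue and waits for it with a
     * bounded get(), instead of polling isDone()/isCancelled().
     */
    static Tally drain(Queue<? extends Future<?>> pending, long timeout, TimeUnit unit) {
        Tally tally = new Tally();
        Future<?> f;
        while ((f = pending.poll()) != null) {
            try {
                f.get(timeout, unit); // returns at once if the result is already there
                tally.succeeded++;
            } catch (TimeoutException e) {
                f.cancel(true);       // give up on an operation that was never answered
                tally.failed++;
            } catch (ExecutionException | CancellationException e) {
                tally.failed++;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop early
                break;
            }
        }
        return tally;
    }
}
```

A test thread could call FutureDrain.drain(queue, 2, TimeUnit.SECONDS) periodically. Operations that already have a result cost almost nothing, and each unanswered operation is bounded by the timeout instead of staying pending forever.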

        ingenthr Matt Ingenthron added a comment -

        This behavior should be better documented, both in the javadoc and in the API reference.


          People

          • Assignee: daschl Michael Nitschinger
          • Reporter: mnunberg Mark Nunberg
          • Votes: 0
          • Watchers: 0
