Couchbase Java Client / JCBC-368

Deadlock in BucketMonitor.startMonitor() on CountDownLatch when channel creation fails.


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.1.9, 1.2
    • Fix Version/s: 1.2.2
    • Component/s: Core
    • Security Level: Public
    • Labels:
      None

      Description

      Deadlock in BucketMonitor.startMonitor() on CountDownLatch when channel creation fails.

      Before deadlocking, it also logs the following exception (which is itself a separate bug):

      java.lang.IllegalStateException: An Executor cannot be shut down from the thread acquired from itself. Please make sure you are not calling releaseExternalResources() from an I/O worker thread.
      at org.jboss.netty.util.internal.ExecutorUtil.terminate(ExecutorUtil.java:73)
      at org.jboss.netty.util.internal.ExecutorUtil.terminate(ExecutorUtil.java:49)
      at org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory.releaseExternalResources(NioClientSocketChannelFactory.java:180)
      at org.jboss.netty.bootstrap.Bootstrap.releaseExternalResources(Bootstrap.java:319)
      at com.couchbase.client.vbucket.BucketMonitor$1.operationComplete(BucketMonitor.java:193)
      at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:428)
      at org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:419)
      at org.jboss.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:381)
      at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processConnectTimeout(NioClientSocketPipelineSink.java:394)
      at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:289)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:724)
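The trace suggests the mechanism: the failure listener (`BucketMonitor$1.operationComplete`) runs on Netty's boss I/O thread and calls `Bootstrap.releaseExternalResources()` there, which Netty rejects. Presumably the exception then skips whatever would have counted down the latch that `startMonitor()` waits on, so the caller blocks forever. Below is a minimal, self-contained sketch of the defensive pattern, not the actual 1.2.2 fix: plain `java.util.concurrent` classes stand in for Netty, and `MonitorStartSketch`, `startMonitor(boolean, long)` and `terminate` are hypothetical names. It counts the latch down in a `finally` block, moves resource release off the I/O thread, and bounds the wait with a timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

public class MonitorStartSketch {

    /**
     * Hypothetical stand-in for BucketMonitor.startMonitor(): runs a simulated
     * connect whose outcome is reported on an "I/O" thread, then waits on a
     * latch for that outcome.
     *
     * @return true only if the simulated connect succeeded within the timeout
     */
    static boolean startMonitor(boolean connectSucceeds, long timeoutMs) {
        // Daemon threads so a leftover pool never keeps the JVM alive.
        ExecutorService ioPool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "io-worker");
            t.setDaemon(true);
            return t;
        });
        CountDownLatch done = new CountDownLatch(1);
        AtomicBoolean ok = new AtomicBoolean(false);

        // Plays the role of the ChannelFutureListener; it runs on the I/O thread.
        Consumer<Boolean> listener = success -> {
            try {
                ok.set(success);
                if (!success) {
                    // Hand cleanup to another thread: waiting for this pool to
                    // terminate from one of its own threads could never finish.
                    Thread releaser = new Thread(() -> terminate(ioPool), "resource-releaser");
                    releaser.setDaemon(true);
                    releaser.start();
                }
            } finally {
                // Always wake the caller, even if the cleanup above throws.
                done.countDown();
            }
        };

        ioPool.execute(() -> listener.accept(connectSucceeds));

        try {
            // Bounded wait instead of a bare await(), as a second safety net.
            return done.await(timeoutMs, TimeUnit.MILLISECONDS) && ok.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Shuts a pool down and waits briefly; must not run on one of the pool's own threads. */
    static void terminate(ExecutorService pool) {
        pool.shutdownNow();
        try {
            pool.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("connect ok:     " + startMonitor(true, 5000));
        System.out.println("connect failed: " + startMonitor(false, 5000));
    }
}
```

With the simulated failure, `startMonitor(false, 5000)` returns `false` promptly instead of blocking. In real Netty 3 code the analogous step would presumably be scheduling `releaseExternalResources()` on a thread that does not belong to the channel factory's executors.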


            Activity

            daschl Michael Nitschinger added a comment -

            I see, thanks for reporting it. Look forward to a proper fix in the next bugfix release.

            daschl Michael Nitschinger added a comment -

            I think I've fixed it, but in any case something went wrong while establishing the streaming connection.

            Is the error expected in your environment, or was the connection considered stable? I'd like to understand the circumstances in which you saw it. Even with my fix, the client only shuts down properly; attaching the streaming connection still won't succeed.

            afds Deniss Afonin added a comment -

            Well, this happens for us only when the Couchbase server is not on the local network, i.e. when we connect over the internet with ~100 ms latency, and that connection might not be very stable.

            Issue JCBC-326 might also be related to this.

            daschl Michael Nitschinger added a comment -

            Yes, that's what I assumed. In general, we don't recommend running your app servers and DB servers across the internet; they should be in the same datacenter, especially if you care about latency.

            But this shouldn't be a limitation of the client itself. I have a bugfix for this coming up, and I'll also see whether I can add more retry logic that tries different nodes from the list, so that if one fails we eventually connect to another.
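The retry idea in this comment can be sketched as a loop over the configured bootstrap nodes. This is a hypothetical illustration, not the client's actual retry logic: `NodeFailoverSketch`, `connectToAny` and the `tryConnect` predicate (standing in for a real connection attempt, where `false` or a thrown exception counts as failure) are assumptions, and the node URIs are only examples.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

public class NodeFailoverSketch {

    /**
     * Tries each bootstrap node in list order, wrapping around for up to
     * maxRounds passes, and returns the first node that tryConnect accepts.
     * Returns Optional.empty() if every attempt fails.
     */
    static Optional<String> connectToAny(List<String> nodes,
                                         Predicate<String> tryConnect,
                                         int maxRounds) {
        for (int round = 0; round < maxRounds; round++) {
            for (String node : nodes) {
                try {
                    if (tryConnect.test(node)) {
                        return Optional.of(node);
                    }
                } catch (RuntimeException e) {
                    // One node failing loudly must not abort the whole failover;
                    // treat it like an ordinary failed attempt and move on.
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Illustrative bootstrap URIs; node1 is treated as unreachable.
        List<String> nodes = List.of("http://node1:8091/pools", "http://node2:8091/pools");
        Optional<String> chosen = connectToAny(nodes, uri -> !uri.contains("node1"), 3);
        System.out.println(chosen.orElse("no node reachable"));
    }
}
```

The sketch retries immediately; a real implementation would likely add a backoff delay between rounds, especially over a high-latency link like the one described in this thread.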

            daschl Michael Nitschinger added a comment -

            The fix has been merged into master and will be available in 1.2.2.


              People

              Assignee:
              daschl Michael Nitschinger
              Reporter:
              afds Deniss Afonin
              Votes:
              0
              Watchers:
              2

                Dates

                Created:
                Updated:
                Resolved:

                  Gerrit Reviews

                  There are no open Gerrit changes
