Couchbase Java Client / JCBC-368

Deadlock in BucketMonitor.startMonitor() on CountDownLatch when channel creation fails.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.1.9, 1.2
    • Fix Version/s: 1.2.2
    • Component/s: Core
    • Security Level: Public
    • Labels: None

      Description

      Deadlock in BucketMonitor.startMonitor() on CountDownLatch when channel creation fails.

      Before deadlocking, it also logs the following exception (which is itself another bug):

      java.lang.IllegalStateException: An Executor cannot be shut down from the thread acquired from itself. Please make sure you are not calling releaseExternalResources() from an I/O worker thread.
      at org.jboss.netty.util.internal.ExecutorUtil.terminate(ExecutorUtil.java:73)
      at org.jboss.netty.util.internal.ExecutorUtil.terminate(ExecutorUtil.java:49)
      at org.jboss.netty.channel.socket.nio.NioClientSocketChannelFactory.releaseExternalResources(NioClientSocketChannelFactory.java:180)
      at org.jboss.netty.bootstrap.Bootstrap.releaseExternalResources(Bootstrap.java:319)
      at com.couchbase.client.vbucket.BucketMonitor$1.operationComplete(BucketMonitor.java:193)
      at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:428)
      at org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:419)
      at org.jboss.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:381)
      at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processConnectTimeout(NioClientSocketPipelineSink.java:394)
      at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:289)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:724)
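      The trace shows the failure callback (`BucketMonitor$1.operationComplete`) running on Netty's boss thread and calling `releaseExternalResources()` from there, which Netty rejects. A plausible reading of the title is that the latch is counted down later in that same callback, so the exception skips the count-down and `startMonitor()` waits forever. The JDK-only sketch below reproduces that pattern under this assumption. `LatchDeadlockDemo`, `releaseResources`, and the `io-worker` thread name are hypothetical stand-ins, not the client's code, and the thread-name check only imitates Netty's guard.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LatchDeadlockDemo {

    static final String IO_THREAD = "io-worker";

    // Hypothetical analogue of Netty's ExecutorUtil.terminate() check:
    // refuses to shut a pool down from one of that pool's own threads.
    static void releaseResources(ExecutorService pool) {
        if (Thread.currentThread().getName().equals(IO_THREAD)) {
            throw new IllegalStateException(
                "An Executor cannot be shut down from the thread acquired from itself.");
        }
        pool.shutdown();
    }

    // Buggy pattern: countDown() is only reached if releaseResources() returns.
    // Returns whether the latch was released within the timeout.
    static boolean startMonitor(long timeoutMs) throws InterruptedException {
        ExecutorService ioPool =
            Executors.newSingleThreadExecutor(r -> new Thread(r, IO_THREAD));
        CountDownLatch latch = new CountDownLatch(1);
        try {
            // Simulated connect failure: the failure callback runs on the I/O thread.
            ioPool.execute(() -> {
                releaseResources(ioPool); // throws IllegalStateException here...
                latch.countDown();        // ...so this line is never reached
            });
            // An untimed latch.await() would block forever at this point, which
            // matches the reported deadlock; the timeout only lets the demo end.
            return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } finally {
            ioPool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("latch released: " + startMonitor(500));
    }
}
```

      Running `main` prints `latch released: false`, with the uncaught `IllegalStateException` reported on stderr. With an untimed `await()`, the caller would never return.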


          Activity

          daschl Michael Nitschinger added a comment -

          Hi, thanks for reporting.

          Can you give me more details about your environment? In particular, which SDK version are you running, and on which platform/OS?

          afds Deniss Afonin added a comment -

          Yeah, sorry for missing that. Here it is:

          couchbase-client-1.2.0

          running with

          java version "1.7.0_40"
          Java(TM) SE Runtime Environment (build 1.7.0_40-b43)
          Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode)

          on OS X 10.8.5

          daschl Michael Nitschinger added a comment -

          I see what's going on there, but I'd like to add a test to reproduce. How are you running into this?

          afds Deniss Afonin added a comment -

          I can't reproduce it reliably; it happens from time to time.

          daschl Michael Nitschinger added a comment -

          I see, thanks for reporting it. Look forward to a proper fix in the next bugfix release.

          daschl Michael Nitschinger added a comment -

          I think I have fixed it, but in any case something went wrong while establishing the streaming connection.

          Is this error expected in your environment, or was the connection considered stable? I wonder under which circumstances you saw it. Even with my fix, the client now only shuts down cleanly; attaching to the streaming connection still won't succeed.
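          One common way to get "shuts down cleanly, but the connection still failed" behavior is to count the latch down in a `finally` block and to release resources from a thread the I/O pool does not own. The sketch below shows that pattern under assumptions; it is not necessarily what the 1.2.2 change does. `LatchFixDemo`, `Result`, `releaseResources`, and the `io-worker`/`releaser` thread names are hypothetical, and the thread-name check only imitates Netty's rule that an executor cannot be shut down from one of its own threads.

```java
import java.net.ConnectException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class LatchFixDemo {

    static final String IO_THREAD = "io-worker";

    // Hypothetical guard: no shutdown from one of the pool's own threads.
    static void releaseResources(ExecutorService pool) {
        if (Thread.currentThread().getName().equals(IO_THREAD)) {
            throw new IllegalStateException(
                "An Executor cannot be shut down from the thread acquired from itself.");
        }
        pool.shutdown();
    }

    static final class Result {
        final Throwable failure;       // why the connection failed (null on success)
        final boolean poolTerminated;  // whether resources were released cleanly
        Result(Throwable failure, boolean poolTerminated) {
            this.failure = failure;
            this.poolTerminated = poolTerminated;
        }
    }

    static Result startMonitor(long timeoutMs) throws InterruptedException {
        ExecutorService ioPool =
            Executors.newSingleThreadExecutor(r -> new Thread(r, IO_THREAD));
        CountDownLatch latch = new CountDownLatch(1);
        AtomicReference<Throwable> failure = new AtomicReference<>();

        // Simulated failure callback, running on the I/O thread.
        ioPool.execute(() -> {
            try {
                failure.set(new ConnectException("connection timed out"));
                // Hand the release to a thread the pool does not own.
                new Thread(() -> releaseResources(ioPool), "releaser").start();
            } finally {
                latch.countDown(); // the waiting caller is always woken up
            }
        });

        if (!latch.await(timeoutMs, TimeUnit.MILLISECONDS)) {
            ioPool.shutdownNow();
            return new Result(new IllegalStateException("no callback within timeout"), false);
        }
        boolean terminated = ioPool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        return new Result(failure.get(), terminated);
    }

    public static void main(String[] args) throws InterruptedException {
        Result r = startMonitor(1000);
        // The monitor no longer hangs, but the connection itself still failed.
        System.out.println("failure: " + r.failure + ", pool terminated: " + r.poolTerminated);
    }
}
```

          Running `main` prints `failure: java.net.ConnectException: connection timed out, pool terminated: true`: the caller gets the error back and the pool terminates.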

          afds Deniss Afonin added a comment -

          Well, this happens for us only when the Couchbase server is not on the local network, i.e. when we connect over the internet with ~100 ms latency, and that internet connection may not be very stable.

          Issue JCBC-326 might also be related to this.

          daschl Michael Nitschinger added a comment -

          Yes, that's what I assumed. In general, we don't recommend running your app servers and database servers across the internet (they should be in the same data center, especially if you care about latency).

          But this shouldn't be a limitation of the client itself. I have a bugfix for this coming up, and I'll also see what I can do to add more retry logic that tries different nodes from the list, so that if one fails we eventually connect to another one.
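          The retry idea might look roughly like this: walk the configured node list, move on to the next node when a connection attempt fails, and make a bounded number of passes with a pause between them. `NodeFailover`, `Connector`, and `connectToAny` are hypothetical names, and the `http://nodeN:8091/pools` URIs are example values only. This is an illustration of the idea, not the client's actual retry logic.

```java
import java.net.ConnectException;
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class NodeFailover {

    // Hypothetical connector: throws if the streaming connection to `node` fails.
    interface Connector {
        void connect(URI node) throws Exception;
    }

    // Tries each node in list order for up to `rounds` passes, sleeping
    // `backoffMs` between passes; returns the first node that accepted the
    // connection, or empty if every attempt failed.
    static Optional<URI> connectToAny(List<URI> nodes, Connector connector,
                                      int rounds, long backoffMs)
            throws InterruptedException {
        for (int round = 0; round < rounds; round++) {
            for (URI node : nodes) {
                try {
                    connector.connect(node);
                    return Optional.of(node);
                } catch (Exception e) {
                    // Log and move on to the next node instead of giving up.
                    System.err.println("connect to " + node + " failed: " + e);
                }
            }
            if (round + 1 < rounds) {
                Thread.sleep(backoffMs);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws InterruptedException {
        List<URI> nodes = Arrays.asList(
            URI.create("http://node1:8091/pools"),
            URI.create("http://node2:8091/pools"),
            URI.create("http://node3:8091/pools"));
        List<URI> attempted = new ArrayList<>();

        // Simulated unstable network: only node3 is reachable.
        Connector flaky = node -> {
            attempted.add(node);
            if (!node.getHost().equals("node3")) {
                throw new ConnectException("connection timed out");
            }
        };

        Optional<URI> connected = connectToAny(nodes, flaky, 2, 100);
        System.out.println("connected to " + connected.orElse(null)
            + " after " + attempted.size() + " attempts");
    }
}
```

          Running `main` logs the two failed attempts to stderr and prints `connected to http://node3:8091/pools after 3 attempts`. Bounding the number of passes means a fully unreachable cluster produces an error instead of another indefinite wait.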

          daschl Michael Nitschinger added a comment -

          The fix has been merged into master and will be available in 1.2.2.


            People

            • Assignee: daschl Michael Nitschinger
            • Reporter: afds Deniss Afonin
            • Votes: 0
            • Watchers: 2

