Uploaded image for project: 'Couchbase Lite'
  1. Couchbase Lite
  2. CBL-2405

Hung in call to c4socket_closed

    XMLWordPrintable

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 2.8.5
    • 3.0, 2.8.7
    • LiteCore
    • Security Level: Public
    • None

    Description

      A call to:

      c4socket_closed
      

      from within a Core callback to:

      C4ReplicatorParameters.C4SocketFactory.socket_requestClose
      

      appears to cause the calling thread to block indefinitely, under some condition.

      Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          jimb Jim Borden added a comment -

          I present, as evidence to the contrary, an existing test which does exactly this:

          https://github.com/couchbase/couchbase-lite-core/blob/18f3368186875461c49a559705f9a791e31c6fb9/Replicator/tests/ReplicatorAPITest.cc#L590-L594

          So something else is likely going on

          jimb Jim Borden added a comment - I present, as evidence to the contrary, an existing test which does exactly this: https://github.com/couchbase/couchbase-lite-core/blob/18f3368186875461c49a559705f9a791e31c6fb9/Replicator/tests/ReplicatorAPITest.cc#L590-L594 So something else is likely going on
          blake.meike Blake Meike added a comment -

          I present, as evidence, the following:

          (This issue is fairly simple to reproduce. I've included some things that I think might be useful, here, but I can probably get whatever you need, if this is insufficient.)

          This code will log a message on the return from the native method `closed`:

              protected final void closed(int errorDomain, int errorCode, String message) {
                  final long handle = getPeerUnchecked();
                  Log.d(LOG_DOMAIN, "C4Socket.closed @%d: %d", handle, errorCode);
                  if (handle == 0) { return; }
                  closed(handle, errorDomain, errorCode, message);
                  Log.d(LOG_DOMAIN, "C4Socket.close complete");
              }
          

          Here is the native method, FWIW:

          JNIEXPORT void JNICALL
          Java_com_couchbase_lite_internal_core_C4Socket_closed(
                  JNIEnv *env,
                  jclass ignore,
                  jlong jSocket,
                  jint domain,
                  jint code,
                  jstring message) {
              auto socket = (C4Socket *) jSocket;
              jstringSlice sliceMessage(env, message);
              C4Error error = c4error_make((C4ErrorDomain) domain, code, sliceMessage);
              c4socket_closed(socket, error);
          }
          

          Here is a bit of the log:

          Sep 29, 2021, 2:02:12 PM 57(Thread-30) E CouchbaseLite/NETWORK:{C4SocketImpl#7} No response received after 15 sec -- disconnecting
          Sep 29, 2021, 2:02:12 PM 58(Thread-31) D CouchbaseLite/NETWORK:C4Socket.requestClose @140678238411008: AbstractCBLWebSocket{ws://127.0.0.1:4984/todo/_blipsync} #504(Timed out)
          Sep 29, 2021, 2:02:12 PM 58(Thread-31) V CouchbaseLite/NETWORK:AbstractCBLWebSocket{ws://127.0.0.1:4984/todo/_blipsync}:Core request close: null 504
          Sep 29, 2021, 2:02:12 PM 58(Thread-31) D CouchbaseLite/DATABASE:StateMachine@0x3c85bacb: transition CONNECTING => CLOSING
          Sep 29, 2021, 2:02:12 PM 58(Thread-31) D CouchbaseLite/DATABASE:StateMachine@0x3c85bacb: transition CLOSING => CLOSED
          Sep 29, 2021, 2:02:12 PM 58(Thread-31) I CouchbaseLite/NETWORK:WebSocket CLOSED: 504(Timed out)
          Sep 29, 2021, 2:02:12 PM 58(Thread-31) D CouchbaseLite/NETWORK:C4Socket.closed @140678238411008: 504
          

          As you can see, Thread-31 enters the native method `closed` but does not leave.

          Thread-31 is in the native method "closed" an is not waiting on any java mutex:

          "Thread-31" #58 prio=5 os_prio=31 cpu=445.63ms elapsed=7.77s tid=0x00007ff235294000 nid=0x370b runnable  [0x0000700007ef2000]
             java.lang.Thread.State: RUNNABLE
          	at com.couchbase.lite.internal.core.C4Socket.closed(Native Method)
          	at com.couchbase.lite.internal.core.C4Socket.closed(C4Socket.java:232)
          	at com.couchbase.lite.internal.replicator.AbstractCBLWebSocket.didClose(AbstractCBLWebSocket.java:601)
          	at com.couchbase.lite.internal.replicator.AbstractCBLWebSocket.requestClose(AbstractCBLWebSocket.java:420)
          	- locked <0x000000070d559d30> (a com.couchbase.lite.internal.utils.StateMachine)
          	at com.couchbase.lite.internal.core.C4Socket.requestClose(C4Socket.java:141)
          
          

          It does hold the mutex for the variable named `state` (an instance of StateMachine) which prevents any other thread from making forward progress through the websocket code.

          blake.meike Blake Meike added a comment - I present, as evidence, the following: (This issue is fairly simple to reproduce. I've included some things that I think might be useful, here, but I can probably get whatever you need, if this is insufficient.) This code will log a message on the return from the native method `closed`: protected final void closed(int errorDomain, int errorCode, String message) { final long handle = getPeerUnchecked(); Log.d(LOG_DOMAIN, "C4Socket.closed @%d: %d", handle, errorCode); if (handle == 0) { return; } closed(handle, errorDomain, errorCode, message); Log.d(LOG_DOMAIN, "C4Socket.close complete"); } Here is the native method, FWIW: JNIEXPORT void JNICALL Java_com_couchbase_lite_internal_core_C4Socket_closed( JNIEnv *env, jclass ignore, jlong jSocket, jint domain, jint code, jstring message) { auto socket = (C4Socket *) jSocket; jstringSlice sliceMessage(env, message); C4Error error = c4error_make((C4ErrorDomain) domain, code, sliceMessage); c4socket_closed(socket, error); } Here is a bit of the log: Sep 29, 2021, 2:02:12 PM 57(Thread-30) E CouchbaseLite/NETWORK:{C4SocketImpl#7} No response received after 15 sec -- disconnecting Sep 29, 2021, 2:02:12 PM 58(Thread-31) D CouchbaseLite/NETWORK:C4Socket.requestClose @140678238411008: AbstractCBLWebSocket{ws://127.0.0.1:4984/todo/_blipsync} #504(Timed out) Sep 29, 2021, 2:02:12 PM 58(Thread-31) V CouchbaseLite/NETWORK:AbstractCBLWebSocket{ws://127.0.0.1:4984/todo/_blipsync}:Core request close: null 504 Sep 29, 2021, 2:02:12 PM 58(Thread-31) D CouchbaseLite/DATABASE:StateMachine@0x3c85bacb: transition CONNECTING => CLOSING Sep 29, 2021, 2:02:12 PM 58(Thread-31) D CouchbaseLite/DATABASE:StateMachine@0x3c85bacb: transition CLOSING => CLOSED Sep 29, 2021, 2:02:12 PM 58(Thread-31) I CouchbaseLite/NETWORK:WebSocket CLOSED: 504(Timed out) Sep 29, 2021, 2:02:12 PM 58(Thread-31) D CouchbaseLite/NETWORK:C4Socket.closed @140678238411008: 504 As you can see, Thread-31 enters the native method `closed` but does not leave. Thread-31 is in the native method "closed" an is not waiting on any java mutex: "Thread-31" #58 prio=5 os_prio=31 cpu=445.63ms elapsed=7.77s tid=0x00007ff235294000 nid=0x370b runnable [0x0000700007ef2000] java.lang.Thread.State: RUNNABLE at com.couchbase.lite.internal.core.C4Socket.closed(Native Method) at com.couchbase.lite.internal.core.C4Socket.closed(C4Socket.java:232) at com.couchbase.lite.internal.replicator.AbstractCBLWebSocket.didClose(AbstractCBLWebSocket.java:601) at com.couchbase.lite.internal.replicator.AbstractCBLWebSocket.requestClose(AbstractCBLWebSocket.java:420) - locked <0x000000070d559d30> (a com.couchbase.lite.internal.utils.StateMachine) at com.couchbase.lite.internal.core.C4Socket.requestClose(C4Socket.java:141) It does hold the mutex for the variable named `state` (an instance of StateMachine) which prevents any other thread from making forward progress through the websocket code.
          blake.meike Blake Meike added a comment -

          After some learning to use the native debugger (thanks Jim Borden!) I was able to get a stack trace of the hung thread. Jim assures me that this is not a deadlock, but some kind of timer issue. I've edited the title of this bug to make it more correct and added the screen shot of the stack trace to the enclosures.

          blake.meike Blake Meike added a comment - After some learning to use the native debugger (thanks Jim Borden !) I was able to get a stack trace of the hung thread. Jim assures me that this is not a deadlock, but some kind of timer issue. I've edited the title of this bug to make it more correct and added the screen shot of the stack trace to the enclosures.

          People

            jimb Jim Borden
            blake.meike Blake Meike
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty