Uploaded image for project: 'Couchbase Lite'
  1. Couchbase Lite
  2. CBL-911

Couchbase Lite Java replication hangs when using DEBUG console + file logging on Windows

    XMLWordPrintable

Details

    • Improvement
    • Resolution: Fixed
    • Major
    • 3.0
    • 2.7.1
    • LiteCore
    • Security Level: Public
    • None
    • Jim 21
    • 5

    Description

      Summary

      When using console-based logging alongside file-based logging at a low level (such as DEBUG) in a Couchbase Lite Java application on Windows, the replication eventually gets stuck due to waiting for a logging call that never returns.

      Steps to reproduce

      1. Create a simple Java app (source code attached as Main.java ) designed to synchronize a non-trivial number of documents (my tests reproduce after approximately 50000 documents) from a remote Sync Gateway, ideally with a good network to ensure network isn't the bottleneck.
      Ideally add some way to keep track of replication progress (via the replicator status).
      2. Enable console logging at debug-level and file-based logging:

              final File path = new File("C:\\Users\\Administrator\\IdeaProjects\\logging-hang\\logs");
              var logConfig = new LogFileConfiguration(path.toString());
              logConfig.setMaxRotateCount(100);
              logConfig.setMaxSize(5 * 1024);
              Database.log.getFile().setConfig(logConfig);
              Database.log.getFile().setLevel(LogLevel.DEBUG);
              Database.log.getConsole().setLevel(LogLevel.DEBUG);
              Database.log.getConsole().setDomains(EnumSet.of(LogDomain.ALL));
      


      3. Run the app on Windows.

      Expected Result

      The app synchronizes all of the documents as expected.

      Actual Result

      The app eventually stops logging from Couchbase Lite and the replication makes no further progress.

      Analysis

      I grabbed the thread dump via Jstack when the app got into this 'stuck' state (thread_dump.txt) and can see the following:

      "OkHttp http://172.31.8.21:4984/..." #19 prio=5 os_prio=0 cpu=4890.63ms elapsed=271.34s tid=0x0000018697f46800 nid=0x15e4 runnable  [0x00000074e14fe000]
         java.lang.Thread.State: RUNNABLE
              at com.couchbase.lite.internal.core.C4Log.log(Native Method)
              at com.couchbase.lite.FileLogger.log(FileLogger.java:55)
              at com.couchbase.lite.internal.support.Log.sendToLoggers(Log.java:415)
              at com.couchbase.lite.internal.support.Log.log(Log.java:394)
              at com.couchbase.lite.internal.support.Log.v(Log.java:146)
              at com.couchbase.lite.internal.replicator.AbstractCBLWebSocket$CBLWebSocketListener.onMessage(AbstractCBLWebSocket.java:167)
              at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:314)
              at okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:224)
              at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:101)
              at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:265)
              at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:204)
              at okhttp3.RealCall$AsyncCall.execute(RealCall.java:153)
              at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@14.0.1/ThreadPoolExecutor.java:1130)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@14.0.1/ThreadPoolExecutor.java:630)
              at java.lang.Thread.run(java.base@14.0.1/Thread.java:832)
      

      As you can see in this, the reason that replication is making no further progress is because the thread responsible from consuming new messages from the websocket is stuck in Lite Core's logging method.
      I tried to get more detail about what Lite Core is doing at this stage but I guess it wasn't compiled with debug symbols as the stacktrace in windbg.exe was just addresses.

      Trying to understand the problem a bit better, I made a few adjustments and found that this does not occur in the following situations:

      • When you use VERBOSE level instead of DEBUG-level, implying an issue specifically with DEBUG level or just the volume of logging generated.
      • When you use only console-based or only file-based logging, implying there is a potential deadlock caused by having 2 outputs.
      • When the application is running on Linux, implying this is Windows-specific.

      Workaround

      Do not enable debug logging for console logging.

      Attachments

        1. LiteCore.pdb
          43.50 MB
        2. Main.java
          3 kB
        3. thread_dump.txt
          14 kB
        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          People

            matt.carabine Matt Carabine (Inactive)
            matt.carabine Matt Carabine (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty