Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-36137

High parallel loads from Java SDK client can crash the Query service

    XMLWordPrintable

Details

    • Bug
    • Resolution: Done
    • Major
    • 6.5.0
    • 6.5.0
    • query
    • None

    Description

      Summary

      Under high load 12+ concurrent clients (on a single node Linux Debian 8 system) doing a read only query 1,000's of times in a stress test we see that the query service exits or dies.

      The test application or client does N1QL queries via N1QL via the Java SDK 2.2.7 for Mad-Hatter 6.5.0-4360 under a high thread count e.g. 12,24, 32 parallel concurrent queries will sometimes crash the query service. in query.log we will see messages in the Couchbase console log like the following (ns_log 000):

       

      Service 'query' exited with status 2. Restarting. Messages:
      github.com/couchbase/query/execution.(*base).runConsumer.func1()
      /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/execution/base.go:661 +0x230

       

      And the Java client returns an Exception

       

      com.couchbase.client.core.RequestCancelledException Request cancelled in-flight.at com.couchbase.client.core.endpoint.AbstractGenericHandler.handleOutstandingOperations(AbstractGenericHandler.java:686)

       

      The query service does eventually recover after several seconds but the exit is unexpected and causes the cliet logic to fail.

      The problem occurs with both adhoc(true) or adhoc(false) in the prepared query

      The Java test program essential spawns multiple threads where each thread does 1000's of N1QL queries against a template

      SELECT
          r.distance, r.sourceairport, r.destinationairport, r.stops, s.day, s.flight, s.utc
      FROM
          `travel-sample` as r UNNEST r.schedule s
      WHERE
          s.day = DATE_PART_STR($arg_date_utc ,'dow') AND     
          s.utc >= $arg_begtime_utc AND 
          s.utc <= $arg_endtime_utc AND
          r.stops = 0 AND
          r.distance >= $arg_distance_km AND 
          r.sourceairport = $arg_faa

      via Java code like (running in each thread)

       

      N1qlQueryResult result = null;
      long[] durations = new long[loops];
      for(int i=0;i<loops;i++) {
          dist_km = dist_km + 0.00001 + id*0.01;
          JsonObject pVal = JsonObject.create()
              .put("arg_distance_km", dist_km).put("arg_date_utc",date_utc)
              .put("arg_begtime_utc",begtime_utc).put("arg_endtime_utc",endtime_utc)
              .put("$arg_faa", altFAA);
              
          String stmtfixed = qry_tempalte;
          N1qlQuery prepStmt = N1qlQuery.parameterized(stmtfixed + 
              " AND r.distance != 2*" + dist_km, pVal, N1qlParams.build().adhoc(true) );
          result = null;
          result = bucket.query(prepStmt); 
      }

      Note the parameter dist_km always changes to force a new unique prepared stmt.

      Full Details of Errors

      full message from Couchbase console log (ns_log 000)

       

      Service 'query' exited with status 2. Restarting. Messages:
      github.com/couchbase/query/execution.(*base).runConsumer.func1()
      /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/execution/base.go:661 +0x230
      github.com/couchbase/query/util.(*Once).Do(0xc2bfb9f708, 0xc3ddfbe738)
      /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/util/sync.go:54 +0x4a
      github.com/couchbase/query/execution.(*base).runConsumer(0xc2bfb9f600, 0x212ee20, 0xc2bfb9f600, 0xc40eca3e00, 0x0, 0x0)
      /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/execution/base.go:613 +0xa2
      github.com/couchbase/query/execution.(*Stream).RunOnce(0xc2bfb9f600, 0xc40eca3e00, 0x0, 0x0)
      /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/execution/stream.go:53 +0x5c
      created by github.com/couchbase/query/execution.(*Sequence).RunOnce.func1
      /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/execution/sequence.go:117 +0x407

       

      full message from Java SDK Exception

       

      com.couchbase.client.core.RequestCancelledException: Request cancelled in-flight.at com.couchbase.client.core.endpoint.AbstractGenericHandler.handleOutstandingOperations(AbstractGenericHandler.java:686)at com.couchbase.client.core.endpoint.AbstractGenericHandler.handlerRemoved(AbstractGenericHandler.java:667)at com.couchbase.client.core.endpoint.query.QueryHandlerV2.handlerRemoved(QueryHandlerV2.java:208)at com.couchbase.client.deps.io.netty.channel.DefaultChannelPipeline.callHandlerRemoved0(DefaultChannelPipeline.java:626)at com.couchbase.client.deps.io.netty.channel.DefaultChannelPipeline.destroyDown(DefaultChannelPipeline.java:878)at com.couchbase.client.deps.io.netty.channel.DefaultChannelPipeline.destroyUp(DefaultChannelPipeline.java:844)at com.couchbase.client.deps.io.netty.channel.DefaultChannelPipeline.destroy(DefaultChannelPipeline.java:836)at com.couchbase.client.deps.io.netty.channel.DefaultChannelPipeline.access$700(DefaultChannelPipeline.java:44)at com.couchbase.client.deps.io.netty.channel.DefaultChannelPipeline$HeadContext.channelUnregistered(DefaultChannelPipeline.java:1286)at com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:176)at com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:162)at com.couchbase.client.deps.io.netty.channel.DefaultChannelPipeline.fireChannelUnregistered(DefaultChannelPipeline.java:821)at com.couchbase.client.deps.io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:776)at com.couchbase.client.deps.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399)at com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:464)at com.couchbase.client.deps.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)at com.couchbase.client.deps.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)at java.lang.Thread.run(Thread.java:748)

       

      Attachments (client exceptions from client app)

      • java1.log.txt
      • java2.log.txt
      • java3.log.txt 

      Attachments (sever exceptions from console log)

      • from_cb_console_AFTER_java1.log.txt
      • from_cb_console_AFTER_java2.log.txt
      • from_cb_console_AFTER_java3.log.txt 

      And finally a cb_collect after the above 3 failures has been uploaded to

      https://uploads.couchbase.com/jonstrabala/cbcollect_20190924_info_crash_query_java3_adhoc_true.zip

      Note, java2.log.txt includes an issues (initial Exception not part of this MB report that is difficult to reproduce).

       

       

      Attachments

        1. 20190925_crash_image.jpg
          20190925_crash_image.jpg
          47 kB
        2. 20190925_query.log
          551 kB
        3. from cb_console_AFTER_java1.log.txt
          8 kB
        4. from cb_console_AFTER_java2.log.txt
          10 kB
        5. from cb_console_AFTER_java3.log.txt
          12 kB
        6. go-routines
          346 kB
        7. java1.log.txt
          85 kB
        8. java2.log.txt
          81 kB
        9. java3.log.txt
          97 kB
        10. m1print.py
          2 kB
        11. profile003.svg
          72 kB
        12. SimpleStressorN1QL.java
          5 kB
        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          People

            keshav Keshav Murthy
            jon.strabala Jon Strabala
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty