Details
-
Bug
-
Resolution: Done
-
Major
-
6.5.0
-
None
-
SERVER (one node)
# /opt/couchbase/bin/couchbase-server -version
Couchbase Server 6.5.0-4360 (EE)
# uname -a
Linux couch01 3.16.0-10-amd64 #1 SMP Debian 3.16.70-1 (2019-07-22) x86_64 GNU/Linux
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 8
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 86
Model name: Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz
Stepping: 3
CPU MHz: 2098.406
BogoMIPS: 4201.55
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 12288K
NUMA node0 CPU(s): 0-7
# free
total used free shared buffers cached
Mem: 24722744 11390780 13331964 6412 43644 1104736
-/+ buffers/cache: 10242400 14480344
Swap: 477180 476948 232
CLIENT (on a Windows 7 PC)
Java 8
java-client-2.7.7..jar
core-io.1.7.7.jar
SERVER (one node) # /opt/couchbase/bin/couchbase-server -version Couchbase Server 6.5.0-4360 (EE) # uname -a Linux couch01 3.16.0-10-amd64 #1 SMP Debian 3.16.70-1 (2019-07-22) x86_64 GNU/Linux # lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 8 On-line CPU(s) list: 0-7 Thread(s) per core: 1 Core(s) per socket: 1 Socket(s): 8 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 86 Model name: Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz Stepping: 3 CPU MHz: 2098.406 BogoMIPS: 4201.55 L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 12288K NUMA node0 CPU(s): 0-7 # free total used free shared buffers cached Mem: 24722744 11390780 13331964 6412 43644 1104736 -/+ buffers/cache: 10242400 14480344 Swap: 477180 476948 232 CLIENT (on a Windows 7 PC) Java 8 java-client-2.7.7..jar core-io.1.7.7.jar
Description
Summary
Under high load 12+ concurrent clients (on a single node Linux Debian 8 system) doing a read only query 1,000's of times in a stress test we see that the query service exits or dies.
The test application or client does N1QL queries via N1QL via the Java SDK 2.2.7 for Mad-Hatter 6.5.0-4360 under a high thread count e.g. 12,24, 32 parallel concurrent queries will sometimes crash the query service. in query.log we will see messages in the Couchbase console log like the following (ns_log 000):
Service 'query' exited with status 2. Restarting. Messages: |
github.com/couchbase/query/execution.(*base).runConsumer.func1()
|
/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/execution/base.go:661 +0x230 |
And the Java client returns an Exception
com.couchbase.client.core.RequestCancelledException Request cancelled in-flight.at com.couchbase.client.core.endpoint.AbstractGenericHandler.handleOutstandingOperations(AbstractGenericHandler.java:686) |
The query service does eventually recover after several seconds but the exit is unexpected and causes the cliet logic to fail.
The problem occurs with both adhoc(true) or adhoc(false) in the prepared query
The Java test program essential spawns multiple threads where each thread does 1000's of N1QL queries against a template
SELECT
|
r.distance, r.sourceairport, r.destinationairport, r.stops, s.day, s.flight, s.utc
|
FROM
|
`travel-sample` as r UNNEST r.schedule s
|
WHERE
|
s.day = DATE_PART_STR($arg_date_utc ,'dow') AND |
s.utc >= $arg_begtime_utc AND
|
s.utc <= $arg_endtime_utc AND
|
r.stops = 0 AND |
r.distance >= $arg_distance_km AND
|
r.sourceairport = $arg_faa
|
via Java code like (running in each thread)
N1qlQueryResult result = null; |
long[] durations = new long[loops]; |
for(int i=0;i<loops;i++) { |
dist_km = dist_km + 0.00001 + id*0.01; |
JsonObject pVal = JsonObject.create()
|
.put("arg_distance_km", dist_km).put("arg_date_utc",date_utc) |
.put("arg_begtime_utc",begtime_utc).put("arg_endtime_utc",endtime_utc) |
.put("$arg_faa", altFAA); |
|
String stmtfixed = qry_tempalte;
|
N1qlQuery prepStmt = N1qlQuery.parameterized(stmtfixed +
|
" AND r.distance != 2*" + dist_km, pVal, N1qlParams.build().adhoc(true) ); |
result = null; |
result = bucket.query(prepStmt);
|
}
|
Note the parameter dist_km always changes to force a new unique prepared stmt.
Full Details of Errors
full message from Couchbase console log (ns_log 000)
Service 'query' exited with status 2. Restarting. Messages: |
github.com/couchbase/query/execution.(*base).runConsumer.func1()
|
/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/execution/base.go:661 +0x230 |
github.com/couchbase/query/util.(*Once).Do(0xc2bfb9f708, 0xc3ddfbe738) |
/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/util/sync.go:54 +0x4a |
github.com/couchbase/query/execution.(*base).runConsumer(0xc2bfb9f600, 0x212ee20, 0xc2bfb9f600, 0xc40eca3e00, 0x0, 0x0) |
/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/execution/base.go:613 +0xa2 |
github.com/couchbase/query/execution.(*Stream).RunOnce(0xc2bfb9f600, 0xc40eca3e00, 0x0, 0x0) |
/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/execution/stream.go:53 +0x5c |
created by github.com/couchbase/query/execution.(*Sequence).RunOnce.func1
|
/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/query/execution/sequence.go:117 +0x407 |
full message from Java SDK Exception
com.couchbase.client.core.RequestCancelledException: Request cancelled in-flight.at com.couchbase.client.core.endpoint.AbstractGenericHandler.handleOutstandingOperations(AbstractGenericHandler.java:686)at com.couchbase.client.core.endpoint.AbstractGenericHandler.handlerRemoved(AbstractGenericHandler.java:667)at com.couchbase.client.core.endpoint.query.QueryHandlerV2.handlerRemoved(QueryHandlerV2.java:208)at com.couchbase.client.deps.io.netty.channel.DefaultChannelPipeline.callHandlerRemoved0(DefaultChannelPipeline.java:626)at com.couchbase.client.deps.io.netty.channel.DefaultChannelPipeline.destroyDown(DefaultChannelPipeline.java:878)at com.couchbase.client.deps.io.netty.channel.DefaultChannelPipeline.destroyUp(DefaultChannelPipeline.java:844)at com.couchbase.client.deps.io.netty.channel.DefaultChannelPipeline.destroy(DefaultChannelPipeline.java:836)at com.couchbase.client.deps.io.netty.channel.DefaultChannelPipeline.access$700(DefaultChannelPipeline.java:44)at com.couchbase.client.deps.io.netty.channel.DefaultChannelPipeline$HeadContext.channelUnregistered(DefaultChannelPipeline.java:1286)at com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:176)at com.couchbase.client.deps.io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:162)at com.couchbase.client.deps.io.netty.channel.DefaultChannelPipeline.fireChannelUnregistered(DefaultChannelPipeline.java:821)at com.couchbase.client.deps.io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:776)at com.couchbase.client.deps.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399)at com.couchbase.client.deps.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:464)at com.couchbase.client.deps.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)at com.couchbase.client.deps.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)at java.lang.Thread.run(Thread.java:748) |
Attachments (client exceptions from client app)
- java1.log.txt
- java2.log.txt
- java3.log.txt
Attachments (sever exceptions from console log)
- from_cb_console_AFTER_java1.log.txt
- from_cb_console_AFTER_java2.log.txt
- from_cb_console_AFTER_java3.log.txt
And finally a cb_collect after the above 3 failures has been uploaded to
https://uploads.couchbase.com/jonstrabala/cbcollect_20190924_info_crash_query_java3_adhoc_true.zip
Note, java2.log.txt includes an issues (initial Exception not part of this MB report that is difficult to reproduce).