Details
- Type: Bug
- Resolution: Fixed
- Priority: Test Blocker
- Fix Version: 3.0.2
Description
I am working on a new N1QL perf test and have been having issues with the Python client's N1QL query latencies and throughput. If I create a bucket with N items and then use one client to issue sequential N1QL queries, I see that one of the machines in the cluster has a linearly increasing number of connections in TIME_WAIT. The queries look like the following:
SELECT * FROM default:`bucket-1`.`scope-1`.`collection-1` USE KEYS['n1ql-000000000056']
The target can also be the default scope and collection (if the docs are in there).
Here is a graph showing the TIME_WAIT growth:
My client gets about 200 queries/sec, and you can see that after 10 seconds there are around 2000 connections in TIME_WAIT. Even if I set a query timeout of 1000000 (which is 1 million nanoseconds, or 1 millisecond), the connections are not killed.
This is the commit of the Python client that I am using: 8d5820a297b35378d314eb399c041b214aa6a7db
This is the lcb logging for a single op:
45615ms [Ic41f35ec74693510] {11415} [TRACE] (http-io - L:401) <172.23.100.70:8091> GET http://172.23.100.70:8091/pools. Body=0 bytes
45615ms [Ic41f35ec74693510] {11415} [DEBUG] (lcbio_mgr - L:481) <172.23.100.70:8091> (HE=0x32f1200) Creating new connection because none are available in the pool
45616ms [Ic41f35ec74693510] {11415} [TRACE] (lcbio_mgr - L:393) <172.23.100.70:8091> (HE=0x32f1200) New pool entry: I=0x42d5b40
45616ms [Ic41f35ec74693510] {11415} [INFO] (connection - L:487) <172.23.100.70:8091> (SOCK=5e90741e7c6c856a) Starting. Timeout=75000000us
45616ms [Ic41f35ec74693510] {11415} [DEBUG] (connection - L:252) <172.23.100.70:8091> (SOCK=5e90741e7c6c856a) Created new socket with FD=34
45616ms [Ic41f35ec74693510] {11415} [TRACE] (connection - L:352) <172.23.100.70:8091> (SOCK=5e90741e7c6c856a) Scheduling I/O watcher for asynchronous connection completion.
45616ms [Ic41f35ec74693510] {11415} [INFO] (connection - L:155) <172.23.100.70:8091> (SOCK=5e90741e7c6c856a) Connected established
45616ms [Ic41f35ec74693510] {11415} [DEBUG] (connection - L:108) <172.23.100.70:8091> (SOCK=5e90741e7c6c856a) Successfully set TCP_NODELAY
45616ms [Ic41f35ec74693510] {11415} [DEBUG] (connection - L:108) <172.23.100.70:8091> (SOCK=5e90741e7c6c856a) Successfully set TCP_KEEPALIVE
45616ms [Ic41f35ec74693510] {11415} [DEBUG] (lcbio_mgr - L:349) <172.23.100.70:8091> (HE=0x32f1200) Received result for I=0x42d5b40,C=(nil); E=0x0
45616ms [Ic41f35ec74693510] {11415} [DEBUG] (lcbio_mgr - L:309) <172.23.100.70:8091> (HE=0x32f1200) Assigning R=0x3d1bb40 SOCKET=0x4bd9610
45616ms [Ic41f35ec74693510] {11415} [DEBUG] (ioctx - L:95) <172.23.100.70:8091> (CTX=0x3b167a0,unknown) Pairing with SOCK=5e90741e7c6c856a
45618ms [Ic41f35ec74693510] {11415} [DEBUG] (ioctx - L:140) <172.23.100.70:8091> (CTX=0x3b167a0,mgmt/capi) Destroying context. Pending Writes=0, Entered=true, Socket Refcount=1
45618ms [I32775f9223f27d7a] {11415} [TRACE] (http-io - L:401) <172.23.100.55:8093> POST http://172.23.100.55:8093/query/service. Body=240 bytes
45618ms [I32775f9223f27d7a] {11415} [DEBUG] (lcbio_mgr - L:472) <172.23.100.55:8093> (HE=0x3667700) Found ready connection in pool. Reusing socket and not creating new connection
45618ms [I32775f9223f27d7a] {11415} [DEBUG] (lcbio_mgr - L:309) <172.23.100.55:8093> (HE=0x3667700) Assigning R=0x370a5a0 SOCKET=0x34bfd10
45618ms [I32775f9223f27d7a] {11415} [DEBUG] (ioctx - L:95) <172.23.100.55:8093> (CTX=0x4d29740,unknown) Pairing with SOCK=7df24133bd864f92
45620ms [I32775f9223f27d7a] {11415} [DEBUG] (ioctx - L:140) <172.23.100.55:8093> (CTX=0x4d29740,mgmt/capi) Destroying context. Pending Writes=0, Entered=true, Socket Refcount=1
45620ms [I32775f9223f27d7a] {11415} [DEBUG] (lcbio_mgr - L:536) <172.23.100.55:8093> (HE=0x3667700) Placing socket back into the pool. I=0x3729840,C=0x34bfd10
The cluster has a data node and a query node, among others; .70 is the data node and .55 is the query node. You can see from the logging that for each query, lcb first creates a new connection to the data node (.70) and then reuses a pooled connection to the query node (.55). The query connections appear to be reusing sockets, but I see no indication that the data connection reuses sockets: a new connection is created per query and is not destroyed after completion.