Couchbase Python Client Library / PYCBC-950

N1QL query connections remain open after completion


Details

    • Type: Bug
    • Status: Closed
    • Priority: Test Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.0.2
    • Fix Version/s: 3.0.2
    • Component/s: None
    • 1

    Description

       

      I am working on a new N1QL perf test and have been having issues with the Python client's N1QL query latencies and throughput. If I create a bucket with N items and then use one client to issue sequential N1QL queries, I see that one of the machines in the cluster has a linearly increasing number of connections in TIME_WAIT. The queries look like the following:

      SELECT * FROM default:`bucket-1`.`scope-1`.`collection-1` USE KEYS['n1ql-000000000056']

      The target can also be the default scope and collection (if the documents are stored there).
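
      For reference, here is a minimal sketch of the kind of sequential query loop described above. This is not the attached repro.py; the host, credentials, and exact import paths are assumptions (in some early 3.0.x builds PasswordAuthenticator is imported from couchbase_core.cluster instead).

      # Minimal sketch of a sequential N1QL query loop against the collection above.
      from couchbase.cluster import Cluster, ClusterOptions, QueryOptions
      from couchbase.auth import PasswordAuthenticator

      cluster = Cluster('couchbase://172.23.100.70',
                        ClusterOptions(PasswordAuthenticator('Administrator', 'password')))
      bucket = cluster.bucket('bucket-1')  # open the bucket so the client can bootstrap

      for i in range(10000):
          key = 'n1ql-{:012d}'.format(i)
          result = cluster.query(
              'SELECT * FROM default:`bucket-1`.`scope-1`.`collection-1` USE KEYS[$1]',
              QueryOptions(positional_parameters=[key]))
          list(result)  # drain the rows so the request completes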

      Here is a graph showing this: 

      http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=iris_700-2351_access_227c#f087785f256442b0b8bdd8d9b9ba328a

      My client gets about 200 queries/sec, and you can see that after 10 seconds there are around 2,000 connections in TIME_WAIT. Even if I set a query timeout of 1000000 (which is 1 million nanoseconds, or 1 millisecond), the connections are not killed.
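
      For context, a per-request query timeout can be set along these lines in SDK 3 (a sketch reusing the cluster object from the snippet above; whether the timeout here was set this way or via the connection string is not stated):

      # Sketch: per-request query timeout via QueryOptions. As discussed below, this
      # bounds the request itself, not the lifetime of the underlying HTTP socket.
      from datetime import timedelta
      from couchbase.cluster import QueryOptions

      result = cluster.query(
          'SELECT * FROM default:`bucket-1`.`scope-1`.`collection-1` USE KEYS[$1]',
          QueryOptions(positional_parameters=['n1ql-000000000056'],
                       timeout=timedelta(milliseconds=1)))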

       

      This is the commit of the python client that I am using: 8d5820a297b35378d314eb399c041b214aa6a7db

       

      This is the lcb logging for a single op:

      45615ms [Ic41f35ec74693510] {11415} [TRACE] (http-io - L:401) <172.23.100.70:8091> GET  http://172.23.100.70:8091/pools. Body=0 bytes 
      45615ms [Ic41f35ec74693510] {11415} [DEBUG] (lcbio_mgr - L:481) <172.23.100.70:8091> (HE=0x32f1200) Creating new connection because none are available in the pool 
      45616ms [Ic41f35ec74693510] {11415} [TRACE] (lcbio_mgr - L:393) <172.23.100.70:8091> (HE=0x32f1200) New pool entry: I=0x42d5b40 
      45616ms [Ic41f35ec74693510] {11415} [INFO] (connection - L:487) <172.23.100.70:8091> (SOCK=5e90741e7c6c856a) Starting. Timeout=75000000us 
      45616ms [Ic41f35ec74693510] {11415} [DEBUG] (connection - L:252) <172.23.100.70:8091> (SOCK=5e90741e7c6c856a) Created new socket with FD=34 
      45616ms [Ic41f35ec74693510] {11415} [TRACE] (connection - L:352) <172.23.100.70:8091> (SOCK=5e90741e7c6c856a) Scheduling I/O watcher for asynchronous connection completion. 
      45616ms [Ic41f35ec74693510] {11415} [INFO] (connection - L:155) <172.23.100.70:8091> (SOCK=5e90741e7c6c856a) Connected established 
      45616ms [Ic41f35ec74693510] {11415} [DEBUG] (connection - L:108) <172.23.100.70:8091> (SOCK=5e90741e7c6c856a) Successfully set TCP_NODELAY 
      45616ms [Ic41f35ec74693510] {11415} [DEBUG] (connection - L:108) <172.23.100.70:8091> (SOCK=5e90741e7c6c856a) Successfully set TCP_KEEPALIVE 
      45616ms [Ic41f35ec74693510] {11415} [DEBUG] (lcbio_mgr - L:349) <172.23.100.70:8091> (HE=0x32f1200) Received result for I=0x42d5b40,C=(nil); E=0x0 
      45616ms [Ic41f35ec74693510] {11415} [DEBUG] (lcbio_mgr - L:309) <172.23.100.70:8091> (HE=0x32f1200) Assigning R=0x3d1bb40 SOCKET=0x4bd9610 
      45616ms [Ic41f35ec74693510] {11415} [DEBUG] (ioctx - L:95) <172.23.100.70:8091> (CTX=0x3b167a0,unknown) Pairing with SOCK=5e90741e7c6c856a 
      45618ms [Ic41f35ec74693510] {11415} [DEBUG] (ioctx - L:140) <172.23.100.70:8091> (CTX=0x3b167a0,mgmt/capi) Destroying context. Pending Writes=0, Entered=true, Socket Refcount=1 
      45618ms [I32775f9223f27d7a] {11415} [TRACE] (http-io - L:401) <172.23.100.55:8093> POST  http://172.23.100.55:8093/query/service. Body=240 bytes 
      45618ms [I32775f9223f27d7a] {11415} [DEBUG] (lcbio_mgr - L:472) <172.23.100.55:8093> (HE=0x3667700) Found ready connection in pool. Reusing socket and not creating new connection 
      45618ms [I32775f9223f27d7a] {11415} [DEBUG] (lcbio_mgr - L:309) <172.23.100.55:8093> (HE=0x3667700) Assigning R=0x370a5a0 SOCKET=0x34bfd10 
      45618ms [I32775f9223f27d7a] {11415} [DEBUG] (ioctx - L:95) <172.23.100.55:8093> (CTX=0x4d29740,unknown) Pairing with SOCK=7df24133bd864f92 
      45620ms [I32775f9223f27d7a] {11415} [DEBUG] (ioctx - L:140) <172.23.100.55:8093> (CTX=0x4d29740,mgmt/capi) Destroying context. Pending Writes=0, Entered=true, Socket Refcount=1 
      45620ms [I32775f9223f27d7a] {11415} [DEBUG] (lcbio_mgr - L:536) <172.23.100.55:8093> (HE=0x3667700) Placing socket back into the pool. I=0x3729840,C=0x34bfd10
      

      The cluster has a data node and a query node, among others. .70 is the data node and .55 is the query node. You can see from the logging that, for each query, lcb first creates a new connection to the data node (.70) and then reuses a connection to the query node (.55). The query connection appears to be reusing sockets, but I see no indication that the data connection is reusing sockets; instead, it creates a new connection per query and does not destroy it after completion.

      Attachments

        repro.py

        Activity

          korrigan.clark Korrigan Clark (Inactive) added a comment -

          Sergey Avseyev, can you take a look at this?
          Ellis.Breen Ellis Breen added a comment -

          Hi, apologies for the delay in response - that does sound on the face of it like a libcouchbase issue (though perhaps PYCBC is misconfiguring/misusing LCB) - I'm quite busy with urgent CBSEs/prospects at the moment so hopefully Sergey Avseyev could spot any issues more quickly.

          Will come back to this ASAP otherwise - probably a day or two.


          avsej Sergey Avseyev added a comment -

          The query timeout controls the amount of time allocated for the request, but the connection might still live for a while afterwards.

          If you want to switch this behaviour off, try adding `http_poolsize=0` to the connection string (or LCB_CNTL_HTTP_POOLSIZE for the lcb_cntl function). When you set it to zero, the library will skip the pool for HTTP connections.
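
          For example, applied to the connection string used in this report (a sketch; host and credentials are placeholders, and the option could equally be appended to an existing connection string):

          # Sketch: disabling the LCB HTTP connection pool via the connection string,
          # as suggested above.
          from couchbase.cluster import Cluster, ClusterOptions
          from couchbase.auth import PasswordAuthenticator

          cluster = Cluster('couchbase://172.23.100.70?http_poolsize=0',
                            ClusterOptions(PasswordAuthenticator('Administrator', 'password')))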

          korrigan.clark Korrigan Clark (Inactive) added a comment -

          repro.py

          You can use this script to see the problem.

          First, install the correct Python SDK 3: pip install git+https://github.com/couchbase/couchbase-python-client@8d5820a297b35378d314eb399c041b214aa6a7db

          You must create a cluster with 1 data node and 1 query node. Create a bucket called bucket-1, a scope called scope-1 in bucket-1, and a collection called collection-1 inside scope-1. Be sure to change the cluster IP in the script. While the query loop in the script is running, go to the data node machine and run:

          netstat | grep 'TIME_WAIT' |wc -l

          You will see this grow to 1000+ within a minute or so. This is the case with SDK 3 and not SDK 2.
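
          For reference, the scope and collection can also be created from the SDK's collection management API instead of the web UI (a sketch, assuming the 3.0.x management interface; bucket-1 must already exist):

          # Sketch: creating scope-1/collection-1 via the SDK 3 collection manager.
          from couchbase.cluster import Cluster, ClusterOptions
          from couchbase.auth import PasswordAuthenticator
          from couchbase.management.collections import CollectionSpec

          cluster = Cluster('couchbase://172.23.100.70',
                            ClusterOptions(PasswordAuthenticator('Administrator', 'password')))
          coll_mgr = cluster.bucket('bucket-1').collections()
          coll_mgr.create_scope('scope-1')
          coll_mgr.create_collection(CollectionSpec('collection-1', scope_name='scope-1'))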


          ingenthr Matt Ingenthron added a comment -

          Korrigan Clark: can you try the connstr change suggested by Sergey Avseyev to see if that has the desired effect? We may still need to make changes, but this would unblock you.

          korrigan.clark Korrigan Clark (Inactive) added a comment -

          I tried with the suggested connection string params and it now seems worse... Per query, it now creates a new TCP connection to the data node and a new connection to the query node, so there are 2 new connections created and destroyed per query. Here is the lcb logging for a single query with connection string couchbase://172.23.100.70?password=password&http_poolsize=0

          220ms [I876576e1b49c19c8] {16780} [TRACE] (http-io - L:401) <172.23.100.70:8091> GET  http://172.23.100.70:8091/pools. Body=0 bytes 
          220ms [I876576e1b49c19c8] {16780} [DEBUG] (lcbio_mgr - L:481) <172.23.100.70:8091> (HE=0x238a070) Creating new connection because none are available in the pool 
          220ms [I876576e1b49c19c8] {16780} [TRACE] (lcbio_mgr - L:393) <172.23.100.70:8091> (HE=0x238a070) New pool entry: I=0x1767b70 
          220ms [I876576e1b49c19c8] {16780} [INFO] (connection - L:487) <172.23.100.70:8091> (SOCK=db31a141164a4798) Starting. Timeout=75000000us 
          220ms [I876576e1b49c19c8] {16780} [DEBUG] (connection - L:252) <172.23.100.70:8091> (SOCK=db31a141164a4798) Created new socket with FD=28 
          220ms [I876576e1b49c19c8] {16780} [TRACE] (connection - L:352) <172.23.100.70:8091> (SOCK=db31a141164a4798) Scheduling I/O watcher for asynchronous connection completion. 
          220ms [I876576e1b49c19c8] {16780} [INFO] (connection - L:155) <172.23.100.70:8091> (SOCK=db31a141164a4798) Connected established 
          220ms [I876576e1b49c19c8] {16780} [DEBUG] (connection - L:108) <172.23.100.70:8091> (SOCK=db31a141164a4798) Successfully set TCP_NODELAY 
          220ms [I876576e1b49c19c8] {16780} [DEBUG] (connection - L:108) <172.23.100.70:8091> (SOCK=db31a141164a4798) Successfully set TCP_KEEPALIVE 
          220ms [I876576e1b49c19c8] {16780} [DEBUG] (lcbio_mgr - L:349) <172.23.100.70:8091> (HE=0x238a070) Received result for I=0x1767b70,C=(nil); E=0x0 
          221ms [I876576e1b49c19c8] {16780} [DEBUG] (lcbio_mgr - L:309) <172.23.100.70:8091> (HE=0x238a070) Assigning R=0x1fd57b0 SOCKET=0x2172f70 
          221ms [I876576e1b49c19c8] {16780} [DEBUG] (ioctx - L:95) <172.23.100.70:8091> (CTX=0x273d680,unknown) Pairing with SOCK=db31a141164a4798 
          222ms [I876576e1b49c19c8] {16780} [DEBUG] (ioctx - L:140) <172.23.100.70:8091> (CTX=0x273d680,mgmt/capi) Destroying context. Pending Writes=0, Entered=true, Socket Refcount=1 
          223ms [Ifafb43dc4b475c0e] {16780} [TRACE] (http-io - L:401) <172.23.100.55:8093> POST  http://172.23.100.55:8093/query/service. Body=244 bytes 
          223ms [Ifafb43dc4b475c0e] {16780} [DEBUG] (lcbio_mgr - L:481) <172.23.100.55:8093> (HE=0x22d2c10) Creating new connection because none are available in the pool 
          223ms [Ifafb43dc4b475c0e] {16780} [TRACE] (lcbio_mgr - L:393) <172.23.100.55:8093> (HE=0x22d2c10) New pool entry: I=0x2461f30 
          223ms [Ifafb43dc4b475c0e] {16780} [INFO] (connection - L:487) <172.23.100.55:8093> (SOCK=21c67a168cbbc78d) Starting. Timeout=75000000us 
          223ms [Ifafb43dc4b475c0e] {16780} [DEBUG] (connection - L:252) <172.23.100.55:8093> (SOCK=21c67a168cbbc78d) Created new socket with FD=28 
          223ms [Ifafb43dc4b475c0e] {16780} [TRACE] (connection - L:352) <172.23.100.55:8093> (SOCK=21c67a168cbbc78d) Scheduling I/O watcher for asynchronous connection completion. 
          223ms [Ifafb43dc4b475c0e] {16780} [INFO] (connection - L:155) <172.23.100.55:8093> (SOCK=21c67a168cbbc78d) Connected established 
          223ms [Ifafb43dc4b475c0e] {16780} [DEBUG] (connection - L:108) <172.23.100.55:8093> (SOCK=21c67a168cbbc78d) Successfully set TCP_NODELAY 
          223ms [Ifafb43dc4b475c0e] {16780} [DEBUG] (connection - L:108) <172.23.100.55:8093> (SOCK=21c67a168cbbc78d) Successfully set TCP_KEEPALIVE 
          223ms [Ifafb43dc4b475c0e] {16780} [DEBUG] (lcbio_mgr - L:349) <172.23.100.55:8093> (HE=0x22d2c10) Received result for I=0x2461f30,C=(nil); E=0x0 
          223ms [Ifafb43dc4b475c0e] {16780} [DEBUG] (lcbio_mgr - L:309) <172.23.100.55:8093> (HE=0x22d2c10) Assigning R=0x18adb20 SOCKET=0x2749f30 
          223ms [Ifafb43dc4b475c0e] {16780} [DEBUG] (ioctx - L:95) <172.23.100.55:8093> (CTX=0x26f70e0,unknown) Pairing with SOCK=21c67a168cbbc78d 
          226ms [Ifafb43dc4b475c0e] {16780} [DEBUG] (ioctx - L:140) <172.23.100.55:8093> (CTX=0x26f70e0,mgmt/capi) Destroying context. Pending Writes=0, Entered=true, Socket Refcount=1
          

          korrigan.clark Korrigan Clark (Inactive) added a comment -

          http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=iris_700-2351_access_72e1#f087785f256442b0b8bdd8d9b9ba328a

          korrigan.clark Korrigan Clark (Inactive) added a comment -

          It looks like, from this ticket: https://issues.couchbase.com/browse/MB-39728, the SDK may be asking for the collection manifest more than once per connection, even for plain KV get/set.

          avsej Sergey Avseyev added a comment -

          There was a fix in libcouchbase 3.0.2: CCBC-1158. Which libcouchbase is used in this ticket?

          korrigan.clark Korrigan Clark (Inactive) added a comment -

          3.0.1... I will upgrade the test to 3.0.2 and rerun. Thanks for pointing this out.
          Ellis.Breen Ellis Breen added a comment - edited

          Installing from PYCBC 3.0.1 or master (currently) should give you LCB 3.0.2, to be clear.


          korrigan.clark Korrigan Clark (Inactive) added a comment -

          Upgrading PYCBC to the commit with LCB 3.0.2 is actually causing more issues now. Here is a graph of the same build number comparing before and after the SDK upgrade: http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=ares_700-2416_access_ec65&snapshot=ares_700-2416_access_b451

          Note that the ops trail off and the TIME_WAIT connections increase. Sergey Avseyev Ellis Breen

          korrigan.clark Korrigan Clark (Inactive) added a comment -

          Another interesting comparison is that this KV test showed LCB_ERR_DURABLE_WRITE_IN_PROGRESS errors before and now shows LCB_ERR_SOCKET_SHUTDOWN errors.

          http://perf.jenkins.couchbase.com/job/ares/14882/console

          http://perf.jenkins.couchbase.com/job/ares/14952/console
          Ellis.Breen Ellis Breen added a comment -

          Odd - it looks like some latencies increase and some decrease, but overall throughput is diminished, and it degrades over time. I don't think there was really much in the way of modification to N1QL behaviour on the Python side, so I suspect this relates more to the LCB upgrade. The connection count looks fairly stable according to the cbmonitor stats, so it sounds more like a change in how LCB handles operation queues. Any thoughts, Sergey Avseyev?

          Ellis.Breen Ellis Breen added a comment - edited

          Korrigan Clark - would it be possible to have remote access to your perf test cluster, or a similar setup? I did try to reproduce this locally with a version of TxBench updated for SDK3 and modified to do N1QL but I only see TimeoutExceptions.


          korrigan.clark Korrigan Clark (Inactive) added a comment -

          The repro.py script should allow you to reproduce this locally. I ran it on my laptop and the repro worked fine. Please try that first, as our perf clusters are very busy and not really necessary for the repro.

          From the comment above:

          repro.py

          You can use this script to see the problem.

          First, install the correct Python SDK 3: pip install git+https://github.com/couchbase/couchbase-python-client@8d5820a297b35378d314eb399c041b214aa6a7db

          You must create a cluster with 1 data node and 1 query node. Create a bucket called bucket-1, a scope called scope-1 in bucket-1, and a collection called collection-1 inside scope-1. Be sure to change the cluster IP in the script. While the query loop in the script is running, go to the data node machine and run:

          netstat | grep 'TIME_WAIT' | wc -l

          You will see this grow to 1000+ within a minute or so. This is the case with SDK 3 and not SDK 2.
          Ellis.Breen Ellis Breen added a comment -

          Thanks, Korrigan - will try this today.

          Ellis.Breen Ellis Breen added a comment -

          Hi, I've set up Cheshire Cat and adapted your script to set up the requisite scopes/collections, but I'm not sure how to set up a N1QL index on a collection yet - I get the following as is:

          couchbase.exceptions.CouchbaseException: <RC=0x192[LCB_ERR_INDEX_FAILURE (402)], HTTP Request failed. Examine 'objextra' for full result, Results=1, C Source=(src/pycbc_http.c,212), OBJ=ViewResult<rc=0x192[LCB_ERR_INDEX_FAILURE (402)], value={'requestID': 'd1f4052a-ca78-417f-b844-995739741e9d', 'clientContextID': '43056814368a39d4', 'errors': [{'code': 12021, 'msg': 'Scope not found in CB datastore default:default.scope-1'}], 'status': 'fatal'}, http_status=0, tracing_context=0, tracing_output=None>, Context={'first_error_code': 12021, 'http_response_code': 500, 'first_error_message': 'Scope not found in CB datastore default:default.scope-1', 'statement': 'SELECT * FROM default:`default`.`scope-1`.`collection-1` USE KEYS[$1];', 'client_context_id': '43056814368a39d4', 'query_params': '', 'http_response_body': '', 'endpoint': '10.143.210.101:8093', 'type': 'QueryErrorContext'}, Tracing Output={":nokey:0": null}>
          

          What are you using to set up the index? Just the web UI?


          korrigan.clark Korrigan Clark (Inactive) added a comment -

          Ellis Breen, the N1QL queries provided in the script do not need an index, but you can create an index like this:

          CREATE PRIMARY INDEX `primary_idx` ON default:`bucket-1`.`scope-1`.`collection-1`

          CREATE INDEX `primary_idx` ON default:`bucket-1`.`scope-1`.`collection-1`(field1)
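
          If you prefer doing it from the script rather than the web UI, the same statement can be executed through the SDK (a sketch, reusing a connected Cluster object such as the one in repro.py):

          # Sketch: run the CREATE PRIMARY INDEX statement above via the query service.
          list(cluster.query(
              'CREATE PRIMARY INDEX `primary_idx` '
              'ON default:`bucket-1`.`scope-1`.`collection-1`'))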

          Ellis.Breen Ellis Breen added a comment -

          Just to confirm - I have now reproduced this. It appears that the 8091 socket used by PYCBC/LCB for '/pools' is being continuously recreated rather than reused (as opposed to, for example, the 8093 socket for N1QL queries). The LCB log shows that the 8091 socket appears not to be placed back into the pool, whereas the 8093 one is. Will look at this more tomorrow, but would be very grateful for any input from Sergey Avseyev.


          ingenthr Matt Ingenthron added a comment -

          I think we have another PYCBC issue related to this. There should be no situation with a Couchbase bucket type where regular data usage (not management usage) visits that service on port 8091. In other words, it's probably more of a PYCBC bug, even if there is a pooling bug in CCBC.
          Ellis.Breen Ellis Breen added a comment - edited

          Hi Matt - which PYCBC issue? I had a quick look and couldn't find it - maybe something old that was believed to be fixed? Will dig in further tomorrow anyway.

          ingenthr Matt Ingenthron added a comment - edited

          The other bug is PYCBC-954, which fell out of RBAC restrictions on CBCloud. See also PYCBC-866 which we deferred.

          The SDK-RFC on this specifies that GCCCP should be tried in any case, and then fall back to requiring that a bucket be opened (or failing back to the app, saying a bucket needs to be opened). Checking /pools is not a good idea for a number of reasons, including this access control restriction.

          Ellis.Breen Ellis Breen added a comment -

          Ah, perfect - they do indeed look similar. Thanks for all the pointers!


          ingenthr Matt Ingenthron added a comment -

          If PYCBC-954 is resolved, then does that mean Korrigan Clark should try the 3.0.3 release, Ellis Breen?
          Ellis.Breen Ellis Breen added a comment -

          It looks like running off master (which will be PYCBC 3.0.2) will include this fix, yes. Korrigan Clark - please let us know if this helps.


          korrigan.clark Korrigan Clark (Inactive) added a comment -

          Everything works perfectly now.

          You can see the TIME_WAIT connections are back to normal:

          http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=iris_700-2477_access_e0f6#f087785f256442b0b8bdd8d9b9ba328a

          Thank you. Closing the ticket.

          People

            Assignee: Ellis.Breen Ellis Breen
            Reporter: korrigan.clark Korrigan Clark (Inactive)
            Votes: 0
            Watchers: 5


              Gerrit Reviews

                There are no open Gerrit changes
