Uploaded image for project: 'Couchbase Python Client Library'
  1. Couchbase Python Client Library
  2. PYCBC-950

N1QL query connections remain open after completion

    XMLWordPrintable

Details

    • 1

    Description

       

      I am working on new n1ql perf test and have been having issues with the python client n1ql query latencies and throughput. If I create a bucket with N items and then use 1 client to issue sequential n1ql queries, I see that one of the machines in the cluster has a linearly increasing number of connections in time_wait. The queries are like the following:

      SELECT * FROM default:`bucket-1`.`scope-1`.`collection-1` USE KEYS['n1ql-000000000056']

      These target can also be default scope and collection (if docs are in there).

      Here is a graph showing this: 

      http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=iris_700-2351_access_227c#f087785f256442b0b8bdd8d9b9ba328a

      My client gets about 200 queries/sec and you can see that after 10 seconds there are around 2000 connections in time_wait. Even if I set a query timeout to 1000000 (which is 1 million nano seconds or 1 milliseconds) the connections are not killed.

       

      This is the commit of the python client that I am using: 8d5820a297b35378d314eb399c041b214aa6a7db

       

      This is the lcb logging for a single op:

      45615ms [Ic41f35ec74693510] {11415} [TRACE] (http-io - L:401) <172.23.100.70:8091> GET  http://172.23.100.70:8091/pools. Body=0 bytes 
      45615ms [Ic41f35ec74693510] {11415} [DEBUG] (lcbio_mgr - L:481) <172.23.100.70:8091> (HE=0x32f1200) Creating new connection because none are available in the pool 
      45616ms [Ic41f35ec74693510] {11415} [TRACE] (lcbio_mgr - L:393) <172.23.100.70:8091> (HE=0x32f1200) New pool entry: I=0x42d5b40 
      45616ms [Ic41f35ec74693510] {11415} [INFO] (connection - L:487) <172.23.100.70:8091> (SOCK=5e90741e7c6c856a) Starting. Timeout=75000000us 
      45616ms [Ic41f35ec74693510] {11415} [DEBUG] (connection - L:252) <172.23.100.70:8091> (SOCK=5e90741e7c6c856a) Created new socket with FD=34 
      45616ms [Ic41f35ec74693510] {11415} [TRACE] (connection - L:352) <172.23.100.70:8091> (SOCK=5e90741e7c6c856a) Scheduling I/O watcher for asynchronous connection completion. 
      45616ms [Ic41f35ec74693510] {11415} [INFO] (connection - L:155) <172.23.100.70:8091> (SOCK=5e90741e7c6c856a) Connected established 
      45616ms [Ic41f35ec74693510] {11415} [DEBUG] (connection - L:108) <172.23.100.70:8091> (SOCK=5e90741e7c6c856a) Successfully set TCP_NODELAY 
      45616ms [Ic41f35ec74693510] {11415} [DEBUG] (connection - L:108) <172.23.100.70:8091> (SOCK=5e90741e7c6c856a) Successfully set TCP_KEEPALIVE 
      45616ms [Ic41f35ec74693510] {11415} [DEBUG] (lcbio_mgr - L:349) <172.23.100.70:8091> (HE=0x32f1200) Received result for I=0x42d5b40,C=(nil); E=0x0 
      45616ms [Ic41f35ec74693510] {11415} [DEBUG] (lcbio_mgr - L:309) <172.23.100.70:8091> (HE=0x32f1200) Assigning R=0x3d1bb40 SOCKET=0x4bd9610 
      45616ms [Ic41f35ec74693510] {11415} [DEBUG] (ioctx - L:95) <172.23.100.70:8091> (CTX=0x3b167a0,unknown) Pairing with SOCK=5e90741e7c6c856a 
      45618ms [Ic41f35ec74693510] {11415} [DEBUG] (ioctx - L:140) <172.23.100.70:8091> (CTX=0x3b167a0,mgmt/capi) Destroying context. Pending Writes=0, Entered=true, Socket Refcount=1 
      45618ms [I32775f9223f27d7a] {11415} [TRACE] (http-io - L:401) <172.23.100.55:8093> POST  http://172.23.100.55:8093/query/service. Body=240 bytes 
      45618ms [I32775f9223f27d7a] {11415} [DEBUG] (lcbio_mgr - L:472) <172.23.100.55:8093> (HE=0x3667700) Found ready connection in pool. Reusing socket and not creating new connection 
      45618ms [I32775f9223f27d7a] {11415} [DEBUG] (lcbio_mgr - L:309) <172.23.100.55:8093> (HE=0x3667700) Assigning R=0x370a5a0 SOCKET=0x34bfd10 
      45618ms [I32775f9223f27d7a] {11415} [DEBUG] (ioctx - L:95) <172.23.100.55:8093> (CTX=0x4d29740,unknown) Pairing with SOCK=7df24133bd864f92 
      45620ms [I32775f9223f27d7a] {11415} [DEBUG] (ioctx - L:140) <172.23.100.55:8093> (CTX=0x4d29740,mgmt/capi) Destroying context. Pending Writes=0, Entered=true, Socket Refcount=1 
      45620ms [I32775f9223f27d7a] {11415} [DEBUG] (lcbio_mgr - L:536) <172.23.100.55:8093> (HE=0x3667700) Placing socket back into the pool. I=0x3729840,C=0x34bfd10
      

      The cluster has a data node and a query node among others. .70 is the data node and .55 is the query node. You can see from the logging that each query, lcb first creates a new connection to the data node .70 and then reuses a connection to the query node .55. The query connections appears to be reusing sockets but i see no indication that the data connection is reusing sockets, but is creating a new connection query and not destroying it after completion.

      Attachments

        Activity

          People

            Ellis.Breen Ellis Breen
            korrigan.clark Korrigan Clark (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              PagerDuty