  1. Couchbase Server
  2. MB-44182

[BP 6.6.2] - XDCR TCP connection leak when host does not respond and XDCR retries





      Based on a customer setup, it is possible in a very rare case for goxdcr to leak connections and end up consuming all of a system's file descriptors.

      (Below are the findings from the customer's case, without the customer reference.)
      This is the code to clean up any failed REST calls, such as to ns_server:

      transport, ok := client.Transport.(*http.Transport)
      if ok {
      	if u.IsSeriousNetError(err) {
      		logger.Debugf("Encountered %v, close all idle connections for this http client.\n", err)
      	}
      	transport.CloseIdleConnections()
      }

      The suspicious part is that the transport may not be set. For plain http calls (those to the local ns_server, which we don’t encrypt), goxdcr doesn’t set the transport:


       		client = &http.Client{Timeout: base.DefaultHttpTimeout}

      If the transport is not set, the type assertion above fails, so we never close idle connections ourselves and instead depend on golang to close them for us.

      It just so happens that golang has had an issue, https://github.com/golang/go/issues/28012, that shows how a TCP connection is not closed if the server doesn’t respond.
      In particular, one user posted a code snippet that creates the http client exactly the way XDCR does, and reports that the TCP connection isn’t closed. See https://github.com/golang/go/issues/28012#issuecomment-562290662.

      This issue was fixed on Dec 11, 2019 in golang 1.14, with the title: “net/http: don't wait indefinitely in Transport for proxy CONNECT response”.

      XDCR for 6.5.1 ships with golang 1.11 according to the CMakefile. The golang issue mentioned above was also filed by an OP using 1.11.




            neil.huang Neil Huang created issue -
            neil.huang Neil Huang made changes -
            Field Original Value New Value
            Link This issue Clones MB-44128 [ MB-44128 ]
            neil.huang Neil Huang made changes -
            Link This issue relates to CBSE-9597 [ CBSE-9597 ]
            neil.huang Neil Huang made changes -
            Fix Version/s 6.6.2 [ 17103 ]
            Fix Version/s Cheshire-Cat [ 15915 ]
            wayne Wayne Siu made changes -
            Link This issue blocks MB-43310 [ MB-43310 ]
            wayne Wayne Siu made changes -
            Labels approved-for-6.6.2
            wayne Wayne Siu made changes -
            Link This issue is a backport of MB-44128 [ MB-44128 ]
            wayne Wayne Siu made changes -
            Link This issue Clones MB-44128 [ MB-44128 ]
            neil.huang Neil Huang made changes -
            VERIFICATION STEPS Please run through some of the checkpointing test cases, and any network-outage related automated tests if you have any
            Assignee Neil Huang [ neil.huang ] Pavithra Mahamani [ pavithra.mahamani ]
            Resolution Fixed [ 1 ]
            Status Open [ 1 ] Resolved [ 5 ]
            arunkumar Arunkumar Senthilnathan (Inactive) made changes -
            Labels approved-for-6.6.2 approved-for-6.6.2 releasenote
            pavithra.mahamani Pavithra Mahamani (Inactive) made changes -
            Status Resolved [ 5 ] Closed [ 6 ]


              pavithra.mahamani Pavithra Mahamani (Inactive)
              neil.huang Neil Huang


