Description
In a very rare case, depending on the customer's setup, goxdcr can leak connections until it has consumed all of the system's file descriptors.
(The findings below are from the customer's case, with the customer references removed.)
This is the code that cleans up after a failed REST call, such as a call to ns_server:
http://src.couchbase.org/source/xref/6.5.1/goproj/src/github.com/couchbase/goxdcr/utils/utils.go#2168-2173
    transport, ok := client.Transport.(*http.Transport)
    if ok {
        if u.IsSeriousNetError(err) {
            logger.Debugf("Encountered %v, close all idle connections for this http client.\n", err)
        }
        transport.CloseIdleConnections()
    }
The suspicious part is that the transport may not be set. That is the case for plain HTTP calls (calls to the local ns_server are not encrypted), where goxdcr creates the client without a transport:
    client = &http.Client{Timeout: base.DefaultHttpTimeout}
If the transport is not set, client.Transport is nil, the type assertion above fails, and CloseIdleConnections() is never called. We are then not closing idle connections ourselves and are depending on golang to close them for us.
Golang has had an issue, https://github.com/golang/go/issues/28012, that shows a TCP connection not being closed when the server does not respond.
In particular, one user posted a code snippet that creates the http client exactly the way XDCR does, and reports that the TCP connection is not closed. See https://github.com/golang/go/issues/28012#issuecomment-562290662.
This issue was fixed on Dec 11, 2019, in golang 1.14, with the title: “net/http: don't wait indefinitely in Transport for proxy CONNECT response”.
According to the CMakefile, XDCR 6.5.1 ships with golang 1.11. The original poster of the golang issue was also using 1.11.
Issue Links
- backports to: MB-44182 [BP 6.6.2] - XDCR TCP connection leak when host does not respond and XDCR retries (Closed)
Activity
Field | Original Value | New Value |
---|---|---|
Link | This issue relates to CBSE-9597 [ CBSE-9597 ] |
Affects Version/s | 6.5.1 [ 16622 ] | |
Affects Version/s | 6.6.2 [ 17103 ] | |
Description | Based on a customer set up, it is possible in a very rare case for goxdcr to leak connections and end up taking up all the file descriptors of a system. | Full text of the Description above, followed by the sentence: "This most likely explains why XDCR eats up all the FD’s, but this doesn’t yet explain how the system’s networking got XDCR into this cycle in the first place." |
Issue Type | Task [ 3 ] | Bug [ 1 ] |
Summary | XDCR socket leak investigation | XDCR TCP connection leak when host does not respond and XDCR retries |
Description | Full text of the Description above, followed by the sentence: "This most likely explains why XDCR eats up all the FD’s, but this doesn’t yet explain how the system’s networking got XDCR into this cycle in the first place." | Full text of the Description above, with that closing sentence removed. |
Resolution | Fixed [ 1 ] | |
Status | Open [ 1 ] | Closed [ 6 ] |
Fix Version/s | 7.0.0 [ 17233 ] |
Fix Version/s | Cheshire-Cat [ 15915 ] |