Details
- Type: Bug
- Resolution: Fixed
- Priority: Critical
- Affects Version/s: 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.6.0
- Triage: Untriaged
- Is this a Regression?: No
Description
Normally, if DCP nozzles are started and then stopped in sequence, there shouldn’t be any UprFeed leaks.
For example, in a healthy pipeline with 2 source nozzles, after repeated pause/resume cycles we only see:
2 @ 0x10004006e 0x1000503e5 0x10033c46d 0x1000733c1
# 0x10033c46c github.com/couchbase/gomemcached/client.(*UprFeed).sendCommands+0xac /Users/neil.huang/source/couchbase/goproj/src/github.com/couchbase/gomemcached/client/upr_feed.go:346
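The dump above is Go’s aggregated goroutine profile. As a minimal, self-contained sketch (not goxdcr’s own diagnostics code; the port and layout are illustrative), such a dump can be captured from any Go process like this:

package main

import (
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on the default mux
	"os"
	"runtime/pprof"
	"time"
)

func main() {
	// Option 1: expose the standard pprof endpoint, then fetch
	//   curl http://localhost:6060/debug/pprof/goroutine?debug=1
	go func() { _ = http.ListenAndServe("localhost:6060", nil) }()

	// Option 2: write the same aggregated "N @ addr addr ..." profile to stdout.
	time.Sleep(time.Second)
	_ = pprof.Lookup("goroutine").WriteTo(os.Stdout, 1)
}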
With the following hard-coded hacks to induce the race:
diff --git a/parts/dcp_nozzle.go b/parts/dcp_nozzle.go
index f6ecbd0d..096b5b2f 100644
--- a/parts/dcp_nozzle.go
+++ b/parts/dcp_nozzle.go
@@ -747,7 +747,10 @@ func (dcp *DcpNozzle) initialize(settings metadata.ReplicationSettingsMap) (err
 		dcp.Logger().Infof("%v with OSO mode requested", dcp.Id())
 	}
 
+	fmt.Printf("NEIL DEBUG sleeping 40 seconds before initializing UPR feed\n")
+	time.Sleep(40 * time.Second)
 	err = dcp.initializeUprFeed()
+	fmt.Printf("NEIL DEBUG UPR feed now opened\n")
 	if err != nil {
 		return err
 	}
@@ -845,7 +848,9 @@ func (dcp *DcpNozzle) Start(settings metadata.ReplicationSettingsMap) error {
 func (dcp *DcpNozzle) Stop() error {
 	dcp.Logger().Infof("%v is stopping...\n", dcp.Id())
 	err := dcp.SetState(common.Part_Stopping)
+	fmt.Printf("NEIL DEBUG stopping called\n")
 	if err != nil {
+		fmt.Printf("NEIL DEBUG stopping called hit err %v\n", err)
 		return err
 	}
 
@@ -970,6 +975,7 @@ func (dcp *DcpNozzle) closeUprFeed() error {
 		dcp.uprFeed.Close()
 		dcp.uprFeed = nil
 	} else {
+		fmt.Printf("NEIL DEBUG uprfeed did NOT close correctly\n")
 		dcp.Logger().Infof("%v uprfeed is already closed. No-op", dcp.Id())
 	}
 
diff --git a/pipeline_manager/pipeline_manager.go b/pipeline_manager/pipeline_manager.go
index c7301b4b..b8b65f73 100644
--- a/pipeline_manager/pipeline_manager.go
+++ b/pipeline_manager/pipeline_manager.go
@@ -2303,6 +2303,9 @@ func (r *PipelineUpdater) getFutureRefreshDuration() time.Duration {
 		return r.retry_interval
 	}
 
+	// Hard-code and do not back off
+	return r.retry_interval
+
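The race that the injected 40-second sleep widens can be modeled with a hypothetical, heavily simplified sketch (illustrative names only, not the actual DcpNozzle code): Stop() runs while initialize() is still sleeping, closeUprFeed() finds nothing to close and no-ops, and the feed created once the sleep ends is never closed.

package main

import (
	"fmt"
	"sync"
	"time"
)

type feed struct{ closed bool }

type nozzle struct {
	mu      sync.Mutex
	uprFeed *feed
}

func (n *nozzle) initialize() {
	// The injected 40s sleep sits here, before the feed exists.
	time.Sleep(100 * time.Millisecond)
	n.mu.Lock()
	n.uprFeed = &feed{} // feed created and started; its sender goroutine would run from here
	n.mu.Unlock()
}

func (n *nozzle) stop() {
	n.mu.Lock()
	defer n.mu.Unlock()
	if n.uprFeed != nil {
		n.uprFeed.closed = true
		n.uprFeed = nil
		return
	}
	// Stop arrived while initialize() was still sleeping: nothing to close yet,
	// so the feed created afterwards is never closed -- the leak in this ticket.
	fmt.Println("uprfeed is already closed. No-op")
}

func main() {
	n := &nozzle{}
	go n.initialize()
	n.stop()                           // wins the race against initialize()
	time.Sleep(200 * time.Millisecond) // initialize() finishes and leaks the feed
	n.mu.Lock()
	leaked := n.uprFeed != nil && !n.uprFeed.closed
	n.mu.Unlock()
	fmt.Println("leaked feed still open:", leaked)
}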
I was able to create a replication, and let it fail and restart repeatedly. After an hour, I got:
26 @ 0x1000400ce 0x100050445 0x100537cc7 0x100073421
# 0x100537cc6 github.com/couchbase/goxdcr/component.(*AsyncComponentEventListenerImpl).run+0x86 /Users/neil.huang/source/couchbase/goproj/src/github.com/couchbase/goxdcr/component/async_listener.go:68

20 @ 0x1000400ce 0x100050445 0x10033c4cd 0x100073421
# 0x10033c4cc github.com/couchbase/gomemcached/client.(*UprFeed).sendCommands+0xac /Users/neil.huang/source/couchbase/goproj/src/github.com/couchbase/gomemcached/client/upr_feed.go:346
This indicates leaked gomemcached feeds that are never closed.
Each leaked instance corresponds to a “did NOT close correctly” message in the logs.
Each leaked gomemcached feed is created here: https://src.couchbase.org/source/xref/7.1.4/goproj/src/github.com/couchbase/goxdcr/parts/dcp_nozzle.go#522
and is then actually started here: https://src.couchbase.org/source/xref/7.1.4/goproj/src/github.com/couchbase/goxdcr/parts/dcp_nozzle.go#573
The DCP nozzle then issues StreamReqs and fills up the buffer before finally hitting an error and exiting here: https://src.couchbase.org/source/xref/7.1.4/goproj/src/github.com/couchbase/goxdcr/parts/dcp_nozzle.go#815
By that point, however, the gomemcached feed's buffer has already been filled, and the feed keeps running, leaking memory, and never stops.
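As a generic illustration (not the actual gomemcached implementation), the leaked sendCommands goroutines follow the usual Go pattern of a sender loop that only exits when Close() signals it; if Close() is never called, the goroutine blocks forever:

package main

import (
	"fmt"
	"runtime"
	"time"
)

// fakeFeed mimics the shape of a feed with a background sender goroutine.
type fakeFeed struct {
	transmitCh chan []byte   // queued commands (e.g. StreamReq) for the server
	closer     chan struct{} // closed by Close(); the only way the loop exits
}

func newFakeFeed() *fakeFeed {
	f := &fakeFeed{
		transmitCh: make(chan []byte, 10),
		closer:     make(chan struct{}),
	}
	go f.sendCommands() // analogous to UprFeed.sendCommands
	return f
}

func (f *fakeFeed) sendCommands() {
	for {
		select {
		case cmd := <-f.transmitCh:
			_ = cmd // would be written to the memcached connection here
		case <-f.closer:
			return
		}
	}
}

func (f *fakeFeed) Close() { close(f.closer) }

func main() {
	leakedFeed := newFakeFeed() // never closed, like the feeds in this ticket
	_ = leakedFeed
	closedFeed := newFakeFeed()
	closedFeed.Close()

	time.Sleep(100 * time.Millisecond)
	// The leaked feed's sendCommands goroutine (plus main) is still alive.
	fmt.Println("goroutines:", runtime.NumGoroutine())
}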
Attachments
Issue Links
- is a backport of
  - MB-59320 XDCR - DCP nozzle race would leave gomemcached feed running and leak memory (Closed)
- is cloned by
  - MB-59446 [BP 7.2.4] - XDCR - Xmem nozzle cleanup is stuck due to waiting on non-existent bandwidth throttler (Closed)
- relates to
  - MB-59340 [BP 7.1.6] - XDCR - DCP nozzle race would leave gomemcached feed running and leak memory (Resolved)