Resolution: Fixed
Details
Assignee: Ayush Nayyar
Reporter: Neil Huang
Is this a Regression?: Yes
Triage: Untriaged
Issue Impact: external
Story Points: 0
Priority: Critical
Created October 27, 2023 at 6:19 PM
Updated March 21, 2025 at 2:49 AM
Resolved November 2, 2023 at 11:58 PM
Normally, if DCP nozzles are started and then stopped in sequence, there shouldn’t be any UprFeed leaks. For example, in a normal pipeline with 2 source nozzles, after successive pause-and-resume cycles, the goroutine profile shows only the expected entry:
2 @ 0x10004006e 0x1000503e5 0x10033c46d 0x1000733c1
#	0x10033c46c	github.com/couchbase/gomemcached/client.(*UprFeed).sendCommands+0xac	/Users/neil.huang/source/couchbase/goproj/src/github.com/couchbase/gomemcached/client/upr_feed.go:346
With the following hard-coded hacks to induce the race:
diff --git a/parts/dcp_nozzle.go b/parts/dcp_nozzle.go
index f6ecbd0d..096b5b2f 100644
--- a/parts/dcp_nozzle.go
+++ b/parts/dcp_nozzle.go
@@ -747,7 +747,10 @@ func (dcp *DcpNozzle) initialize(settings metadata.ReplicationSettingsMap) (err
 		dcp.Logger().Infof("%v with OSO mode requested", dcp.Id())
 	}
 
+	fmt.Printf("NEIL DEBUG sleeping 40 seconds before initializing UPR feed\n")
+	time.Sleep(40 * time.Second)
 	err = dcp.initializeUprFeed()
+	fmt.Printf("NEIL DEBUG UPR feed now opened\n")
 	if err != nil {
 		return err
 	}
@@ -845,7 +848,9 @@ func (dcp *DcpNozzle) Start(settings metadata.ReplicationSettingsMap) error {
 func (dcp *DcpNozzle) Stop() error {
 	dcp.Logger().Infof("%v is stopping...\n", dcp.Id())
 	err := dcp.SetState(common.Part_Stopping)
+	fmt.Printf("NEIL DEBUG stopping called\n")
 	if err != nil {
+		fmt.Printf("NEIL DEBUG stopping called hit err %v\n", err)
 		return err
 	}
@@ -970,6 +975,7 @@ func (dcp *DcpNozzle) closeUprFeed() error {
 		dcp.uprFeed.Close()
 		dcp.uprFeed = nil
 	} else {
+		fmt.Printf("NEIL DEBUG uprfeed did NOT close correctly\n")
 		dcp.Logger().Infof("%v uprfeed is already closed. No-op", dcp.Id())
 	}
diff --git a/pipeline_manager/pipeline_manager.go b/pipeline_manager/pipeline_manager.go
index c7301b4b..b8b65f73 100644
--- a/pipeline_manager/pipeline_manager.go
+++ b/pipeline_manager/pipeline_manager.go
@@ -2303,6 +2303,9 @@ func (r *PipelineUpdater) getFutureRefreshDuration() time.Duration {
 		return r.retry_interval
 	}
 
+	// Hard-code and do not back off
+	return r.retry_interval
+
I created a replication and let it fail and restart repeatedly. After an hour, I got:
26 @ 0x1000400ce 0x100050445 0x100537cc7 0x100073421
#	0x100537cc6	github.com/couchbase/goxdcr/component.(*AsyncComponentEventListenerImpl).run+0x86	/Users/neil.huang/source/couchbase/goproj/src/github.com/couchbase/goxdcr/component/async_listener.go:68

20 @ 0x1000400ce 0x100050445 0x10033c4cd 0x100073421
#	0x10033c4cc	github.com/couchbase/gomemcached/client.(*UprFeed).sendCommands+0xac	/Users/neil.huang/source/couchbase/goproj/src/github.com/couchbase/gomemcached/client/upr_feed.go:346
This indicates leaked gomemcached feeds that are never closed. Each leaked instance corresponds to a “did NOT close correctly” message in the logs.
Each leaked gomemcached feed is created here: https://src.couchbase.org/source/xref/7.1.4/goproj/src/github.com/couchbase/goxdcr/parts/dcp_nozzle.go#522
It is then actually started here: https://src.couchbase.org/source/xref/7.1.4/goproj/src/github.com/couchbase/goxdcr/parts/dcp_nozzle.go#573
The DCP nozzle issues StreamReq and fills up the buffer before finally hitting an error and exiting here: https://src.couchbase.org/source/xref/7.1.4/goproj/src/github.com/couchbase/goxdcr/parts/dcp_nozzle.go#815
By that point, however, the gomemcached buffer has already been filled; the feed leaks memory and never stops.