Details
- Type: Bug
- Resolution: Fixed
- Priority: Critical
- Affects Version/s: 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4, 7.1.5, 7.2.0, 7.2.1, 7.2.2, 7.2.3, 7.2.4, 7.6.0
- Triage: Untriaged
- Is this a Regression?: No
Description
Normally, if DCP nozzles are started and then stopped in sequence, there shouldn’t be any UprFeed leaks.
For example, in a healthy pipeline with 2 source nozzles, after repeated pause/resume cycles we only see:
2 @ 0x10004006e 0x1000503e5 0x10033c46d 0x1000733c1
# 0x10033c46c github.com/couchbase/gomemcached/client.(*UprFeed).sendCommands+0xac /Users/neil.huang/source/couchbase/goproj/src/github.com/couchbase/gomemcached/client/upr_feed.go:346
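The dump above is Go’s aggregated goroutine profile. As a minimal, self-contained sketch (not goxdcr’s own diagnostics code; the port and layout are illustrative), such a dump can be captured from any Go process like this:

package main

import (
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on the default mux
	"os"
	"runtime/pprof"
	"time"
)

func main() {
	// Option 1: expose the standard pprof endpoint, then fetch
	//   curl http://localhost:6060/debug/pprof/goroutine?debug=1
	go func() { _ = http.ListenAndServe("localhost:6060", nil) }()

	// Option 2: write the same aggregated "N @ addr addr ..." profile to stdout.
	time.Sleep(time.Second)
	_ = pprof.Lookup("goroutine").WriteTo(os.Stdout, 1)
}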
With the following hard-coded hacks to induce the race:
diff --git a/parts/dcp_nozzle.go b/parts/dcp_nozzle.go
index f6ecbd0d..096b5b2f 100644
--- a/parts/dcp_nozzle.go
+++ b/parts/dcp_nozzle.go
@@ -747,7 +747,10 @@ func (dcp *DcpNozzle) initialize(settings metadata.ReplicationSettingsMap) (err
 		dcp.Logger().Infof("%v with OSO mode requested", dcp.Id())
 	}
 
+	fmt.Printf("NEIL DEBUG sleeping 40 seconds before initializing UPR feed\n")
+	time.Sleep(40 * time.Second)
 	err = dcp.initializeUprFeed()
+	fmt.Printf("NEIL DEBUG UPR feed now opened\n")
 	if err != nil {
 		return err
 	}
@@ -845,7 +848,9 @@ func (dcp *DcpNozzle) Start(settings metadata.ReplicationSettingsMap) error {
 func (dcp *DcpNozzle) Stop() error {
 	dcp.Logger().Infof("%v is stopping...\n", dcp.Id())
 	err := dcp.SetState(common.Part_Stopping)
+	fmt.Printf("NEIL DEBUG stopping called\n")
 	if err != nil {
+		fmt.Printf("NEIL DEBUG stopping called hit err %v\n", err)
 		return err
 	}
 
@@ -970,6 +975,7 @@ func (dcp *DcpNozzle) closeUprFeed() error {
 		dcp.uprFeed.Close()
 		dcp.uprFeed = nil
 	} else {
+		fmt.Printf("NEIL DEBUG uprfeed did NOT close correctly\n")
 		dcp.Logger().Infof("%v uprfeed is already closed. No-op", dcp.Id())
 	}
 
diff --git a/pipeline_manager/pipeline_manager.go b/pipeline_manager/pipeline_manager.go
index c7301b4b..b8b65f73 100644
--- a/pipeline_manager/pipeline_manager.go
+++ b/pipeline_manager/pipeline_manager.go
@@ -2303,6 +2303,9 @@ func (r *PipelineUpdater) getFutureRefreshDuration() time.Duration {
 		return r.retry_interval
 	}
 
+	// Hard-code and do not back off
+	return r.retry_interval
+
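The race that the injected 40-second sleep widens can be modeled with a hypothetical, heavily simplified sketch (illustrative names only, not the actual DcpNozzle code): Stop() runs while initialize() is still sleeping, closeUprFeed() finds nothing to close and no-ops, and the feed created once the sleep ends is never closed.

package main

import (
	"fmt"
	"sync"
	"time"
)

type feed struct{ closed bool }

type nozzle struct {
	mu      sync.Mutex
	uprFeed *feed
}

func (n *nozzle) initialize() {
	// The injected 40s sleep sits here, before the feed exists.
	time.Sleep(100 * time.Millisecond)
	n.mu.Lock()
	n.uprFeed = &feed{} // feed created and started; its sender goroutine would run from here
	n.mu.Unlock()
}

func (n *nozzle) stop() {
	n.mu.Lock()
	defer n.mu.Unlock()
	if n.uprFeed != nil {
		n.uprFeed.closed = true
		n.uprFeed = nil
		return
	}
	// Stop arrived while initialize() was still sleeping: nothing to close yet,
	// so the feed created afterwards is never closed -- the leak in this ticket.
	fmt.Println("uprfeed is already closed. No-op")
}

func main() {
	n := &nozzle{}
	go n.initialize()
	n.stop()                           // wins the race against initialize()
	time.Sleep(200 * time.Millisecond) // initialize() finishes and leaks the feed
	n.mu.Lock()
	leaked := n.uprFeed != nil && !n.uprFeed.closed
	n.mu.Unlock()
	fmt.Println("leaked feed still open:", leaked)
}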
I was able to create a replication, and let it fail and restart repeatedly. After an hour, I got:
26 @ 0x1000400ce 0x100050445 0x100537cc7 0x100073421
# 0x100537cc6 github.com/couchbase/goxdcr/component.(*AsyncComponentEventListenerImpl).run+0x86 /Users/neil.huang/source/couchbase/goproj/src/github.com/couchbase/goxdcr/component/async_listener.go:68

20 @ 0x1000400ce 0x100050445 0x10033c4cd 0x100073421
# 0x10033c4cc github.com/couchbase/gomemcached/client.(*UprFeed).sendCommands+0xac /Users/neil.huang/source/couchbase/goproj/src/github.com/couchbase/gomemcached/client/upr_feed.go:346
This indicates leaked gomemcached feeds that are never closed.
Each leaked instance corresponds to a “did NOT close correctly” message in the logs.
Each leaked gomemcached feed is created here: https://src.couchbase.org/source/xref/7.1.4/goproj/src/github.com/couchbase/goxdcr/parts/dcp_nozzle.go#522
and is then actually started here: https://src.couchbase.org/source/xref/7.1.4/goproj/src/github.com/couchbase/goxdcr/parts/dcp_nozzle.go#573
The DCP nozzle then issues StreamReqs and fills up the buffer before finally hitting an error and exiting here: https://src.couchbase.org/source/xref/7.1.4/goproj/src/github.com/couchbase/goxdcr/parts/dcp_nozzle.go#815
By that point, however, the gomemcached feed's buffer has already been filled, and the feed keeps running, leaking memory, and never stops.
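As a generic illustration (not the actual gomemcached implementation), the leaked sendCommands goroutines follow the usual Go pattern of a sender loop that only exits when Close() signals it; if Close() is never called, the goroutine blocks forever:

package main

import (
	"fmt"
	"runtime"
	"time"
)

// fakeFeed mimics the shape of a feed with a background sender goroutine.
type fakeFeed struct {
	transmitCh chan []byte   // queued commands (e.g. StreamReq) for the server
	closer     chan struct{} // closed by Close(); the only way the loop exits
}

func newFakeFeed() *fakeFeed {
	f := &fakeFeed{
		transmitCh: make(chan []byte, 10),
		closer:     make(chan struct{}),
	}
	go f.sendCommands() // analogous to UprFeed.sendCommands
	return f
}

func (f *fakeFeed) sendCommands() {
	for {
		select {
		case cmd := <-f.transmitCh:
			_ = cmd // would be written to the memcached connection here
		case <-f.closer:
			return
		}
	}
}

func (f *fakeFeed) Close() { close(f.closer) }

func main() {
	leakedFeed := newFakeFeed() // never closed, like the feeds in this ticket
	_ = leakedFeed
	closedFeed := newFakeFeed()
	closedFeed.Close()

	time.Sleep(100 * time.Millisecond)
	// The leaked feed's sendCommands goroutine (plus main) is still alive.
	fmt.Println("goroutines:", runtime.NumGoroutine())
}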
Attachments
Issue Links
- is a backport of
  - MB-59320 XDCR - DCP nozzle race would leave gomemcached feed running and leak memory (Closed)
- is cloned by
  - MB-59446 [BP 7.2.4] - XDCR - Xmem nozzle cleanup is stuck due to waiting on non-existent bandwidth throttler (Closed)
- relates to
  - MB-59340 [BP 7.1.6] - XDCR - DCP nozzle race would leave gomemcached feed running and leak memory (Resolved)