Couchbase Server / MB-59339

[BP 7.2.3/7.2.4] - XDCR - DCP nozzle race would leave gomemcached feed running and leak memory

    Description

      Normally, if DCP nozzles are started and then stopped in sequence, no UprFeed should leak.
      For example, in a normal pipeline with 2 source nozzles, repeated pause and resume cycles leave only two sendCommands goroutines:

      2 @ 0x10004006e 0x1000503e5 0x10033c46d 0x1000733c1
      #       0x10033c46c     github.com/couchbase/gomemcached/client.(*UprFeed).sendCommands+0xac    /Users/neil.huang/source/couchbase/goproj/src/github.com/couchbase/gomemcached/client/upr_feed.go:346
      

      The following hard-coded hacks induce the race:

      diff --git a/parts/dcp_nozzle.go b/parts/dcp_nozzle.go
      index f6ecbd0d..096b5b2f 100644
      --- a/parts/dcp_nozzle.go
      +++ b/parts/dcp_nozzle.go
      @@ -747,7 +747,10 @@ func (dcp *DcpNozzle) initialize(settings metadata.ReplicationSettingsMap) (err
                      dcp.Logger().Infof("%v with OSO mode requested", dcp.Id())
              }
       
      +       fmt.Printf("NEIL DEBUG sleeping 40 seconds before initializing UPR feed\n")
      +       time.Sleep(40 * time.Second)
              err = dcp.initializeUprFeed()
      +       fmt.Printf("NEIL DEBUG UPR feed now opened\n")
              if err != nil {
                      return err
              }
      @@ -845,7 +848,9 @@ func (dcp *DcpNozzle) Start(settings metadata.ReplicationSettingsMap) error {
       func (dcp *DcpNozzle) Stop() error {
              dcp.Logger().Infof("%v is stopping...\n", dcp.Id())
              err := dcp.SetState(common.Part_Stopping)
      +       fmt.Printf("NEIL DEBUG stopping called\n")
              if err != nil {
      +               fmt.Printf("NEIL DEBUG stopping called hit err %v\n", err)
                      return err
              }
       
      @@ -970,6 +975,7 @@ func (dcp *DcpNozzle) closeUprFeed() error {
                      dcp.uprFeed.Close()
                      dcp.uprFeed = nil
              } else {
      +               fmt.Printf("NEIL DEBUG uprfeed did NOT close correctly\n")
                      dcp.Logger().Infof("%v uprfeed is already closed. No-op", dcp.Id())
              }
       
      diff --git a/pipeline_manager/pipeline_manager.go b/pipeline_manager/pipeline_manager.go
      index c7301b4b..b8b65f73 100644
      --- a/pipeline_manager/pipeline_manager.go
      +++ b/pipeline_manager/pipeline_manager.go
      @@ -2303,6 +2303,9 @@ func (r *PipelineUpdater) getFutureRefreshDuration() time.Duration {
                      return r.retry_interval
              }
       
      +       // Hard-code and do not back off
      +       return r.retry_interval
      +
      

      With these changes, I created a replication and let it fail and restart repeatedly. After an hour, the goroutine dump showed:

      26 @ 0x1000400ce 0x100050445 0x100537cc7 0x100073421
      #       0x100537cc6     github.com/couchbase/goxdcr/component.(*AsyncComponentEventListenerImpl).run+0x86       /Users/neil.huang/source/couchbase/goproj/src/github.com/couchbase/goxdcr/component/async_listener.go:68
       
      20 @ 0x1000400ce 0x100050445 0x10033c4cd 0x100073421
      #       0x10033c4cc     github.com/couchbase/gomemcached/client.(*UprFeed).sendCommands+0xac    /Users/neil.huang/source/couchbase/goproj/src/github.com/couchbase/gomemcached/client/upr_feed.go:346
      

      This indicates leaked gomemcached feeds that were never closed.

      Each leaked instance corresponds to a “did NOT close correctly” message in the logs.
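
The per-function counts in dumps like the ones above can be tallied with a small helper, which makes it easier to compare runs over time. This is a minimal sketch, assuming the text layout shown in this ticket (a `N @ addresses` record line followed by `#` frame lines); `countGoroutines` and `sampleDump` are hypothetical names for illustration and are not part of goxdcr or gomemcached.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sampleDump reuses the two records quoted in this ticket.
const sampleDump = `26 @ 0x1000400ce 0x100050445 0x100537cc7 0x100073421
#       0x100537cc6     github.com/couchbase/goxdcr/component.(*AsyncComponentEventListenerImpl).run+0x86       /Users/neil.huang/source/couchbase/goproj/src/github.com/couchbase/goxdcr/component/async_listener.go:68

20 @ 0x1000400ce 0x100050445 0x10033c4cd 0x100073421
#       0x10033c4cc     github.com/couchbase/gomemcached/client.(*UprFeed).sendCommands+0xac    /Users/neil.huang/source/couchbase/goproj/src/github.com/couchbase/gomemcached/client/upr_feed.go:346
`

// countGoroutines sums the goroutine counts of every record whose stack
// contains a frame mentioning fn. Each record is counted at most once.
func countGoroutines(profile, fn string) int {
	total, current, counted := 0, 0, true
	sc := bufio.NewScanner(strings.NewReader(profile))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		fields := strings.Fields(line)
		// A record header looks like "20 @ 0x... 0x...".
		if len(fields) >= 2 && fields[1] == "@" {
			n, err := strconv.Atoi(fields[0])
			if err != nil {
				current, counted = 0, true // skip malformed records
				continue
			}
			current, counted = n, false
			continue
		}
		// Frame lines start with "#"; match on the function name.
		if !counted && strings.HasPrefix(line, "#") && strings.Contains(line, fn) {
			total += current
			counted = true
		}
	}
	return total
}

func main() {
	fmt.Println("sendCommands goroutines:", countGoroutines(sampleDump, "(*UprFeed).sendCommands"))
}
```

With two source nozzles, a healthy pipeline should report 2 for `(*UprFeed).sendCommands`; a value that keeps growing across restarts points to the leak described here.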

      Each gomemcached feed that isn’t closed is created here: https://src.couchbase.org/source/xref/7.1.4/goproj/src/github.com/couchbase/goxdcr/parts/dcp_nozzle.go#522
      It is then actually started here: https://src.couchbase.org/source/xref/7.1.4/goproj/src/github.com/couchbase/goxdcr/parts/dcp_nozzle.go#573
      The DCP nozzle then issues StreamReq and fills up the buffer, before finally hitting an error and exiting here: https://src.couchbase.org/source/xref/7.1.4/goproj/src/github.com/couchbase/goxdcr/parts/dcp_nozzle.go#815
      By that point, however, the gomemcached buffer has already filled up. The feed is never stopped, so it keeps running and leaks memory.
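
The ordering described above can be modelled in a few lines. This is a simplified sketch under stated assumptions, not the goxdcr code or its actual fix: `feed`, `nozzle`, `initialize` and `leakedFeeds` are hypothetical stand-ins, and the "guarded" variant only illustrates one possible way to avoid the leak (re-checking the stopping state after the feed is opened and closing it if Stop already ran).

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

var errStopping = errors.New("nozzle is stopping")

// feed stands in for a gomemcached UprFeed: opening it starts a goroutine
// that only exits once Close is called.
type feed struct {
	done chan struct{}
	open *int64 // shared counter of feeds that are still open
}

func openFeed(open *int64) *feed {
	f := &feed{done: make(chan struct{}), open: open}
	atomic.AddInt64(open, 1)
	go func() { <-f.done }() // plays the role of sendCommands
	return f
}

func (f *feed) Close() {
	close(f.done)
	atomic.AddInt64(f.open, -1)
}

// nozzle is a hypothetical, heavily reduced model of a DCP nozzle.
type nozzle struct {
	mu       sync.Mutex
	stopping bool
	feed     *feed
	open     *int64
}

// Stop closes the feed if one is set; otherwise it is a no-op, which is the
// "uprfeed is already closed" branch in the ticket.
func (n *nozzle) Stop() {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.stopping = true
	if n.feed != nil {
		n.feed.Close()
		n.feed = nil
	}
}

// initialize opens the feed after Stop may already have run. When guarded is
// false it stores the feed unconditionally (the leaking order). When guarded
// is true it re-checks the state and closes the feed it just opened.
func (n *nozzle) initialize(guarded bool) error {
	f := openFeed(n.open) // in the ticket this happens after a 40 s sleep
	n.mu.Lock()
	defer n.mu.Unlock()
	if n.stopping {
		if guarded {
			f.Close()
		}
		return errStopping // the caller exits; nobody calls Stop again
	}
	n.feed = f
	return nil
}

// leakedFeeds runs the racy ordering (Stop first, then initialize) and
// returns how many feeds are still open afterwards.
func leakedFeeds(guarded bool) int64 {
	var open int64
	n := &nozzle{open: &open}
	n.Stop()
	_ = n.initialize(guarded)
	return atomic.LoadInt64(&open)
}

func main() {
	fmt.Println("unguarded leaked feeds:", leakedFeeds(false))
	fmt.Println("guarded leaked feeds:", leakedFeeds(true))
}
```

The ordering is forced here (Stop runs before the feed exists) rather than raced, which mirrors what the 40-second sleep in the diff achieves: the unguarded path leaves one feed open per restart, while the guarded path leaves none.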

            People

              ayush.nayyar Ayush Nayyar
              neil.huang Neil Huang