Couchbase Server / MB-19892

ep-engine: backfills are not always terminated when closing DcpProducer's streams, causing FD leak

Details

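    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: 4.5.1
    • Affects Version/s: 4.5.0
    • Component/s: couchbase-bucket
    • Labels: None
    • Environment: 4.5.0-2601, Windows system test
    • Triage: Untriaged
    • Is this a Regression?: Unknown
    • Sprint: KV: June 12 - July 2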

    Description

      Summary

      Memory and file descriptors (FDs) leak if a DcpProducer is closed while backfills are still present, for example if the connection is disconnected while a backfill is still running.

      The cause is a circular reference between DcpProducer and its ActiveStreams (held in the `streams` map): the producer holds a
      reference to each stream, and each stream holds a reference back to the producer. So while all external references to the
      DcpProducer are correctly released, its refcount is held at 1 by any remaining ActiveStream objects, and each stream's refcount
      is in turn held by the producer.

      The effect is that the DcpProducer object is never deleted, and in turn the couchstore files held open by its DCPBackfill tasks are never closed.
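
The reference cycle described above can be reproduced in isolation. The following is a minimal sketch only: it uses std::shared_ptr rather than ep-engine's own reference-counted pointer types, and `Producer`, `Stream`, `closeAllStreams` and `producersLeftAfterDisconnect` are hypothetical stand-ins, not the real classes or the actual fix. It shows that dropping the last external reference does not destroy the producer while a stream still points back at it, and that clearing the `streams` map first is one way to break the cycle.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <utility>

// Hypothetical stand-ins for illustration; these are NOT the ep-engine classes.
struct Producer;

// Counts live Producer objects so the leak can be observed.
static int liveProducers = 0;

struct Stream {
    explicit Stream(std::shared_ptr<Producer> p) : producer(std::move(p)) {}
    // Strong back-reference to the owner: one half of the cycle.
    std::shared_ptr<Producer> producer;
};

struct Producer {
    Producer() { ++liveProducers; }
    ~Producer() { --liveProducers; }

    // Dropping the streams releases their back-references and breaks the cycle.
    void closeAllStreams() { streams.clear(); }

    // Strong references to the streams: the other half of the cycle.
    std::map<int, std::shared_ptr<Stream>> streams;
};

// Creates a producer with one stream, then releases the only external
// reference (simulating a disconnect). Returns how many Producer objects
// are still alive afterwards because of the cycle.
int producersLeftAfterDisconnect(bool closeStreamsFirst) {
    const int before = liveProducers;
    std::weak_ptr<Producer> observer; // does not keep the producer alive
    {
        auto producer = std::make_shared<Producer>();
        observer = producer;
        producer->streams[0] = std::make_shared<Stream>(producer);
        if (closeStreamsFirst) {
            producer->closeAllStreams();
        }
    } // last external reference released here
    const int leaked = liveProducers - before;

    // Clean up so that the demonstration itself does not leak.
    if (auto p = observer.lock()) {
        p->closeAllStreams();
    }
    assert(liveProducers == before);
    return leaked;
}
```

Calling `producersLeftAfterDisconnect(false)` leaves one producer alive after the disconnect, whereas `producersLeftAfterDisconnect(true)` leaves none. Holding the back-reference as a std::weak_ptr would be another way to avoid the cycle; which approach suits ep-engine depends on its stream lifetime rules, which this sketch does not model.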

      Details

      Unable to delete bucket database directory RevAB
      {error,eexist}	ns_couchdb_api 000	ns_1@172.23.108.44	9:31:01 AM Thu Jun 9, 2016
      Failed to cleanup old buckets on node 'ns_1@172.23.108.44': {error,eexist}	ns_rebalancer 000	ns_1@172.23.105.87	9:31:01 AM Thu Jun 9, 2016
      Rebalance exited with reason {buckets_cleanup_failed,['ns_1@172.23.108.44']}
      ns_orchestrator 002	ns_1@172.23.105.87	9:31:01 AM Thu Jun 9, 2016
      Deleting old data files of bucket "UserInfo"	ns_storage_conf 000	ns_1@172.23.108.44	9:30:58 AM Thu Jun 9, 2016
      Deleting old data files of bucket "RevAB"	ns_storage_conf 000	ns_1@172.23.108.44	9:30:58 AM Thu Jun 9, 2016
      Deleting old data files of bucket "MsgsCalls"	ns_storage_conf 000	ns_1@172.23.108.44	9:30:58 AM Thu Jun 9, 2016
      Deleting old data files of bucket "AbRegNums"	ns_storage_conf 000	ns_1@172.23.108.44	9:30:58 AM Thu Jun 9, 2016
      Shutting down bucket "AbRegNums" on 'ns_1@172.23.108.44' for deletion	ns_memcached 000	ns_1@172.23.108.44	9:30:21 AM Thu Jun 9, 2016
      Shutting down bucket "MsgsCalls" on 'ns_1@172.23.108.44' for deletion	ns_memcached 000	ns_1@172.23.108.44	9:30:19 AM Thu Jun 9, 2016
      Starting rebalance, KeepNodes = ['ns_1@172.23.105.87','ns_1@172.23.105.94',
                                       'ns_1@172.23.107.85','ns_1@172.23.108.44'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes
      ns_orchestrator 004	ns_1@172.23.105.87	9:29:58 AM Thu Jun 9, 2016
      Shutting down bucket "RevAB" on 'ns_1@172.23.108.44' for deletion	ns_memcached 000	ns_1@172.23.108.44	9:29:54 AM Thu Jun 9, 2016
      Failed over 'ns_1@172.23.108.44': ok	ns_rebalancer 000	ns_1@172.23.105.87	9:29:50 AM Thu Jun 9, 2016
      Shutting down bucket "UserInfo" on 'ns_1@172.23.108.44' for deletion	ns_memcached 000	ns_1@172.23.108.44	9:29:50 AM Thu Jun 9, 2016
      Starting failing over 'ns_1@172.23.108.44'	ns_rebalancer 000	ns_1@172.23.105.87	9:29:46 AM Thu Jun 9, 2016
      

      Will provide logs soon. The cluster is still alive.

            People

              Assignee: drigby Dave Rigby (Inactive)
              Reporter: andreibaranouski Andrei Baranouski