Couchbase Server / MB-25218

Ephemeral: Handle DCP Backfill Dropping

Details

    Description

      In Ephemeral buckets, backfills are served from the sequence list, which is held in memory. During a backfill we take a read lock over a range of the list, and updated items inside that range are not deduplicated while the lock is held. This has two consequences:
      (a) The number of stale items grows, which increases memory usage.
      (b) Currently only one backfill can run from the seqlist at a time.
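
      The effect of the range read lock can be illustrated with a toy model. This is only a sketch, not kv-engine code: the class ToySeqList and all of its members are hypothetical, and it assumes that stale items are purged as soon as the backfill ends, which the real purge mechanism need not do.

```python
class ToySeqList:
    """Toy in-memory sequence list; illustrative only, not kv-engine code."""

    def __init__(self):
        self.entries = {}       # seqno -> (key, value); insertion order is seqno order
        self.live = {}          # key -> seqno of the current version
        self.stale = set()      # seqnos of superseded versions kept for the backfill
        self.high_seqno = 0
        self.read_range = None  # (start, end) seqnos locked by the single backfill

    def begin_backfill(self):
        # Consequence (b): only one backfill may hold a read range at a time.
        if self.read_range is not None:
            raise RuntimeError("a backfill is already running")
        self.read_range = (1, self.high_seqno)

    def end_backfill(self):
        # Assumption: stale items are purged immediately once the range is released.
        self.read_range = None
        for seqno in self.stale:
            del self.entries[seqno]
        self.stale.clear()

    def upsert(self, key, value):
        self.high_seqno += 1
        old = self.live.get(key)
        if old is not None:
            rng = self.read_range
            if rng and rng[0] <= old <= rng[1]:
                # Consequence (a): the old version is inside the locked range,
                # so it cannot be deduplicated and stays behind as a stale item.
                self.stale.add(old)
            else:
                # Outside any read range: deduplicate by dropping the old version.
                del self.entries[old]
        self.entries[self.high_seqno] = (key, value)
        self.live[key] = self.high_seqno
```

      In this model, updating a key while a backfill is running leaves both the old and the new version in entries, so the list keeps growing until end_backfill() is called.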

      This creates two operational problems:
      i) One slow or misbehaving DCP client can slow down or block DCP reads (and hence replication) for all other clients.
      ii) We can run into a livelock when replicating under heavy front-end load at very high memory usage. For example, take a 2-node cluster with 1 replica. On one vbucket we have a DCP stream node 1 ==> node 2, and on another vbucket we have a DCP stream node 2 ==> node 1.
      If memory usage on node 2 is high, it pushes back on replica items from node 1. The backfill on node 1 therefore does not complete and keeps holding the read lock over a range in the list. Memory usage on node 1 then rises, both because stale items accumulate and because we cannot run LRU auto-deletion on the list. Owing to its high memory usage, node 1 now starts pushing back on replica items from node 2. This further increases memory usage on node 2, and we have a cyclic operational deadlock (or livelock).
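
      The cycle in (ii) can be reproduced in a very small simulation. This is a rough sketch under invented assumptions: the Node fields, the single high-water threshold, the memory units and the per-tick write rate are all hypothetical and are not taken from kv-engine.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    mem: int        # current memory usage, arbitrary units
    high_wat: int   # at or above this, the node pushes back on replica items
    backlog: int    # items its outgoing backfill still has to send

    def accepts_replica_items(self):
        return self.mem < self.high_wat


def tick(sender, receiver, front_end_writes):
    """One time step of the DCP stream sender ==> receiver."""
    if sender.backlog == 0:
        return
    if receiver.accepts_replica_items():
        sender.backlog -= 1
        receiver.mem += 1
    else:
        # The backfill is stalled and keeps its read range, so front-end
        # updates on the sender pile up as stale items.
        sender.mem += front_end_writes


def run(n1, n2, rounds, writes_per_tick=5):
    for _ in range(rounds):
        tick(n1, n2, writes_per_tick)   # stream node 1 ==> node 2
        tick(n2, n1, writes_per_tick)   # stream node 2 ==> node 1
```

      Starting with node 2 already above its threshold and node 1 below it, node 1 climbs past its own threshold after a few rounds. From then on neither backlog shrinks while both mem values keep rising, which is the livelock described above.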

      Proposed solutions:
      i) Drop slow DCP streams from non-replication clients.
      ii) Drop replication streams to avoid the operational deadlock. This needs coordination with the ns-server team, as we have had problems before when kv-engine explicitly dropped and reconnected slow streams.
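
      One possible shape for such a drop policy is sketched below. Everything here is an assumption made for illustration: the Stream fields, the function streams_to_drop and both thresholds are hypothetical, and the ticket does not specify how "slow" is measured or at what memory level replication streams should be dropped.

```python
from dataclasses import dataclass


@dataclass
class Stream:
    name: str
    is_replication: bool
    backfill_secs: float   # how long its backfill has held the read range


def streams_to_drop(streams, mem_ratio, slow_after_secs=30.0, mem_critical=0.95):
    """Return the names of streams to drop (thresholds are made-up defaults)."""
    drop = []
    for s in streams:
        if s.backfill_secs <= slow_after_secs:
            continue                      # not slow, leave it alone
        if not s.is_replication:
            drop.append(s.name)           # (i) slow non-replication client
        elif mem_ratio >= mem_critical:
            drop.append(s.name)           # (ii) last resort to break the cycle
    return drop
```

      Keeping replication streams alive until memory is critical reflects the caution in (ii): dropping them has caused problems before, so in this sketch it is treated as a last resort rather than a routine action.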

            People

              Unassigned
              Manu Dhundi (Inactive)