  1. Couchbase Server
  2. MB-19889

Potential operational deadlock (livelock) during heavy load

Details

    Description

      On the replica side, we accept items from a DCP stream only if memory usage is below replication_throttle_threshold (99%).
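
      The admission rule above can be sketched as follows. This is only an illustration, not the ep-engine code: the function name and parameters are hypothetical, and it assumes the 99% is measured against the bucket memory quota, which the description does not state explicitly.

```python
# Hypothetical sketch of the replica-side admission check described above.
# Assumption: the threshold is a percentage of the memory quota.
REPLICATION_THROTTLE_THRESHOLD_PERCENT = 99  # the 99% quoted in the description


def replica_accepts_items(mem_used: int, mem_quota: int,
                          threshold_percent: int = REPLICATION_THROTTLE_THRESHOLD_PERCENT) -> bool:
    """Return True if the replica side would accept items from a DCP stream.

    Integer arithmetic avoids floating-point surprises at the boundary.
    """
    return mem_used * 100 < threshold_percent * mem_quota
```

      With a quota of 1000 units, the replica accepts items at 980 units used and refuses them at 990 or above.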

      On a 2-node cluster with 1 replica, we can reach a state where items are held in memory on the active side of each DCP stream, ready to be sent to the replica side, but the replica side refuses to accept any items because it has reached replication_throttle_threshold. (Note that memory usage reaches replication_throttle_threshold because of the items in the DCP readyQ that are waiting to be sent to the other side. The resident ratio is near 0%, i.e. all items are paged out.) This can lead to an operational deadlock when both nodes hold active and replica data, as they do in our case: neither node's readyQ can drain because the other node's replica side refuses items, so neither node frees the memory that keeps it above the threshold.
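
      The scenario can be reproduced with a toy model. This is a rough sketch under strong assumptions, not a model of the real engine: the Node class, step and run are hypothetical names, memory is counted in abstract item units, and items accepted by the replica side are assumed to be paged out immediately (matching the near-0% resident ratio), so a node's memory usage is just its readyQ.

```python
from dataclasses import dataclass

THRESHOLD_PERCENT = 99  # replication_throttle_threshold from the description


@dataclass
class Node:
    quota: int    # memory quota in abstract item units (hypothetical)
    ready_q: int  # items queued on the active side, waiting to be sent

    def accepts(self) -> bool:
        # Assumption: accepted replica items are paged out at once,
        # so memory used on this node is only its own readyQ.
        return self.ready_q * 100 < THRESHOLD_PERCENT * self.quota


def step(sender: Node, receiver: Node) -> bool:
    """Try to send one item; return True if an item was transferred."""
    if sender.ready_q == 0 or not receiver.accepts():
        return False
    sender.ready_q -= 1
    return True


def run(a: Node, b: Node, max_rounds: int = 1000) -> str:
    """Alternate sends in both directions until drained or stuck."""
    for _ in range(max_rounds):
        progressed = step(a, b)
        progressed = step(b, a) or progressed
        if a.ready_q == 0 and b.ready_q == 0:
            return "drained"
        if not progressed:
            return "stuck"
    return "running"
```

      In this model, run(Node(100, 99), Node(100, 99)) returns "stuck" on the first round because each node refuses the other's items, while run(Node(100, 99), Node(100, 50)) returns "drained", since one node below the threshold is enough to let both queues empty.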

      Cursor dropping, implemented in MB-9897, handles the deadlock only when the memory is consumed by items waiting to be sent that are held in checkpoints. It reduces the scope of the deadlock but does not completely solve the problem: the same deadlock can still occur when the items sit in the readyQ of the active stream and hog the memory there.

            People

              owend Daniel Owen
              manu Manu Dhundi (Inactive)