Details
Bug
Resolution: Fixed
Major
4.5.0, 4.5.1
Triaged
No
Description
On the replica side, we accept items from a DCP stream only if memory usage is below replication_throttle_threshold (99%).
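As a rough illustration of this check (a minimal sketch, not the actual ep-engine code; the function name is hypothetical, and treating the threshold as a fraction of the bucket memory quota is an assumption):

```python
# 99%, the value quoted in this ticket.
REPLICATION_THROTTLE_THRESHOLD = 0.99


def replica_accepts_items(mem_used: int, mem_quota: int,
                          threshold: float = REPLICATION_THROTTLE_THRESHOLD) -> bool:
    """Hypothetical helper: True if the replica would take more DCP items.

    Assumes the threshold is a fraction of the memory quota (in bytes).
    """
    return mem_used < threshold * mem_quota
```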
On a 2-node cluster with 1 replica, we can run into a situation where items are in memory on the active side of each DCP stream and ready to be sent to the replica side, but the replica side refuses to take in any items because it has reached replication_throttle_threshold. (Note that memory usage reaches replication_throttle_threshold because of items in the DCP readyQ that are waiting to be sent to the other side. The resident ratio is near 0%, i.e., all items are paged out.) This can lead to an operational deadlock when both nodes hold active and replica vBuckets (as they do in our case).
Cursor dropping, implemented in MB-9897, handles the deadlock only when the memory usage is due to items waiting to be sent that sit in the checkpoint. Although it reduces the scope of the deadlock, it does not completely solve the problem: the same deadlock can arise from items sitting on the readyQ of the active stream and hogging memory.
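The difference between the two cases can be shown with a toy model. This is only a sketch under stated assumptions, not the real implementation: the Node class, its fields, and drop_cursors() are hypothetical names, memory is reduced to two counters, and cursor dropping is modelled simply as freeing checkpoint memory.

```python
from dataclasses import dataclass

THRESHOLD = 0.99  # replication_throttle_threshold, as a fraction of quota (assumption)


@dataclass
class Node:
    quota: int
    checkpoint_mem: int  # items waiting to be sent, held in checkpoints
    readyq_mem: int      # items waiting to be sent, held in the stream readyQ

    @property
    def mem_used(self) -> int:
        return self.checkpoint_mem + self.readyq_mem

    def accepts_replica_items(self) -> bool:
        return self.mem_used < THRESHOLD * self.quota

    def drop_cursors(self) -> None:
        # Toy stand-in for cursor dropping: frees checkpoint memory only.
        self.checkpoint_mem = 0


def deadlocked(a: Node, b: Node) -> bool:
    # Each node can drain its outbound items only if the peer accepts them,
    # so the pair is stuck when neither side accepts.
    return not a.accepts_replica_items() and not b.accepts_replica_items()


def stuck_after_cursor_dropping(a: Node, b: Node) -> bool:
    a.drop_cursors()
    b.drop_cursors()
    return deadlocked(a, b)


# Memory held in checkpoints: cursor dropping breaks the deadlock.
checkpoint_case = stuck_after_cursor_dropping(Node(100, 100, 0), Node(100, 100, 0))
# Memory held in the readyQ: cursor dropping frees nothing, the deadlock remains.
readyq_case = stuck_after_cursor_dropping(Node(100, 0, 100), Node(100, 0, 100))
```

In this model checkpoint_case is False and readyq_case is True, which mirrors the gap described above: freeing checkpoint memory does not help when the readyQ is what holds the memory.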