Details
Task
Resolution: Fixed
Critical
3.0
Security Level: Public
None
3
Mar 9 - Mar 27, KV: May 30 - June 10
Description
When a DCP stream is created, it moves into a streaming phase once its backfill stage is complete. At this point the stream holds a cursor marking the position in the checkpoints that it has reached.
If a DCP client ends up lagging behind (due to network bandwidth limits or generally slow processing), its cursor keeps checkpoints alive that we would otherwise like to discard (assuming the persistence cursor has finished with them). As a result we can end up holding large numbers of checkpoint items in memory, because we need to keep them around to stream to the lagging client. In the worst case this has resulted in KV-engine running out of memory. See the linked MBs below.
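To make the problem concrete, here is a minimal toy model of how a single lagging cursor pins checkpoints. It is only a sketch: the function name `removable_checkpoints`, the integer checkpoint ids, and the cursor names are hypothetical and are not taken from the KV-engine code.

```python
def removable_checkpoints(checkpoint_ids, cursors):
    """Return the ids of closed checkpoints that no cursor still needs.

    checkpoint_ids: ascending list of ids; the last entry is the open checkpoint.
    cursors: mapping of cursor name -> id of the checkpoint it is currently reading.
    (Toy model only; both structures are assumptions made for illustration.)
    """
    closed = checkpoint_ids[:-1]  # the open checkpoint is never removable
    if not cursors:
        return list(closed)
    oldest_needed = min(cursors.values())
    # A checkpoint can go only if every cursor has moved past it.
    return [cp for cp in closed if cp < oldest_needed]


checkpoints = [1, 2, 3, 4, 5]
cursors = {"persistence": 5, "dcp:slow-client": 2}

# The slow DCP cursor pins checkpoints 2..4 even though persistence is done with them.
print(removable_checkpoints(checkpoints, cursors))  # [1]

# Once that cursor is gone, everything up to the open checkpoint can be freed.
del cursors["dcp:slow-client"]
print(removable_checkpoints(checkpoints, cursors))  # [1, 2, 3, 4]
```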
The proposal to address this is to allow "cursor dropping": if a client gets too far behind then we drop its cursor, allowing us to free any checkpoints held up by it.
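One way to picture "too far behind" is as a lag threshold measured in sequence numbers. The sketch below assumes exactly that; the ticket does not say how lag is measured, so the `max_lag` parameter, the `cursors_to_drop` name, and the choice to never drop the persistence cursor are all assumptions for illustration.

```python
def cursors_to_drop(high_seqno, cursor_seqnos, max_lag, protected=("persistence",)):
    """Pick the cursors whose lag behind high_seqno exceeds max_lag.

    cursor_seqnos: mapping of cursor name -> last seqno that cursor has processed.
    protected: cursor names that are never dropped (an assumption of this sketch).
    """
    return sorted(
        name
        for name, seqno in cursor_seqnos.items()
        if name not in protected and high_seqno - seqno > max_lag
    )


cursor_seqnos = {"persistence": 100, "dcp:a": 990, "dcp:b": 200}

# dcp:b is 800 items behind and gets dropped; persistence lags too but is protected.
print(cursors_to_drop(high_seqno=1000, cursor_seqnos=cursor_seqnos, max_lag=500))  # ['dcp:b']
```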
The initial thought was to drop the whole DCP stream, i.e. tell the client the stream had ended / been disconnected and force it to reconnect. However, this was deemed undesirable from the client's point of view. The follow-up / alternative proposal is to instead transition the stream back to the "backfilling" stage. This allows the checkpoint cursor to be removed while the client stays connected: we essentially "re-backfill" from the point the client had reached up to the new current high sequence number.
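The alternative proposal can be sketched as a small state machine. This is only an illustration of the transition described above, not the KV-engine implementation: the `Stream` class, its fields, and the method names are hypothetical, and re-registering the cursor once the re-backfill completes is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class StreamState(Enum):
    BACKFILLING = "backfilling"
    STREAMING = "streaming"


@dataclass
class Stream:
    last_sent_seqno: int
    state: StreamState = StreamState.STREAMING
    has_cursor: bool = True
    backfill_range: Optional[Tuple[int, int]] = None

    def drop_cursor(self, high_seqno: int) -> bool:
        """Drop the checkpoint cursor and fall back to backfilling.

        The client stays connected; only the stream's internal state changes.
        Returns False if the stream is not currently in the streaming phase.
        """
        if self.state is not StreamState.STREAMING:
            return False
        self.has_cursor = False
        self.state = StreamState.BACKFILLING
        # Re-backfill from just after what the client already has
        # up to the current high sequence number.
        self.backfill_range = (self.last_sent_seqno + 1, high_seqno)
        return True

    def complete_backfill(self) -> None:
        """Finish the re-backfill and resume streaming with a fresh cursor."""
        if self.state is not StreamState.BACKFILLING or self.backfill_range is None:
            raise RuntimeError("no backfill in progress")
        self.last_sent_seqno = self.backfill_range[1]
        self.backfill_range = None
        self.has_cursor = True  # assumption: a new cursor is registered here
        self.state = StreamState.STREAMING


stream = Stream(last_sent_seqno=120)
stream.drop_cursor(high_seqno=500)
print(stream.state.value, stream.backfill_range)  # backfilling (121, 500)
stream.complete_backfill()
print(stream.state.value, stream.last_sent_seqno)  # streaming 500
```

Compared with ending the stream, the client sees no disconnect in this model; the cost is that the items in the backfill range have to be read again rather than served from the checkpoints.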
Design spec: https://docs.google.com/document/d/15baNgCbG7K_EYWnvBhltFER0RBrVkKlDO-wMomlTq-Y/edit