Details
Improvement | Status: Open | Critical | Resolution: Unresolved | 6.5.0
Description
Problem
Currently a DCP client cannot continue after a disconnect if it is behind the purge seqno. This situation is common in flaky network environments, such as WAN links, and affects the feasibility of collecting a full backup from a large data source.
This issue is expected to be hit when backing up to the cloud.
A possible solution is to keep the file handles open after a disconnect for a short, configurable period of time. Then, if the client re-connects with the same DCP stream and details (e.g. end snapshot), the stream can continue where it left off, using the old file handle.
A little more detail on the proposed solution: if a stream is in the backfilling state (from disk), add a new DCP control message which the DCP client can negotiate:
backfill_resume - Request that disk backfills which are in progress when the DCP connection is closed (e.g. due to a network disconnect) are not immediately closed by KV-Engine, but instead the associated resources (i.e. the couchstore file snapshot) are preserved for a limited grace period (e.g. 60s).
If the same DCP client (same name presented) re-connects within the grace period, and presents the same snapshot start / end as it was previously in the middle of (see https://github.com/couchbase/kv_engine/blob/master/docs/dcp/documentation/building-a-simple-client.md#restarting-from-where-you-left-off) then the same couchstore file snapshot is re-used and hence the client can resume from the same state, without being subject to purge seqno advancing.
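The resume condition described above can be sketched as a small predicate. This is only an illustration in Python, not KV-Engine code: the names PreservedBackfill, StreamRequest and can_resume are hypothetical, and the 60s default simply mirrors the example grace period mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreservedBackfill:
    """State kept for a backfill whose connection dropped (hypothetical)."""
    client_name: str
    snap_start: int
    snap_end: int
    disconnected_at: float  # seconds, from a monotonic clock


@dataclass(frozen=True)
class StreamRequest:
    """Details presented by the re-connecting client (hypothetical)."""
    client_name: str
    snap_start: int
    snap_end: int


def can_resume(preserved: PreservedBackfill,
               request: StreamRequest,
               now: float,
               grace_period: float = 60.0) -> bool:
    """Return True if the preserved file snapshot may be re-used.

    All three conditions from the proposal must hold: the client
    re-connects within the grace period, presents the same name, and
    presents the same snapshot start / end it was in the middle of.
    """
    within_grace = (now - preserved.disconnected_at) <= grace_period
    same_client = request.client_name == preserved.client_name
    same_snapshot = (request.snap_start == preserved.snap_start
                     and request.snap_end == preserved.snap_end)
    return within_grace and same_client and same_snapshot
```

If any condition fails, the stream request would be handled as it is today, i.e. subject to the current purge seqno.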
Implementation Sketch
(From some discussions with Jim Walker a while back, assuming I remember correctly...)
(DCP client name, Vbid) -> (timestamp, KVFileHandle, HighSeqno)
(Note: We cannot re-use the ScanContext (and internal KVStore iterator) as-is, because the client may not have received all the mutations KV-Engine transmitted - i.e. the iterator may be too far advanced. Instead we just re-use the FileHandle (i.e. couchstore file snapshot).)
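To make the mapping above concrete, here is a minimal Python sketch of such a table. It is an illustration only, not the actual KV-Engine design: the class and method names (PreservedHandles, preserve, take, purge_expired) are hypothetical, the file handle is treated as an opaque object with a close() method, and timestamps are passed in explicitly so that expiry is easy to reason about.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass
class Entry:
    timestamp: float   # when the connection was closed
    file_handle: Any   # opaque snapshot handle; assumed to have close()
    high_seqno: int


class PreservedHandles:
    """(DCP client name, Vbid) -> (timestamp, file handle, high seqno)."""

    def __init__(self, grace_period: float = 60.0) -> None:
        self._grace = grace_period
        self._entries: Dict[Tuple[str, int], Entry] = {}

    def preserve(self, client_name: str, vbid: int, file_handle: Any,
                 high_seqno: int, now: float) -> None:
        """Keep a handle alive after a disconnect, replacing any older one."""
        old = self._entries.pop((client_name, vbid), None)
        if old is not None:
            old.file_handle.close()
        self._entries[(client_name, vbid)] = Entry(now, file_handle, high_seqno)

    def take(self, client_name: str, vbid: int, now: float) -> Optional[Entry]:
        """Hand the entry back to a re-connecting client, if still valid.

        The entry is removed either way; an expired handle is closed.
        """
        entry = self._entries.pop((client_name, vbid), None)
        if entry is None:
            return None
        if now - entry.timestamp > self._grace:
            entry.file_handle.close()
            return None
        return entry

    def purge_expired(self, now: float) -> int:
        """Close and drop every handle older than the grace period."""
        expired = [key for key, entry in self._entries.items()
                   if now - entry.timestamp > self._grace]
        for key in expired:
            self._entries.pop(key).file_handle.close()
        return len(expired)
```

Only the file handle is stored, in line with the note above: on resume a fresh scan would be started against the preserved snapshot from the seqno the client actually received, rather than continuing the old iterator.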