Details
- Improvement
- Resolution: Unresolved
- Major
- None
- None
Description
Raising this as a high-level idea/suggestion/improvement.
Our current replicator design is built to ensure consistency in the case of channels (or other filters) changing. We do this by hashing those channels (I'm mostly going to refer to channels here, but the same applies to other filters) into the checkpoint ID. As such, if the channels change, we expect to start from 0 (or from the last time we replicated that exact set).
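To make that concrete, here's a toy sketch of the idea (Python; the names and hashing details are illustrative, not actual CBL internals): the checkpoint ID bakes in a hash of the channel set, so a different set means a different checkpoint and a restart from zero.

```python
# Toy model of "channels hashed into the checkpoint ID".
# Not the real CBL scheme - purely illustrative.
import hashlib

def checkpoint_id(cbl_uuid: str, sg_url: str, channels: list[str]) -> str:
    # Sort so that channel *order* doesn't change the identity,
    # but channel *membership* does.
    filter_hash = hashlib.sha256(",".join(sorted(channels)).encode()).hexdigest()[:8]
    return f"{cbl_uuid}:{sg_url}:{filter_hash}"

a = checkpoint_id("uuid-1", "wss://sg.example/db", ["metadata", "location_1"])
b = checkpoint_id("uuid-1", "wss://sg.example/db", ["metadata", "location_2"])
# a != b: changing the channel set produces a new checkpoint ID,
# so replication starts over for the new set.
```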
This approach has a slight downside in that it makes it a little difficult to have a "one true replicator" within your app - if you add to and remove from a list of channels on an ad-hoc basis, you need to keep rechecking all the documents you've already got.
Taking a particularly bad case - imagine a metadata channel containing ~10k docs that are generally considered to be needed by everyone, and therefore almost everyone replicates this channel. Add to this specific location_N channels, where a user might be replicating a handful which changes frequently ("today, you're assigned to locations 1, 19, 127..." etc.). In this case, having one replicator with [metadata, location_1, location_19, location_127] will recheck all of metadata even if it only actually needs to pick up 1 change in location_127 - we do of course get the saving that we don't need to pull those documents again, but the checking will still be a substantial overhead.
A handful of thoughts on this:
- Sequential Replicators as a 1st/2nd class pattern within CBL.
- This is actually a pattern I've adopted in the past. At an app level, you often want a single concept of "replicated" or "not replicated", and that doesn't fit well with having multiple discrete replicators (a MetadataReplicator and a LocationsReplicator, for example). Instead (or rather, as a refinement of that) it's nice to daisy-chain replicators. This way, you can easily have a MetadataReplicator which is expected to never/very infrequently change channels, followed by a more dynamic LocationsReplicator which can change channels with minimal overhead.
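A minimal sketch of the daisy-chain pattern (Python; the Replicator class here is a stand-in, not the CBL API - in real CBL you'd hook a change listener and watch for the IDLE state):

```python
# Hypothetical daisy-chain: start the next replicator when the previous
# one goes idle. Stand-in classes only - not Couchbase Lite's API.
from typing import Callable, Optional

class Replicator:
    def __init__(self, name: str, on_idle: Optional[Callable[[], None]] = None):
        self.name = name
        self.on_idle = on_idle  # fired when this replicator catches up
        self.started = False

    def start(self) -> None:
        self.started = True
        # Toy model: pretend we catch up immediately, then go idle.
        if self.on_idle:
            self.on_idle()

order: list[str] = []
# The dynamic, frequently-changing replicator runs second...
locations = Replicator("locations", on_idle=lambda: order.append("locations"))
# ...chained off the stable metadata replicator, which runs first.
metadata = Replicator(
    "metadata",
    on_idle=lambda: (order.append("metadata"), locations.start()),
)
metadata.start()
```

The app-level win is that "fully replicated" becomes "the last replicator in the chain went idle", rather than tracking N independent replicators.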
- Discrete Channel Replications
- I suspect this would have perf implications, but if CBL were to demux the channels and replicate each one individually, it could checkpoint each channel as its own replicator.
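A sketch of what the demux buys you (Python, illustrative only): each channel gets its own checkpoint, so adding or removing one channel leaves the others' checkpoints untouched.

```python
# Toy model of per-channel demux: one checkpoint ID per channel rather
# than one per channel *set*. Names/hashing are illustrative only.
import hashlib

def per_channel_checkpoints(cbl_uuid: str, sg_url: str,
                            channels: list[str]) -> dict[str, str]:
    return {
        ch: f"{cbl_uuid}:{sg_url}:" + hashlib.sha256(ch.encode()).hexdigest()[:8]
        for ch in channels
    }

cps = per_channel_checkpoints("uuid-1", "wss://sg.example/db",
                              ["metadata", "location_1"])
# Swapping location_1 for location_127 later would not disturb the
# metadata entry, so metadata is not rechecked from zero.
```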
- Itemised Checkpoints
- Rather than checkpointing for a given set of channels, define the checkpoint based on the usual other params (CBL_UUID, SG_URL, etc.) and maintain a checkpoint per channel within it. Obviously, this makes the checkpoints larger, and implies an upper bound on the number of channels you can fit into a checkpoint...
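Roughly, the shape would be one checkpoint document holding a per-channel sequence watermark (Python sketch; this is not the real CBL checkpoint format):

```python
# Hypothetical itemised checkpoint: one document keyed by the usual
# params, with a last-sequence entry per channel. Illustrative only.
checkpoint = {
    "client": "CBL_UUID:SG_URL",
    "channels": {"metadata": 10342, "location_1": 57},
}

def update_channel_seq(cp: dict, channel: str, seq: int) -> None:
    # Only the named channel's watermark moves; all others are untouched,
    # so adding a channel never resets progress on the existing ones.
    cp["channels"][channel] = max(cp["channels"].get(channel, 0), seq)

update_channel_seq(checkpoint, "location_127", 3)
```

The size concern falls out directly: the checkpoint grows linearly with the number of channels, which is what implies the upper bound mentioned above.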
- Encourage channel sets at the user level.
- We already have the functionality for this in Sync Gateway - simply have users replicate with */no filter and assign/unassign channels to/from that user as needed. However, this is costly on the SG side, and limiting in that even if you allow users to self-assign channels, it's effectively a hard filter at the SG level - e.g. I can't go and grab the odd document in another channel; I need to add the channel to my user and effectively grab all of it.
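For context, assigning channels at the user level would typically be done by an app server against the Sync Gateway admin REST API (PUT on the database's user endpoint with an admin_channels list). Shown here only as the request payload; see the SG admin REST docs for the exact endpoint shape:

```python
# Payload an app server might PUT to SG's admin API to (re)assign a
# user's channels. Channel names are from the example above.
import json

def assign_channels_payload(channels: list[str]) -> str:
    # Sorted purely for deterministic output.
    return json.dumps({"admin_channels": sorted(channels)})

body = assign_channels_payload(["metadata", "location_1", "location_19"])
```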
- More Flags! (combined with any of the above)
- Not always the best option, but picking any of these options and allowing it as a non-default mode via ReplicatorConfig feels reasonable. Being able to set FilterOptimise.STATIC vs FilterOptimise.DYNAMIC keeps the same default behaviour, but provides the dev with a potential benefit. Somewhat similar to High/Low IO Priority in Server.