See reopening of
We've recently changed replication supervisor's childs to have type temporary. And that means supervisor will not try to restart failed childs.
But when some vbuckets on source are not yet ready to be replicated from (see later when this happens) we deal with that by 'crashing' replicator after 30 seconds expecting us to be restarted and to deal with new ready set of vbuckets.
It can be seen that there's tiny race in both 1.8.1 style and new-style vbucket filter change logic where vbucket filter change command can be sent to dying ebucketmigrator. So that's another related bug.
When this happens? This is 'typical' for replica count > 2 case even for our 'reliable' replica building attempt. Basically we do replica building in star formation and that means that when vbucket movement is done some replicas even later in chain may be slightly ahead of previous in chain replicas (but ofcourse never ahead of master). If that 'being ahead of' actually means later checkpoint id, that will cause backfill into that later replica, which will mean that this replica will be with open checkpoint 0 for some time. Condition where replication from is not possible. So that's it, that's where we cannot replicate some subset of vbuckets and have to restart ourselves later.