Details
Bug
Resolution: Fixed
Critical
4.0.0
Security Level: Public
Triaged
Unknown
Description
failure_restart_interval defaults to 30s for all replications. This means that any time a pipeline is broken, XDCR waits 30s before trying to fix it.
XDCR then needs another 30 seconds or more to fix the broken pipeline.
So every broken pipeline means a minute or more without replication.
Questions:
1. Why should failure_restart_interval be tunable for every replication? Can it be 0 and made an internal setting?
2. Why shouldn't XDCR compute an exponential backoff when there is a second failure, instead of waiting 30s?
3. 30s was the default for Erlang XDCR (based on the concept of Erlang process crashes and restarts). Does the same still hold for goxdcr, where the process itself does not crash?
In any case, even though crashes were very common in Erlang XDCR, I have seldom seen replication stop completely for a minute or more.
Attachments
For Gerrit Dashboard: MB-14228

# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
49729,3 | MB-14228 GoXDCR: Should failure_restart_interval still be 30s by default? | master | goxdcr | MERGED | +2 | +1 |