Description
EP-engine is creating a per-bucket counter that will track the number of times the wall clock is short of the max CAS value by more than a set threshold. An increase in this counter indicates that the time has skewed somewhere in the replication topology.
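The counting rule above can be sketched as follows. This is only an illustration: the class, field names, and the assumption that the timestamps and the threshold share one unit are hypothetical and not taken from ep-engine.

```python
from dataclasses import dataclass


@dataclass
class DriftCounter:
    """Hypothetical per-bucket counter; not ep-engine's actual data structure."""
    threshold: int      # allowed lag, in the same unit as the timestamps (assumed)
    exceeded: int = 0   # times the wall clock lagged max CAS by more than threshold

    def observe(self, wall_clock: int, max_cas: int) -> None:
        # Count only the case described: wall clock short of max CAS
        # by more than the threshold.
        if max_cas - wall_clock > self.threshold:
            self.exceeded += 1


counter = DriftCounter(threshold=5)
counter.observe(wall_clock=100, max_cas=103)  # lag of 3, within threshold
counter.observe(wall_clock=100, max_cas=110)  # lag of 10, counted
counter.observe(wall_clock=120, max_cas=110)  # wall clock ahead, not counted
```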
ns_server should alert (using our current alerting mechanism: send an e-mail and pop up in the UI) when both of the following conditions are satisfied for any bucket in the cluster:
- the counter increases for that bucket
- that bucket is an LWW (last-write-wins) bucket.
If the situation persists, the counters will continue to increase and we'll continue to alert.
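A minimal sketch of the alert condition, assuming ns_server polls the counters periodically and keeps the previous reading per bucket. The function name, the dictionaries, and the treatment of a newly seen bucket are illustrative assumptions, not ns_server's implementation.

```python
from typing import Dict, List, Set


def buckets_to_alert(previous: Dict[str, int],
                     current: Dict[str, int],
                     lww_buckets: Set[str]) -> List[str]:
    """Return LWW buckets whose counter increased since the previous poll."""
    alerts = []
    for bucket, count in sorted(current.items()):
        # A bucket seen for the first time is compared against zero (assumption).
        if count > previous.get(bucket, 0) and bucket in lww_buckets:
            alerts.append(bucket)
    return alerts


lww = {"orders", "sessions"}          # hypothetical bucket names
poll_1 = {"orders": 2, "sessions": 0, "cache": 1}
poll_2 = {"orders": 3, "sessions": 0, "cache": 4}
poll_3 = {"orders": 5, "sessions": 0, "cache": 4}

first = buckets_to_alert(poll_1, poll_2, lww)   # ["orders"]
second = buckets_to_alert(poll_2, poll_3, lww)  # ["orders"] again
```

Because each poll is compared with the one before it, a bucket whose counter keeps growing is reported on every poll, which matches the "continue to alert" behaviour described above.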
For Gerrit Dashboard: MB-21153

# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
68913,6 | MB-21153: Alert when \|max_cas - wall_clock\| exceeds the threshold. | watson | ns_server | MERGED | +2 | +1 |
68914,6 | MB-21153: Modified UI to handle ep_clock_cas_drift_threshold alert. | watson | ns_server | MERGED | +2 | +1 |
69158,1 | Merge remote-tracking branch 'couchbase/watson' | master | ns_server | MERGED | +2 | +1 |
69185,3 | MB-21153: add ep_clock_cas_drift_threshold alert back to app-classic | master | ns_server | MERGED | +2 | +1 |