Description
We have recently seen cases where Chronicle replicated log files were corrupted by an external entity. When ns_server subsequently restarted for any reason, the node became unavailable. The remedy in these cases was straightforward: delete the older Chronicle log file (Chronicle generally keeps two files around).
I suggest the following change: if the most recent log file can be opened, is valid, and contains a valid snapshot, then consistency issues in older log files can be treated as advisory; that is, we can log a warning instead of failing to start. If these conditions do not hold, we need to proceed backwards through the log files until they are satisfied, checking consistency along the way.
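To make the proposed start-up rule concrete, here is a minimal Python sketch of one reading of it. It is not Chronicle code (Chronicle is written in Erlang), and `LogFile`, `LogRecoveryError`, `plan_recovery` and the file names are all hypothetical. It assumes each log file has already been summarised as "opens and passes consistency checks" and "contains a valid snapshot", and that any file newer than the chosen snapshot must be consistent because it is still needed for replay.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LogFile:
    # Hypothetical summary of one log file; not Chronicle's real data model.
    name: str
    valid: bool         # file opens and passes consistency checks
    has_snapshot: bool  # file contains a valid snapshot


class LogRecoveryError(Exception):
    """Raised when start-up cannot proceed safely."""


def plan_recovery(logs: List[LogFile]) -> Tuple[str, List[str]]:
    """Pick the newest usable snapshot.

    `logs` is ordered oldest to newest. Returns the name of the file to
    start from and a list of advisory warnings about older files.
    """
    # Walk backwards from the most recent file.
    for i in range(len(logs) - 1, -1, -1):
        log = logs[i]
        if not log.valid:
            # No snapshot has been found yet, so this file is still
            # required; corruption here is fatal (assumed interpretation).
            raise LogRecoveryError(f"required log file {log.name} is corrupt")
        if log.has_snapshot:
            # Everything older than this file is no longer required, so
            # problems there are downgraded to warnings.
            warnings = [
                f"ignoring inconsistent older log file {old.name}"
                for old in logs[:i]
                if not old.valid
            ]
            return log.name, warnings
    raise LogRecoveryError("no valid snapshot found in any log file")
```

In the corruption scenario described above (older file damaged, newest file intact and holding a valid snapshot), this sketch returns the newest file together with a single warning rather than raising, which corresponds to starting normally and logging the problem.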
Issue Links
- is blocked by
  - MB-58628 Update chronicle SHA to fix failure to start if Chronicle can find a valid snapshot (Closed)