Description
Currently, if plasma.New() encounters a fatal error such as ErrCorruptLSSBlock, it fails the open and returns the error to the indexer, which then restarts the indexer process.
Instead, if a recoverable fatal error (such as ErrCorruptLSSBlock) is detected in NewLSStore(), plasma should automatically roll back to the previous recovery checkpoint.
Update based on discussions: in plasma, the log is continuously and incrementally compacted, so rolling back to a prior checkpoint does not guarantee that the corrupted page will not be accessed again. Some keys in that page may date all the way back to the oldest snapshot.
So for plasma, a corruption can be handled as follows:
- When a corruption is detected in a non-open code path, attempt to record that fact.
- Fail NewLSStore() with the corruption error.
Issue Links
- relates to: MB-28139 Failure in disk layer should not cause all indexes to fail (Closed)
For Gerrit Dashboard: MB-28478

# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
91502,2 | MB-28478 Implement fatal error recording | unstable | plasma | MERGED | +2 | +1 |
92465,3 | MB-28478 error: Add IsFatalError() API to check for fatal errors | unstable | plasma | MERGED | +2 | +1 |
92480,3 | MB-28478 plasma: Return errStorageCorrupted if slice database is corrupted | unstable | indexing | MERGED | +2 | +1 |