Details
Assignee: James Lee
Reporter: James Lee
Story Points: 1
Priority: Major
Created April 13, 2021 at 8:51 AM
Updated May 23, 2022 at 8:21 AM
What's the issue?
Although we do have logic to handle unexpected failures such as power outages and a 'kill -9', it's currently built on unsafe assumptions. These assumptions are:
- That SQLite will be able to recover from a power outage (by default, this should be the case; however, we disable journals and syncing).
- That the 'RiftBufferedWriter' will be able to recover from a power outage (we use the 'sync_file_range' syscall as a performance optimization; this isn't safe on some filesystems, as file metadata will not be written out as it would be with 'fdatasync').
Example
Some of our tests prematurely 'kill -9' 'cbbackupmgr' to test resume support; this has led to situations where an invalid/corrupt SQLite file is detected. See this case where one of our tests has failed due to a 'database disk image malformed' error.
What's the fix?
Ideally, we should handle these situations better where possible:
- Move away from using the truncate-overwrite pattern ()
- Enable syncing for SQLite (and potentially journals, although we need to consider that not all journal types are supported on NFS)
- Periodically sync using 'fsync' or 'fdatasync' in the 'RiftBufferedWriter'
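For moving away from truncate-overwrite, one common replacement is write-to-temporary-then-rename. The new contents are written and fsynced to a temporary file in the same directory, then atomically swapped in with a rename, so a crash leaves either the old file or the new one, never a truncated one. The Python sketch below assumes a POSIX filesystem where rename within a directory is atomic and where a directory can be opened and fsynced. The 'atomic_write' helper is hypothetical and not part of the existing code.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Hypothetical helper: replace the contents of `path` with `data` without
    ever exposing a truncated or half-written file (POSIX rename assumed)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so os.replace() is a rename
    # on the same filesystem rather than a cross-device copy.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # Python buffer -> kernel page cache
            os.fsync(tmp.fileno())  # kernel page cache -> storage device
        os.replace(tmp_path, path)  # atomically swap the new file into place
    except BaseException:
        # Best-effort cleanup if anything failed before the rename completed.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Persist the directory entry so the rename itself survives a power loss.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

One trade-off: 'tempfile.mkstemp' creates the file with mode 0600, so callers that need other permissions would have to 'chmod' the temporary file before the rename.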
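For enabling syncing in SQLite, the change is mostly configuration. The issue does not show which pragmas are set today. The sketch below assumes that "disable journals and syncing" means 'journal_mode=OFF' and 'synchronous=OFF', and shows a hypothetical 'open_durable' helper (using Python's built-in 'sqlite3' module) that turns both back on. It uses WAL on local disks and falls back to the rollback journal ('DELETE') when the hypothetical 'network_fs' flag is set, since SQLite's documentation states that WAL does not work over a network filesystem.

```python
import sqlite3


def open_durable(path: str, network_fs: bool = False) -> sqlite3.Connection:
    """Open a SQLite database with journaling and syncing enabled.

    `network_fs` is a hypothetical flag the caller would set when `path`
    lives on NFS or another network filesystem.
    """
    conn = sqlite3.connect(path)
    # WAL relies on memory shared between processes, which SQLite does not
    # support over network filesystems, so use a rollback journal there.
    wanted = "delete" if network_fs else "wal"
    actual = conn.execute(f"PRAGMA journal_mode={wanted}").fetchone()[0]
    if actual.lower() != wanted:
        conn.close()
        raise RuntimeError(f"could not set journal_mode={wanted}, got {actual}")
    # FULL: sync to disk at the critical points of every commit, unlike
    # synchronous=OFF, which hands writes to the OS and never syncs.
    conn.execute("PRAGMA synchronous=FULL")
    return conn
```

'synchronous=FULL' is the conservative choice. With WAL, 'NORMAL' is cheaper and SQLite documents it as keeping the database consistent, at the cost of possibly rolling back the most recent commits after a power loss.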
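For periodic syncing in the 'RiftBufferedWriter', the Linux 'sync_file_range(2)' man page notes that the call writes out none of the file's metadata and does not flush disk write caches. In contrast, 'fdatasync' writes out the metadata needed to read the data back, such as a grown file size. The internals of 'RiftBufferedWriter' are not shown here, so the sketch below is a hypothetical 'PeriodicSyncWriter' in Python. It counts bytes written and calls 'fdatasync' (falling back to 'fsync' where it is unavailable) every 'sync_every' bytes and on close. The threshold is an assumed tuning knob, not a known value.

```python
import os

# Prefer fdatasync, which skips metadata not needed to read the data back
# (e.g. mtime); fall back to fsync on platforms without it, such as macOS.
_datasync = getattr(os, "fdatasync", os.fsync)


class PeriodicSyncWriter:
    """Hypothetical buffered writer that makes its data durable every
    `sync_every` bytes, instead of relying on sync_file_range."""

    def __init__(self, path: str, sync_every: int = 64 * 1024 * 1024) -> None:
        self._file = open(path, "wb")  # Python-level buffered writer
        self._sync_every = sync_every
        self._unsynced = 0              # bytes written since the last sync
        self.sync_count = 0

    def write(self, data: bytes) -> int:
        n = self._file.write(data)
        self._unsynced += n
        if self._unsynced >= self._sync_every:
            self.sync()
        return n

    def sync(self) -> None:
        self._file.flush()              # Python buffer -> kernel page cache
        _datasync(self._file.fileno())  # kernel page cache -> storage device
        self._unsynced = 0
        self.sync_count += 1

    def close(self) -> None:
        if not self._file.closed:
            self.sync()                 # never close with unsynced data
            self._file.close()

    def __enter__(self) -> "PeriodicSyncWriter":
        return self

    def __exit__(self, *exc_info) -> None:
        self.close()
```

Syncing in batches bounds how often the cost of 'fdatasync' is paid. The trade-off is that up to 'sync_every' bytes may be lost after a crash, which resume support would have to tolerate.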