Description
What's the issue?
- We use GROUP BY and ORDER BY clauses in SQLite
- This requires the creation of temporary b-trees (at least in our case)
- These use an OS specific temporary directory by default
- If this runs out of space, the user is unable to restore
- If we pair this with existing bugs (MB-40395, MB-53112), we end up with restores that will complete "successfully" but skip restoring some vBuckets
It is also possible to hit this when building an index (see CBSE-14196); once again, SQLite creates a temporary file of significant size.
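To check whether a given query needs a temporary b-tree, SQLite's query plan can be inspected. The following is a minimal sketch, assuming the 'sqlite3' command-line shell is installed; the 'docs' table and its columns are hypothetical and are not the schema used by cbbackupmgr.

```shell
#!/bin/sh
# Print the query-plan steps that mention a temporary b-tree for a
# GROUP BY / ORDER BY query over a hypothetical, unindexed table.
show_temp_btree_steps() {
    command -v sqlite3 >/dev/null 2>&1 || {
        echo "sqlite3 CLI not found" >&2
        return 127
    }
    sqlite3 :memory: <<'SQL' | grep -i 'TEMP B-TREE'
CREATE TABLE docs (vbucket INTEGER, doc_key TEXT, size INTEGER);
EXPLAIN QUERY PLAN
SELECT vbucket, count(*) AS n FROM docs GROUP BY vbucket ORDER BY n;
SQL
}
```

Calling 'show_temp_btree_steps' lists the plan steps that rely on a temporary b-tree. The plan only shows that one is required; whether it spills to the temporary directory depends on the amount of data and on SQLite's temp store settings.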
Steps to reproduce
- Create a small simulated temporary store
- 'dd if=/dev/zero of=backing.fs bs=1M status=progress conv=fsync count=50'
- 'mkfs.xfs backing.fs'
- 'mount -o loop,rw backing.fs /mnt/'
- 'chown <user>:<group> -R /mnt/'
- Spin up a single node cluster
- Limit the number of vBuckets to one
- 'curl -X POST -u Administrator:asdasd http://localhost:8091/diag/eval -d "ns_config:set(couchbase_num_vbuckets_default, 1)."'
- Create a bucket
- Load a decent amount of data into it (I've tested with ~5GiB)
- Perform a backup
- Perform a restore, but tell SQLite to use the small temporary directory
- 'env SQLITE_TMPDIR=/mnt cbbackupmgr ...'
- You should see one of two errors
- 'Error restoring cluster: cannot commit - no transaction is active' (SQLite)
- 'Error restoring cluster: database or disk is full'
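Before running the restore, it can help to confirm which temporary directory SQLite is likely to pick and how much space it has. This is a rough sketch that approximates the search order described in the SQLite documentation for Unix ('SQLITE_TMPDIR', 'TMPDIR', '/var/tmp', '/usr/tmp', '/tmp', then the current directory); it ignores any directory set through a pragma, and the function names are made up for illustration.

```shell
#!/bin/sh
# Print the first candidate that is a writable, searchable directory.
# This approximates SQLite's Unix search order and is an assumption,
# not a call into SQLite itself.
sqlite_temp_dir() {
    for dir in "${SQLITE_TMPDIR:-}" "${TMPDIR:-}" /var/tmp /usr/tmp /tmp .; do
        if [ -n "$dir" ] && [ -d "$dir" ] && [ -w "$dir" ] && [ -x "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Print the space available, in KiB, on the filesystem holding a directory.
# 'df -P' forces the single-line POSIX output format; column 4 is "Available".
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}
```

For the reproduction above, 'SQLITE_TMPDIR=/mnt sqlite_temp_dir' should print '/mnt', and 'free_kib /mnt' should report well under the 50MiB backing file.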
What's the fix?
We've got a couple of potential fixes:
- We employ the workaround in a more permanent fashion (unfortunately this isn't a clean solution)
- We use the SQLite 'temp_store_directory' pragma to manually set the SQLite temporary directory to one that resides in our Archive (unfortunately unclean again, as we'd be relying on a deprecated pragma).
Is there a workaround?
On Linux, the 'SQLITE_TMPDIR' environment variable may be set to a path which has sufficient space to perform the restore.
On Windows, the 'TMP' environment variable may be set to a path which has sufficient space to perform the restore.
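On Linux, the workaround can be wrapped so that the command only runs when the chosen directory has enough free space. This is a minimal sketch, not part of cbbackupmgr: 'run_with_sqlite_tmpdir' is a hypothetical helper, and the minimum size is something the caller has to estimate, since the space SQLite needs depends on the backup being restored.

```shell
#!/bin/sh
# run_with_sqlite_tmpdir DIR MIN_KIB COMMAND [ARGS...]
# Run COMMAND with SQLITE_TMPDIR=DIR, but only if DIR's filesystem has at
# least MIN_KIB KiB available. MIN_KIB is a caller-supplied estimate.
run_with_sqlite_tmpdir() {
    tmp_dir=$1
    min_kib=$2
    shift 2

    # Column 4 of 'df -P' output is the available space in 1024-byte blocks.
    avail_kib=$(df -Pk "$tmp_dir" | awk 'NR == 2 { print $4 }')
    if [ -z "$avail_kib" ] || [ "$avail_kib" -lt "$min_kib" ]; then
        echo "refusing to run: $tmp_dir has ${avail_kib:-0} KiB free, need $min_kib" >&2
        return 1
    fi

    # Set the variable for this one command only, as in the steps to reproduce.
    env SQLITE_TMPDIR="$tmp_dir" "$@"
}
```

For example, 'run_with_sqlite_tmpdir /path/with/space 10485760 cbbackupmgr ...' would refuse to start unless roughly 10GiB is free; both the path and the threshold here are placeholders.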