  1. Couchbase Server
  2. MB-53226

[CBM] Handle not having enough disk space in the default temporary directory for sqlite indexes

Details

    • Bug
    • Resolution: Fixed
    • Major
    • 7.6.0
    • 6.5.1, 6.6.0, 6.6.1, 6.6.2, 6.5.2, 6.5.0, 6.6.3, 6.6.4, 6.6.5, 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.1.0
    • tools
    • Untriaged
    • 1
    • No

    Description

      What's the issue?

      1. We use GROUP BY and ORDER BY clauses in SQLite
      2. This requires the creation of temporary b-trees (at least in our case)
      3. These use an OS-specific temporary directory by default
      4. If this runs out of space, the user is unable to restore
      5. If we pair this with existing bugs (MB-40395, MB-53112)
        1. We end up with restores that will complete "successfully" but skip restoring some vBuckets
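
A minimal sketch of points 1 and 2, using Python's built-in sqlite3 module rather than cbbackupmgr itself. The table and column names are hypothetical and are not the schema cbbackupmgr uses. It only shows that SQLite's query planner reports a temporary b-tree when no index can satisfy the grouping or ordering.

```python
import sqlite3


def plan_details(conn, query):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    # The detail text is the last column in both old and new output formats.
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
# Hypothetical table: no index covers vbucket, so SQLite has to sort.
conn.execute("CREATE TABLE docs (doc_key TEXT, vbucket INTEGER, size INTEGER)")

details = plan_details(
    conn,
    "SELECT vbucket, COUNT(*) FROM docs GROUP BY vbucket ORDER BY COUNT(*)",
)
for detail in details:
    print(detail)

# True when the planner falls back to a temporary b-tree.
uses_temp_btree = any("TEMP B-TREE" in detail for detail in details)
```

For large inputs, those temporary b-trees may spill to files in the temporary directory, which is where the disk space problem described here comes from.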

      It is also possible to hit this when building an index, see CBSE-14196. Once again, SQLite creates a temporary file of significant size.
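
As a rough aid for checking where those temporary files would land on a Unix-like system, the sketch below follows the search order given in SQLite's temporary-file documentation (SQLITE_TMPDIR, then TMPDIR, then a few fixed directories) and reports the free space there. The function name is hypothetical, the sketch does not model a directory set through a pragma, and it does not apply to Windows.

```python
import os
import shutil

# Fixed fallbacks SQLite documents for Unix, tried after the two variables.
UNIX_FALLBACK_DIRS = ["/var/tmp", "/usr/tmp", "/tmp", "."]


def likely_sqlite_temp_dir(environ=None):
    """Return the first usable candidate directory, or None if none is usable."""
    environ = os.environ if environ is None else environ
    candidates = [environ.get("SQLITE_TMPDIR"), environ.get("TMPDIR")]
    candidates += UNIX_FALLBACK_DIRS
    for candidate in candidates:
        # A candidate must exist, be a directory, and be writable and searchable.
        if (
            candidate
            and os.path.isdir(candidate)
            and os.access(candidate, os.W_OK | os.X_OK)
        ):
            return candidate
    return None


tmp_dir = likely_sqlite_temp_dir()
if tmp_dir is not None:
    free_gib = shutil.disk_usage(tmp_dir).free / 2**30
    print(f"{tmp_dir}: {free_gib:.1f} GiB free")
```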

      Steps to reproduce

      1. Create a small simulated temporary store
        1. 'dd if=/dev/zero of=backing.fs bs=1M status=progress conv=fsync count=50'
        2. 'mkfs.xfs backing.fs'
        3. 'mount -o loop,rw backing.fs /mnt/'
        4. 'chown <user>:<group> -R /mnt/'
      2. Spin up a single node cluster
      3. Limit the number of vBuckets to one
        1. 'curl -X POST -u Administrator:asdasd http://localhost:8091/diag/eval -d "ns_config:set(couchbase_num_vbuckets_default, 1)."'
      4. Create a bucket
      5. Load a decent amount of data into it (I've tested with ~5GiB)
      6. Perform a backup
      7. Perform a restore, but tell SQLite to use the small temporary directory
        1. 'env SQLITE_TMPDIR=/mnt cbbackupmgr ...'
      8. You should see one of two errors
        1. 'Error restoring cluster: cannot commit - no transaction is active' (SQLite)
        2. 'Error restoring cluster: database or disk is full'

      What's the fix?
      We've got a couple of potential fixes:

      1. We employ the workaround in a more permanent fashion (unfortunately this isn't a clean solution)
      2. We use the SQLite pragma to manually set the SQLite temporary directory to one that resides in our Archive (unfortunately this isn't clean either, as we'd be relying on deprecated behavior).
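
A hedged sketch of what option 2 could look like, assuming the pragma in question is SQLite's deprecated temp_store_directory. It uses Python's sqlite3 module for illustration only, and a throwaway directory stands in for a location inside the archive. Builds compiled without deprecated features ignore this pragma, so the helper (a hypothetical name) reports what actually took effect.

```python
import sqlite3
import tempfile


def set_temp_store_directory(conn, directory):
    """Set the deprecated pragma and return the value SQLite reports back.

    Returns None when the SQLite build ignores the pragma. The setting is
    process-wide, not per connection.
    """
    escaped = directory.replace("'", "''")  # escape quotes for the SQL literal
    conn.execute(f"PRAGMA temp_store_directory = '{escaped}'")
    row = conn.execute("PRAGMA temp_store_directory").fetchone()
    return row[0] if row else None


# Stand-in for a directory inside the backup archive (an assumption).
archive_tmp = tempfile.mkdtemp(prefix="archive-tmp-")
conn = sqlite3.connect(":memory:")
effective = set_temp_store_directory(conn, archive_tmp)
print("temporary directory in effect:", effective)
```

Because the result depends on how SQLite was compiled, a permanent fix along these lines would need a fallback for builds where the pragma has no effect.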

      Is there a workaround?
      On Linux, the 'SQLITE_TMPDIR' environment variable may be set to a path which has sufficient space to perform the restore.

      On Windows, the 'TMP' environment variable may be set to a path which has sufficient space to perform the restore.
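
The workaround can be wrapped in a small pre-flight check. The sketch below is illustrative only: the function name and the required-space figure are assumptions, and how much space a given restore needs is not stated in this issue. It verifies the chosen directory has enough free space, then builds an environment with the variable appropriate for the platform.

```python
import os
import shutil


def sqlite_tmp_env(tmp_dir, required_free_bytes, windows=(os.name == "nt"), base_env=None):
    """Return a copy of the environment that points SQLite at tmp_dir.

    Raises OSError if tmp_dir has less free space than required_free_bytes.
    """
    free = shutil.disk_usage(tmp_dir).free
    if free < required_free_bytes:
        raise OSError(
            f"{tmp_dir} has {free} bytes free, need {required_free_bytes}"
        )
    env = dict(os.environ if base_env is None else base_env)
    # SQLITE_TMPDIR on Linux, TMP on Windows, as described in the workaround.
    env["TMP" if windows else "SQLITE_TMPDIR"] = tmp_dir
    return env
```

The returned mapping can be passed as the env argument of subprocess.run when launching cbbackupmgr with the usual restore arguments.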

            People

              gilad.kalchheim Gilad Kalchheim
              james.lee James Lee