Backup strategies for SQLite in production

    Introduction

    If you are relying on SQLite as your production database, it makes sense to have some sort of a backup strategy. Those backups would come in handy in a multitude of situations: a catastrophic hardware failure that wipes out all the data on your server; a catastrophic administrative failure (e.g. you type delete from <table>; in the console and hit enter before typing the where condition); a similar bug in your code; or wrongly overwriting the database file in your deployment script. In these situations, and others, a backup will hopefully let you restore the database to a known good state without losing much data in the process. In this blog, we discuss multiple options you can use for creating those backups for your SQLite database.

    Q: Should backups be stored locally or remotely?

    There are multiple reasons to want remote backups. If your hardware fails, you might lose the backup data along with the original database file. And even if that’s not the case, it might take considerable time to bring that drive back online after the server has failed, which could greatly disrupt your application.

    That said, local backups do serve a purpose: they can make recovering from admin or programming mistakes quick and less painful. Moreover, with the right setup, local backups can have the resilience of remote ones; they actually become remote backups, as we will see in a bit. This is generally my favorite approach, as it pushes the complexity down to the infrastructure layer and keeps my applications simpler to manage and reason about.

    A (not so) quick infrastructure primer

    Note: this section deals with server infrastructure and doesn’t have much to do with SQLite itself; skip ahead to the next section if you are not interested in this particular discussion.

    One of the best things about computing is that we are very good at building abstractions. Those abstractions mean that we are able to hide complexity and implementation details and deliver a consistent interface to other layers that rely on these abstractions. One very good example is block storage.

    A distributed system in a disk drive

    Almost all storage systems can be represented as sequences of data blocks that are called block devices. These block devices can then be formatted by different filesystems so the OS and applications can use the POSIX interface to write, read, copy and delete files, directories or even portions of files on the disk drive.

    Generally, block devices physically reside on the same machine that runs the filesystem; the OS connects to the device over PCIe, NVMe or similar. But that’s not the only option: block devices can also represent storage residing elsewhere on the network. And with technologies like Ceph clusters, a block device can be the interface to a networked, replicated, distributed, self-managing and self-healing storage system. This is exactly what solutions like EBS (Elastic Block Store) from AWS and similar offerings from other providers use under the hood.

    This means that these distributed data stores are exposed to you locally as block devices. They can be formatted with any local filesystem of your choice, and they will behave like a local storage device, except somewhat slower and a lot more durable (triple replication over the network is the normal level of redundancy for these).

    As we discuss backup strategies further, please remember that having a remote volume mounted as a primary or secondary disk in your system means you have filesystem access to a distributed data store, which you use by simply doing file operations (e.g. cat, tail, cp, touch, mkdir, etc.).

    Not just durability but potentially also high availability

    With the backups and/or the original database files residing on replicated, remote block storage, if the server the storage is attached to fails, the volume can be detached and attached to a new, healthy server relatively quickly, usually in under a minute. It is worth noting that this is exactly how some cloud providers deliver high availability (HA) for their managed database services.

    The cherry on top, CoW filesystems/operations

    As we mentioned, the block device can be formatted with any filesystem you desire. Certain filesystems, though, come with features that can make taking backups a lot more efficient. I won’t go into much detail at this point, since we will touch on these later, but features found in ZFS, Btrfs or XFS (not a CoW filesystem, but it has some CoW features) can help a lot in optimizing the backup process, as we will see.

    As an added bonus, both ZFS and Btrfs offer transparent filesystem compression, meaning even the stored, deduplicated pages can be further reduced in size.

    Enough with the infra talk, back to SQLite.

    Multiple options for what to back up

    When backing up a database like SQLite, there are multiple options for what exactly to back up.

    Backup data pages in the file verbatim

    In this mode, you copy each page of the source database files (or the changed pages only if you are doing incremental backups) to the destination. This eventually creates an exact replica of the original database file at a certain point in time.

    Pack data pages as you copy them

    Doing a byte-by-byte copy of existing database pages means you also copy along the space inefficiencies resulting from prior data updates/deletions or a suboptimal insertion order. By not staying true to the original page layout and instead packing the data in a more optimized format, you end up with a backup that holds the same data as the original but hopefully uses less space. This operation requires more CPU than a plain copy, though.

    Dump the data as SQL commands

    In this mode you don’t copy the actual data, but generate the SQL commands that would recreate it. Later you can restore your backup by running those statements and reconstructing your database. This mode usually results in a considerably larger backup file, as the SQL text adds considerable overhead per row of data. It can be very useful, though, if you want the option to restore your database into a different system, say PostgreSQL.

    Multiple options for how to back up

    We talked about what things you could back up; now we talk about how to perform these backups.

    As we discussed in the infra section, we will touch on the option of having your backups, or both your database and your backups, reside on remote replicated storage that is used as a local filesystem.

    Here are some of the options you can use to back up your database files:

    Litestream

    Litestream is an interesting tool that basically does the following:

    • It creates an initial copy of the database file
    • The copy is shipped to S3 or an S3 compatible store
    • It listens to WAL file changes
    • Once a change is detected, new WAL pages are copied. This can happen every second
    • The copy is shipped to S3 or the S3 compatible store
    • You can later query the different versions on the remote object storage and restore any of them locally

    This way, Litestream relies on cloning data pages as they show up in the database WAL file.
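
    A minimal sketch of restoring the latest replicated copy; the bucket name and paths below are hypothetical, and assume the litestream binary is installed and pointed at your replica:

    # restore the most recent replicated copy from the remote object store
    `litestream restore -o /var/data/restored.sqlite3 s3://my-bucket/production.sqlite3`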

    Pros

    • Straightforward to use; there is even a gem that hides most of the complexity if you are using Ruby.
    • Efficient storage-wise, as only newly written pages are shipped to remote storage.
    • Delivers point-in-time recovery and resilience to not just machine loss, but even complete datacenter loss.

    Cons

    • Another piece of software to run and monitor.
    • Can be a bit complicated to use.
    • Potentially not compatible with the more advanced WAL2 mode.
    • Requires subscribing to an S3 like object storage.
    • Restoring might require downloading a large base file and multiple increments.

    SQLite’s backup tool

    SQLite has a backup API, which is also compiled into the sqlite3 CLI in the form of the .backup command. This creates an exact page-by-page replica of the database file at the point of invoking the command. The database can be written to by other connections during the backup process, but new writes will not be reflected in the resulting backup.
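
    Taking a backup from a live database can then be a one-liner. A minimal sketch, assuming the database lives at /var/data and the remote volume is mounted at /mnt/remote (both hypothetical paths):

    # page-by-page copy of the live database to a timestamped file on the remote mount
    `sqlite3 /var/data/production.sqlite3 ".backup '/mnt/remote/backups/production-#{Time.now.to_i}.sqlite3'"`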

    If the target folder resides on remote storage, then you are effectively creating a remote backup. Furthermore, if the target filesystem is ZFS or Btrfs, you can use deduplication to ensure that any data shared between backups is stored only once on the filesystem. And if both the source and the backups reside on the same remote storage, the main database will be included in the deduplication process, meaning very little space is used for the backups.
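
    How deduplication is set up depends on the filesystem. As a rough sketch (the dataset name and paths are hypothetical, and duperemove is just one of several Btrfs deduplication tools):

    # ZFS: enable inline deduplication on the dataset holding the backups
    `zfs set dedup=on tank/backups`
    # Btrfs: run out-of-band deduplication over the backup directory after copies are written
    `duperemove -dr /mnt/remote/backups`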

    Pros

    • The backup API is rock solid (a SQLite component).
    • With remote storage, you transparently get remote replication.
    • With deduplication, backups are incremental even if they appear to be complete copies.
    • Much faster to restore a backup.

    Cons

    • Doesn’t help with the whole datacenter going down (the paranoia is high on this one!).
    • Setting up deduplication can take some effort, especially if you want to go fancy and use something like dm-vdo.
    • The backups can compete for space with your original database files.

    SQLite’s VACUUM INTO

    SQLite has a VACUUM command that is used to optimize the structure of the database and shrink its size. There is a variant called VACUUM INTO that writes this optimized version of the database to a specified target, rather than doing the work in place. This process doesn’t interfere with other writes, but it requires more CPU cycles due to the optimization work.

    If the target of the vacuum process is on a remote drive, then you are effectively creating a remote backup. And if the target uses deduplication, it will not duplicate any pages shared between backups. There isn’t a great benefit in having the original database on the same remote drive, though, as most pages change during the VACUUM INTO process, negating any benefit from deduplication against the original database file.
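
    A minimal sketch, again with hypothetical paths; the target file is given as part of the SQL statement and must not already exist:

    # write an optimized, compacted copy of the database to the remote mount
    `sqlite3 /var/data/production.sqlite3 "VACUUM INTO '/mnt/remote/backups/production-#{Time.now.to_i}.sqlite3'"`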

    Pros

    • Resulting backup is usually smaller and more efficient than the original database file.
    • Can be stored on a remote storage, delivering replicated durability.
    • Very fast to restore.

    Cons

    • Can use a lot more space if not deduplicated (and setting up deduplication takes some effort).
    • Cannot benefit from deduplication with the original database file.
    • Competes for space with the original database file if they reside on the same drive.

    SQLite CLI .dump command

    Using the sqlite3 CLI .dump command creates a text file that contains the SQL commands needed to reconstruct the whole database, both to create the schema and to insert the data into it.
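
    A minimal sketch with hypothetical paths; since the output is plain text it compresses well, so piping it through gzip is a natural fit:

    # dump the schema and data as SQL text, compressed on the fly
    `sqlite3 /var/data/production.sqlite3 .dump | gzip > /mnt/remote/backups/production.sql.gz`
    # restoring replays the statements into a fresh database
    `gunzip -c /mnt/remote/backups/production.sql.gz | sqlite3 /var/data/restored.sqlite3`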

    Pros

    • Resulting text files compress well.
    • The database can be reconstructed using another RDBMS, not just SQLite.
    • If it is written to a remote storage, you create a remote backup.

    Cons

    • Uses a lot more space than other backup methods.
    • Deduplication is less effective than other methods.
    • Restoring is the slowest of all mentioned methods.

    Good old cp

    This is my personal favorite as a Btrfs user, here’s why:

    cp is the Linux/POSIX copy command, instructing the filesystem to create a copy of a file. When the copy is a page-by-page replica of the original file, it takes up exactly as much space as the original.

    One important thing to note is that you need to copy both the main database and the WAL file(s). In order to do so cleanly you will need to also start a deferred transaction before issuing the cp command. This will allow other writers to proceed, but will prevent the deletion of the WAL file during the backup process.

    db.transaction(:deferred) do
     `cp <src> <dist>`
     `cp <src>-wal <dist>-wal`
    end
    

    As with the other approaches, writing the copy to remote storage gives you replicated durability. But the goodness doesn’t stop here. If both the main database and the backup reside on the same remote storage, then you can do shallow copies. On Btrfs or XFS (and recently on ZFS) you can use the cp --reflink=always variant, which instructs the filesystem to create a copy-on-write version of the file. Meaning, no blocks are actually copied unless they later get overwritten or deleted in the original database file. No deduplication setup is needed in this case, as only the changed pages are ever written to the filesystem.

    db.transaction(:deferred) do
     `cp --reflink=always <src> <dist>`
     `cp --reflink=always <src>-wal <dist>-wal`
    end
    

    On a VM with attached block storage hosting both the source and destination databases this operation takes ~2ms to complete for a 440MB database file. The same ~2ms are required to copy a 4.4GB database file as well.

    Before creating the copy of the 4.4GB file, this was the result of running df -h.

    #df -h
    Filesystem                       Size  Used Avail Use% Mounted on
    /dev/mapper/vg01-lv_data_remote   99G  4.9G   93G   5% /mnt/remote
    

    After the copy, the results were as follows:

    #df -h
    Filesystem                       Size  Used Avail Use% Mounted on
    /dev/mapper/vg01-lv_data_remote   99G  4.9G   93G   5% /mnt/remote
    

    No extra space was used (actually a tiny bit of space for the file metadata).

    With this method I can aggressively create multiple copies of my database file per second (if I detect data changes), which I can delete later on if they end up not being needed, shrinking the window of data lost to a mistake or a bug to a few milliseconds.
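
    A sketch of what such an aggressive schedule could look like, without the change detection, where db is the same open database handle as in the snippets above, and the paths, one-second interval and one-hour retention are arbitrary choices for illustration:

    # snapshot the database with a reflinked copy every second, prune copies older than an hour
    loop do
      stamp = Time.now.to_i
      db.transaction(:deferred) do
        `cp --reflink=always /mnt/remote/production.sqlite3 /mnt/remote/backups/production-#{stamp}.sqlite3`
        `cp --reflink=always /mnt/remote/production.sqlite3-wal /mnt/remote/backups/production-#{stamp}.sqlite3-wal`
      end
      Dir.glob("/mnt/remote/backups/production-*").each do |f|
        File.delete(f) if Time.now - File.mtime(f) > 3600
      end
      sleep 1
    end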

    It is important to note, though, that if you copy this backup file to another remote or local drive, you will be copying the whole 4.4GB.

    Pros

    • cp is tried and true.
    • With CoW, the copy process is extremely fast, since no pages are ever checked for deduplication.
    • With CoW + remote block storage you get very fast remote backups.
    • Restore is very fast too.

    Cons

    • Needs the database and the backup to reside on the same remote drive for the best results.
    • Requires the use of a specific type of filesystem.
    • Backups compete with the original database file for drive space (but they consume very little).

    Summary

    There are multiple solid solutions for backing up your SQLite database in production. We have reviewed several of them; here’s a summary.

    Litestream
      Durability: Very high
      Space efficiency: High (only changes are shipped), but the base file needs to be re-sent once in a while
      Restore performance: Slowish; requires downloading the base file and WAL snapshots from S3
      Complexity: Easy to set up; requires setting up S3 and ensuring it stays up

    .backup
      Durability: High if the target is remote
      Space efficiency: Very high if the target supports deduplication
      Restore performance: Very fast if on the same storage (local or remote)
      Complexity: Very easy to use; needs remote storage for durability, and deduplication setup can be complex

    VACUUM INTO
      Durability: High if the target is remote
      Space efficiency: High if the target supports deduplication
      Restore performance: Very fast if on the same storage (local or remote)
      Complexity: Very easy to use; needs remote storage for durability, and deduplication setup can be complex

    .dump
      Durability: High if the target is remote
      Space efficiency: Consumes the most space
      Restore performance: Slowest; requires downloading the SQL dump and reconstructing the database
      Complexity: Very easy to use; needs remote storage for durability

    cp (--reflink=always)
      Durability: High if the target is remote
      Space efficiency: Very high if source and target are on the same filesystem with shallow copy support
      Restore performance: Very fast if on the same storage (local or remote)
      Complexity: Very easy to use; needs remote storage for durability and XFS or Btrfs for space efficiency

    Summary of SQLite backup options

    Conclusion

    These are some of the most common ways to manage backups for SQLite in production. Personally, if I have full control over my machine and access to remote block storage, I always opt for Btrfs with cp --reflink=always. But if such a facility is not an option and you have access to object storage instead, then Litestream is a very sensible solution.

    Go ahead and use SQLite in production; with the right backup strategy, you will ensure your data’s safety with a lot less complexity overall compared to other RDBMS solutions.

    Responses to “Backup strategies for SQLite in production”

    1. Eduards Sizovs

      Great article. Correction: Litestream can replicate to the filesystem. S3 is optional.
      https://litestream.io/reference/config/

    2. aarondfrancis

      Does begin deferred not run the risk of someone writing to the WAL file while you’re copying it? Wouldn’t you need to do begin immediate to ensure that the WAL file is pristine?

      1. oldmoe

        Not really, being deferred ensures the WAL file will not be truncated while you are reading from it. But you will miss any new data changes while you are reading. You will basically be reading a snapshot.

    3. Admin

      Reminder: If you’re using SQLite as a production database and you’re allowing multiple users to add data, remember to activate WAL mode to avoid “database is locked” errors.

      I’ve used SQLite as a production database on onemilliondollar.link and failed multiple times until I discovered that.

    4. Eduards Sizovs

      Also, if you use ZFS or Btrfs – you can use built-in snapshotting capabilities, and don’t need to start a deferred transaction.
