How to do proper backups

· 985 words · 5 minute read

This week, PGSQL Phriday #2 is focused on backups, thanks to Andreas Scherbaum’s blog post.

It is not a secret that I’m all in favor of using pgbackrest. I think that it is today the best backup tool for PostgreSQL. It does not mean others are bad, it means they are less good. If you feel the need to have a way to restore your data, I guess you’d rather use the best tool to achieve that goal.

What’s the problem with pg_basebackup? 🔗

pg_basebackup is a tool that should be renamed pg_clonecluster, because that’s exactly what it does. That’s why I’m in favor of using pg_basebackup to create (or reinstate) a standby.

A proper backup tool should care about how to restore your data. Having a backup is just a convenient way to perform a restore, but that’s just a step in the journey, not a destination.

As pg_basebackup does not care about restoring, it is missing important features like:

  • restoring (obviously)
  • archiving WALs
  • maintaining a backup catalog
  • managing backups and WALs retention policy
  • testing the backup files as not been corrupted (before Postgres 13 and this is not automatic)

I’m sure I could find others, but those should be enough not to consider using pg_basebackup as a backup tool.

Barman can help 🔗

Barman was created to try to add the features missing in pg_basebackup, like the missing catalog, the missing WAL archiving problem, and the missing restore feature… Barman also has a mode rsync where it does not rely on pg_basebackup anymore.

I still deeply think that when there’s a design flaw, like pg_basebackup has, re-creating something new from a blank page is the way to go to fix the problem.

The only way to find out your backup is corrupted is to use the verify command which will call pg_verifybackup(only available from Postgres 13) which will fail should you use the rsync method without launching the generate-manifest command.

For a long time, this was not even possible!

Also, I think at one point, we need to discuss Barman documentation. Back in 2017, I tried to use Barman to back up a Postgres cluster. I wanted to do it by myself without any help, just by reading the documentation. I found out that the documentation was not good, with contradictions in different paragraphs and, in the end, I was not able to back up (or restore) a standalone Postgres cluster.

As said, Barman is good, but not good enough.

I recommend pgBackRest 🔗

So, why is pgBackRest my recommended tool? As said, pgBackRest will try to save you time. Back in 2017, when I was experimenting with backup tools, I was able to follow the documentation and create a backup of my test standalone Postgres cluster and restore it very easily, following the tutorial.

Since then, I created a practical training for my students at Université Lyon I: without previous knowledge of Postgres or pgBackRest, they are able in 4 hours to backup, restore, PITR, and even dig in the WALs to find the timestamp or xID of a mistake to PITR just before the mistake. It’s that good that when I suggested they use pgBackRest to create a replication, I couldn’t find out if they had stupidly used the tool without understanding anything or if they had completely understood how setting a replication works (I added questions for the replication training, don’t worry)!

The feature I love with pgBackRest is the “delta” feature which will copy over only the files which changed since the backup. This feature is very useful for huge clusters. You can first issue a messy rsync command (or restore a storage snapshot) and then restore with the delta option.

Another great feature of pgBackRest is the ability to restore just one database on a dummy cluster so that you can then choose which data you want to copy over to another cluster.

Of course, those features are only the cherry on the top. pgBackRest was created with secure restore in mind, so that the first feature of the first version of pgBackRest was the creation of a manifest file and check-summing of all files.

The only case when I don’t recommend pgBackRest is when you’re running Postgres under Windows (which I don’t recommend either), because there is no Windows version of pgBackRest for Windows. I still consider having a great backup/restore tool for Windows is a must for the community and we’re trying to work on it.

Other tools 🔗

Other tools are barely used in the community. You might want to favor a tool that is used by the majority of people running Postgres so that you reduce the risk to find a bug, but this is your choice.

Bonus question: is pg_dump a backup tool 🔗

It depends. If you can allow losing all transactions between the beginning of your dump and the time you need to import your dump, then pg_dump can be considered as a backup tool. I’d rather use the term “export” than “backup” so that people don’t have a false sense of security.

On top of that, you need to consider the time to re-import all those data which can be very long as importing is slower than simply copying files to the file system. This operation needs to be performed frequently to make sure the downtime is affordable.

Most of the time, you won’t be fine with losing that much data or having such a downtime to re-import your dump, but if you fully understand the consequences and risks and are ok with them, please use pg_dump as a backup tool.

Use pgBackRest and relax 🔗

This is my TL; TD for this post: just use pgBackRest and you’ll be fine. Remember also that you should never trust a tool and always try to complete a restore to make sure your backup is usable. I guess you don’t want to discover what “Backup of Shrödinger means”!