This week, PGSQL Phriday #2 is focused on backups, thanks to Andreas Scherbaum’s blog post.
It is not a secret that I’m all in favor of using pgbackrest. I think that it is today the best backup tool for PostgreSQL. It does not mean others are bad, it means they are less good. If you feel the need to have a way to restore your data, I guess you’d rather use the best tool to achieve that goal.
What’s the problem with pg_basebackup
? 🔗
pg_basebackup
is a tool that should be renamed pg_clonecluster
,
because that’s exactly what it does. That’s why I’m in favor of using
pg_basebackup
to create (or reinstate) a standby.
A proper backup tool should care about how to restore your data. Having a backup is just a convenient way to perform a restore, but that’s just a step in the journey, not a destination.
As pg_basebackup
does not care about restoring, it is missing important
features like:
- restoring (obviously)
- archiving WALs
- maintaining a backup catalog
- managing backups and WALs retention policy
- testing the backup files as not been corrupted (before Postgres 13 and this is not automatic)
I’m sure I could find others, but those should be enough not to consider using
pg_basebackup
as a backup tool.
Barman
can help 🔗
Barman
was created to try to add the features missing in pg_basebackup
, like
the missing catalog, the missing WAL archiving problem, and the missing restore
feature… Barman
also has a mode rsync
where it does not rely on
pg_basebackup
anymore.
I still deeply think that when there’s a design flaw, like pg_basebackup
has, re-creating something new from a blank page is the way to go to fix the
problem.
The only way to find out your
backup is corrupted is to use the verify
command which will call
pg_verifybackup
(only available from Postgres 13) which will fail should you
use the rsync
method without launching the generate-manifest
command.
For a long time, this was not even possible!
Also, I think at one point, we need to discuss Barman
documentation. Back in
2017, I tried to use Barman
to back up a Postgres cluster. I wanted to do it by
myself without any help, just by reading the documentation. I found out that the
documentation was not good, with contradictions in different paragraphs and, in
the end, I was not able to back up (or restore) a standalone Postgres cluster.
As said, Barman
is good, but not good enough.
I recommend pgBackRest
🔗
So, why is pgBackRest
my recommended tool? As said, pgBackRest
will try to
save you time. Back in 2017, when I was experimenting with backup tools, I was able
to follow the documentation and create a backup of my test standalone Postgres
cluster and restore it very easily, following the tutorial.
Since then, I created a practical training for my students at Université Lyon I: without
previous knowledge of Postgres or pgBackRest
, they are able in 4 hours to
backup, restore, PITR, and even dig in the WALs to find the timestamp or xID of
a mistake to PITR just before the mistake. It’s that good that when I suggested
they use pgBackRest
to create a replication, I couldn’t find out if they had
stupidly used the tool without understanding anything or if they had completely
understood how setting a replication works (I added questions for the
replication training, don’t worry)!
The feature I love with pgBackRest
is the “delta” feature which will
copy over only the files which changed since the backup. This feature is very
useful for huge clusters. You can first issue a messy rsync command (or restore
a storage snapshot) and then restore with the delta option.
Another great feature of pgBackRest
is the ability to restore just one
database on a dummy cluster so that you can then choose which data you want to
copy over to another cluster.
Of course, those features are only the cherry on the top. pgBackRest
was
created with secure restore in mind, so that the first feature of the first
version of pgBackRest
was the creation of a manifest file and check-summing of all
files.
The only case when I don’t recommend pgBackRest
is when you’re running Postgres
under Windows (which I don’t recommend either), because there is no Windows
version of pgBackRest
for Windows. I still consider having a great
backup/restore tool for Windows is a must for the community and we’re trying to
work on it.
Other tools 🔗
Other tools are barely used in the community. You might want to favor a tool that is used by the majority of people running Postgres so that you reduce the risk to find a bug, but this is your choice.
Bonus question: is pg_dump
a backup tool 🔗
It depends. If you can allow losing all transactions between the beginning of your
dump and the time you need to import your dump, then pg_dump
can be considered
as a backup tool. I’d rather use the term “export” than “backup” so that people
don’t have a false sense of security.
On top of that, you need to consider the time to re-import all those data which can be very long as importing is slower than simply copying files to the file system. This operation needs to be performed frequently to make sure the downtime is affordable.
Most of the time, you won’t be fine with losing that much data or having such
a downtime to re-import your dump, but if you fully understand the consequences
and risks and are ok with them, please use pg_dump
as a backup tool.
Use pgBackRest
and relax 🔗
This is my TL; TD for this post: just use pgBackRest
and you’ll be fine.
Remember also that you should never trust a tool and always try to complete
a restore to make sure your backup is usable. I guess you don’t want to discover
what “Backup of Shrödinger means”!