The fsync issue and Postgres

Feb 2, 2020

TL;DR: If you run an up-to-date kernel or an up-to-date Postgres (the latest minor release of a supported major version), you’ll be safe.

Postgres relies on the OS to write to disk. It’s part of Postgres’s “we do database only, but we do it well” philosophy. From Postgres’s point of view, there are two kinds of writes: writing datafiles to disk and writing WAL to disk. Datafile writes always go through the kernel cache. WAL writes sometimes use direct I/O, but in other cases (for example, when max_wal_senders is set to any value other than 0) they also go through the kernel cache.
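To make the buffered path concrete, here is a minimal Python sketch of a datafile-style write, assuming Linux/POSIX semantics. `write_through_page_cache` is a hypothetical helper, not Postgres code (Postgres does this in C). The direct I/O path is left out because `O_DIRECT` requires aligned buffers and is platform-specific.

```python
import os
import tempfile


def write_through_page_cache(path: str, data: bytes) -> None:
    """Hypothetical helper: buffered write followed by fsync.

    os.write() normally returns once the bytes are copied into the kernel
    page cache; os.fsync() is what asks the kernel to push those dirty
    pages to the storage device and report whether that worked.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # lands in the page cache
            view = view[written:]         # handle short writes
        os.fsync(fd)  # flush dirty pages to disk; raises OSError on failure
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        target = os.path.join(directory, "datafile")
        write_through_page_cache(target, b"hello")
```

Everything before the `os.fsync()` call can succeed even if the device is failing; only the flush reveals the problem, which is where the trouble described next begins.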

The fsync problem lies in the kernel and shows up when the OS needs to flush its cache to disk. Postgres expected the OS to retry flushing data from the cache to disk if an error occurred during a previous fsync call. In reality, if fsync fails, the data are cleared from the cache and the next fsync call won’t try to flush them again (it couldn’t anyway, since those data have been discarded). The exact behavior depends on your filesystem (ext4, xfs, btrfs…), but the outcome is the same: fsync won’t retry writing the data to disk.
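The following toy model shows why retrying is useless once writeback has failed: the first fsync reports the error, the retry reports success, and the data still never reach the disk. `ToyPageCache` is a hypothetical class and a deliberate simplification, not real kernel code; it only mirrors the behavior described above.

```python
import errno


class ToyPageCache:
    """Toy model (not kernel code) of a page cache that drops data on failed writeback."""

    def __init__(self) -> None:
        self.dirty = {}             # page number -> data not yet on disk
        self.disk = {}              # page number -> data that reached storage
        self.device_broken = False  # simulate a failing storage device

    def write(self, page: int, data: bytes) -> None:
        # Buffered write: only the cache is updated.
        self.dirty[page] = data

    def fsync(self) -> None:
        # Pages are taken off the dirty list before writeback is attempted.
        pages, self.dirty = self.dirty, {}
        if pages and self.device_broken:
            # Writeback failed: the error is reported once and the data are gone.
            raise OSError(errno.EIO, "I/O error during writeback")
        self.disk.update(pages)
```

After a failed `fsync()`, a second call finds nothing dirty and returns normally, so a caller that "retries until success" ends up believing data are durable when they are not.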

Postgres had another expectation about fsync: since a file may be open through several file descriptors, possibly in different processes, Postgres assumed that if fsync failed for one of them, the failure would also be reported to the others. In reality, sometimes no one gets the error (Linux kernels older than 4.13), and sometimes only the first process gets it, while the others get no indication that a write failed. This behavior depends on your kernel version.
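Here is a toy model of the reporting problem, assuming a single writeback error on a file opened through two descriptors. The `SharedFile` class and its policy names (`one_shot`, `every_fd`) are invented for this sketch; they don’t correspond to kernel options, and the real behavior depends on the kernel version and filesystem.

```python
class SharedFile:
    """Toy model (not kernel code) of one file opened through several descriptors.

    Policies (hypothetical names):
      - "one_shot": the first fsync() that looks at the error clears it,
        so other descriptors never hear about it;
      - "every_fd": each descriptor sees the error once, which is what
        Postgres assumed.
    """

    def __init__(self, policy: str) -> None:
        if policy not in ("one_shot", "every_fd"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.error_count = 0     # number of writeback errors so far
        self.error_flag = False  # single shared flag used by "one_shot"
        self.seen = {}           # fd -> error_count when that fd last checked

    def open(self, fd: int) -> None:
        self.seen[fd] = self.error_count

    def writeback_failed(self) -> None:
        self.error_count += 1
        self.error_flag = True

    def fsync(self, fd: int) -> bool:
        """Return True if this descriptor is told about an error."""
        if self.policy == "one_shot":
            reported, self.error_flag = self.error_flag, False
            return reported
        reported = self.seen[fd] != self.error_count
        self.seen[fd] = self.error_count
        return reported
```

With `one_shot`, the first caller consumes the error and the second sees a clean fsync; with `every_fd`, both callers learn about it once.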

There are two ways to solve the issue:

1- Fix the kernel: this will take a long time, especially because developers don’t agree on what the correct behavior is, and the POSIX standard can be interpreted in several ways. Still, several fixes have already landed to prevent the worst case (no error reported at all).

2- Make Postgres PANIC when this happens, so it can recover without data loss or corruption: the patch has been written and integrated into the latest Postgres releases, but it requires Linux kernel 4.13 or newer.
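Since the PANIC approach depends on the kernel version, a quick check can help. This is a minimal sketch; `kernel_at_least` is a hypothetical helper that only compares the major and minor numbers of a release string, such as the one returned by Python’s `platform.release()`.

```python
import re
from typing import Tuple


def kernel_at_least(release: str, minimum: Tuple[int, int] = (4, 13)) -> bool:
    """Return True if a Linux release string such as "5.4.0-42-generic"
    is at least `minimum` (major, minor)."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # Tuples compare element by element: (5, 4) >= (4, 13) is True.
    return (int(match.group(1)), int(match.group(2))) >= minimum
```

On a Linux host, `kernel_at_least(platform.release())` checks the running kernel. Distribution kernels often backport fixes without changing the upstream version number, so treat the result as a rough indicator rather than proof.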

Make sure you run an up-to-date Postgres and you’ll be fine, because Postgres will stop as soon as such a failure is detected instead of risking data corruption. For high availability, since the problem lies at the OS level, you just need to be able to fail over to a standby to keep serving your production load.
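In PostgreSQL itself, this stop-instead-of-retry behavior is controlled by the `data_sync_retry` setting, whose default (`off`) means a failed data-file flush triggers a PANIC. The Python sketch below only illustrates the same policy; `PanicError` and `fsync_or_panic` are hypothetical names, and the `fsync` parameter exists so the failure path can be exercised without a broken disk.

```python
import os


class PanicError(RuntimeError):
    """Hypothetical stand-in for a Postgres PANIC: the process must stop."""


def fsync_or_panic(fd: int, fsync=os.fsync) -> None:
    """Flush `fd`; on any failure, stop instead of retrying.

    A retried fsync() may report success even though the dirty pages were
    already dropped, so the safe reaction is to stop and let crash recovery
    replay the WAL. `fsync` is injectable for testing.
    """
    try:
        fsync(fd)
    except OSError as exc:
        raise PanicError(f"could not fsync fd {fd}: {exc}") from exc
```

The key design choice is that there is no retry loop at all: the WAL, not the page cache, is treated as the source of truth after a failed flush.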

Moreover, this case only happens when the storage hardware fails. If you protect against hardware failures with a technology such as RAID, this issue can’t happen unless you experience more than one failure at a time.

Hacking PostgreSQL