A friend of mine, Lucas Draescher, submitted his first patch to PostgreSQL back
in March. It fixes a file descriptor leak when using io_method=io_uring. The
patch is clean, well-motivated, and on its third revision. It has been sitting
in the commitfest for two months with zero reviewers.
This is not unusual. This is the norm.
The PostgreSQL project runs on volunteer time. Committers are brilliant, but there are not many of them and the patch queue is long. If you submitted a patch and you are waiting, the most useful thing you can do is not wait. It is review someone else’s patch.
I am going to walk through Lucas’s patch step by step and share my findings. Not as a model review, but as a real one.
The unwritten rules π
There are two things nobody writes in the contributing guide, but everyone knows.
The first one: if you want your patch reviewed, review other patches. The community is not transactional about it, but the social dynamic is real. People who show up, read code, and give feedback are people the community remembers.
The second one: committing a patch is not a rubber stamp. When a committer merges your change into PostgreSQL, they own it. If it breaks something in three years, they are the ones who get pinged. That is why reviews matter so much. A thorough review reduces the risk a committer is taking. The better your review, the more useful you are to the person who has to make the final call.
What you need before you start π
This post assumes you can already build PostgreSQL from source. If you cannot, the official documentation covers it. Building from source is a prerequisite, not something we will cover here.
Step 1: find a patch and assign yourself π
Go to the current commitfest and look for patches marked “Needs review”. Pick one that interests you. Size and complexity are not disqualifiers. Read the patch. If you can understand what problem it is solving, you can review it.
Lucas’s entry is here: Fix a file descriptor leak when using io_method=io_uring. One file, 42 lines added, zero deleted. The commit message is clear, the mailing list thread has a reproduction case, and the author already responded to a design question from a previous reviewer. There is enough context to work with.
Once you have picked a patch, assign yourself as a reviewer in the commitfest application. This tells other people you are on it and avoids duplicate work.
Step 2: does the bug actually exist? π
Before you look at the patch, verify the bug exists on current master. I call this the Dr House syndrome. Users always lie. I picked that up working in support. It is never malicious, but patches sit in the commitfest for months. The bug may have been fixed independently, the reproduction steps may be incomplete, or the code may have changed since the original report.
You need a baseline. Build master without the patch, set up the conditions, and try to reproduce the bug yourself.
Lucas’s patch was tested on Ubuntu and Fedora. I am going to reproduce it on Debian, which was not in the original report.
Before running the reproduction steps, make sure io_uring is actually
active. Add this to postgresql.conf and restart:
io_method = io_uring
Then confirm it with show io_method; in psql. If it shows worker, you
are not testing the right code path.
Here is what Lucas says to do:
# Check open io_uring file descriptors
ls -la "/proc/$(head -1 $PGDATA/postmaster.pid)/fd/" | grep "io_uring" | wc -l
# Kill a backend to trigger a server restart
kill -9 $(psql -XtA -U postgres -c "SELECT pid FROM pg_stat_activity WHERE backend_type = 'client backend' LIMIT 1")
# Check again
ls -la "/proc/$(head -1 $PGDATA/postmaster.pid)/fd/" | grep "io_uring" | wc -l
Expected: the count stays the same. Actual (if the bug is present): it doubles.
I reproduced it on Debian, kernel 6.17.13, liburing 2.14, which was not in Lucas’s original test matrix. The count starts at 142 file descriptors. After killing a backend and waiting for the postmaster to restart, it jumps to 284. Exactly double. The bug is real on current master.
Step 3: apply the patch and build π
Get the latest PostgreSQL development tree and create a fresh branch.
git checkout master
git pull --rebase
git checkout -b review-lucas-io-uring
Download the latest .patch file from the commitfest entry. For Lucas’s patch,
that is v3-0001-Release-io_uring-resources-on-shmem-exit.patch. The CFBot
currently shows no rebase needed, which means the patch applies cleanly on top
of master. That is a good sign: the author is keeping it up to date. Apply it.
patch -p1 < v3-0001-Release-io_uring-resources-on-shmem-exit.patch
Then recompile.
make clean && make -j$(nproc)
The patch applied cleanly with no warnings. The build completed without errors.
Step 4: run the regression tests π
Before evaluating anything, run the existing test suite.
make check
If the patch touches more than one subsystem, run the full suite.
make check-world
All existing tests must pass. A patch that breaks existing behavior is not ready, no matter how good the new code looks. I have a post on regression tests if you want to understand what you are looking at.
All 245 tests passed with make check. make check-world also passed.
TAP tests require --enable-tap-tests at configure time and the
libipc-run-perl package on Debian. With those in place, all 39 TAP tests
passed as well.
Step 5: verify the bug is fixed π
Now reproduce the same steps from step 0, this time with the patch applied. The file descriptor count should stay stable across server restarts.
After applying the patch, run make install and restart the server before
testing. Then run the same sequence as in step 2.
The file descriptor count stays at 142 before and after killing a backend. The bug is fixed.
Step 6: evaluate the patch π
This is the actual work. Look at it from three angles.
Code review. Lucas’s fix registers an on_shmem_exit() callback in
pgaio_uring_shmem_init() to release the io_uring resources when the
postmaster exits. The approach is modelled on how method_worker.c handles the
same problem.
The code is clean and follows PostgreSQL conventions. The null check in
AioShmemInit() before registering the callback is correct: only io_uring
sets shmem_cleanup, so the worker method is unaffected. The wrapper
pgaio_shmem_cleanup() correctly satisfies the on_shmem_exit() signature,
and its comment explains clearly why no additional null check is needed inside
it.
In pgaio_uring_shmem_cleanup(), the guard against pgaio_uring_contexts == NULL before looping is good defensive coding. Setting it to NULL after
cleanup prevents a double-free if the callback were somehow called twice. The
DEBUG1 log line is a nice touch for operators.
on_shmem_exit() callbacks are called in LIFO order. Since this callback is
registered at the end of AioShmemInit(), it fires before anything registered
earlier, which is the safe direction: io_uring resources are released before
the shared memory itself gets detached. This does not require a new patch
version. It is worth the committer confirming that no earlier-registered
callback could invalidate pgaio_uring_contexts before this one runs, but
that is a question for review, not a blocker.
Design question. This is where it gets interesting. ChangAo Chen, who was
the first to review the patch, raised a question: should the cleanup be a
direct on_shmem_exit() call as in v1, or should it go through a new
shmem_cleanup callback added to the IoMethodOps struct as in v2? Lucas
answered honestly: he did not know which was better either. The v2 approach is
more structured, but the new callback currently has only one implementor, which
makes the added abstraction harder to justify.
History suggests v2 is the right call. The AIO subsystem is not the first time PostgreSQL introduced an ops struct with a single implementor. Table AM went through exactly the same situation. The pluggable storage patch was first proposed by Alvaro Herrera in 2016 and went through the commitfest for two and a half years before landing in PostgreSQL 12. Andres Freund, who is also the main driver behind the AIO subsystem, joined the Table AM effort in 2018 and rewrote significant parts of it. His stated reason was that the existing version was leaking implementation details through the interface: buffer pinning logic that lived outside the AM, heap-specific types exposed in the generic API. He wrote at the time that “generic code shouldn’t know about the locking scheme a particular AM needs” (his words, in the mailing list thread). The fix was to move that knowledge inside the interface, even though heap was still the only real implementor.
That is the same principle at stake in Lucas’s patch. The question is whether
AioShmemInit() should know that io_uring needs cleanup, or whether the
method declares it through the interface and the framework handles it blindly.
Based on Andres’s track record, he would almost certainly prefer v2. The
abstraction exists not because there are multiple implementors today, but
because the interface should not need to change when a second one appears
tomorrow.
That exchange is not a failure of the patch. It is the review process working as it should. If you have a view on this, say so in your review.
Performance. This fix only runs on server exit, so there is no performance concern for the normal execution path.
Documentation. This is a bug fix with no user-visible behavior change. No doc update needed.
The code is clean and follows PostgreSQL conventions. No issues found.
Step 7: check the tests π
Lucas’s patch does not include new regression tests. For a resource cleanup bug that only triggers on server restart after a backend is killed, writing a regression test is genuinely hard. TAP tests can handle this kind of scenario, but it is worth noting in your review whether you think a test is feasible or not.
Asking for a TAP test here is a stretch. Reliably counting file descriptors from inside a test harness after a SIGKILL is fragile. The reproduction steps in the original email are clear and serve as a manual test. That is enough for a bug fix of this scope.
Step 8: write your review π
Your findings are only useful if you post them. Send your feedback to the
relevant thread on pgsql-hackers. Be precise. If something failed, include
the exact commands you ran, your PostgreSQL version, your OS, and the full
error output. If something is wrong with the code, quote the relevant lines and
explain why.
Be constructive. The author put time into this. You are not grading them, you are helping them get it committed.
Then update the patch status in the commitfest application.
- Set it to Waiting on Author if you found problems that need fixing.
- Set it to Ready for Committer if the patch is solid, tested, and complete.
If you are not sure, leave a comment and leave the status as is. A partial review with honest uncertainty is more useful than silence.
Go review a patch π
Start with Lucas’s if you want a concrete target: Fix a file descriptor leak when using io_method=io_uring.
The queue is never empty.