A Reviewer Was Born

Β· 1845 words Β· 9 minute read

A friend of mine, Lucas Draescher, submitted his first patch to PostgreSQL back in March. It fixes a file descriptor leak when using io_method=io_uring. The patch is clean, well-motivated, and on its third revision. It has been sitting in the commitfest for two months with zero reviewers.

This is not unusual. This is the norm.

The PostgreSQL project runs on volunteer time. Committers are brilliant, but there are not many of them and the patch queue is long. If you submitted a patch and you are waiting, the most useful thing you can do is not wait. It is review someone else’s patch.

I am going to walk through Lucas’s patch step by step and share my findings. Not as a model review, but as a real one.

The unwritten rules πŸ”—

There are two things nobody writes in the contributing guide, but everyone knows.

The first one: if you want your patch reviewed, review other patches. The community is not transactional about it, but the social dynamic is real. People who show up, read code, and give feedback are people the community remembers.

The second one: committing a patch is not a rubber stamp. When a committer merges your change into PostgreSQL, they own it. If it breaks something in three years, they are the ones who get pinged. That is why reviews matter so much. A thorough review reduces the risk a committer is taking. The better your review, the more useful you are to the person who has to make the final call.

What you need before you start πŸ”—

This post assumes you can already build PostgreSQL from source. If you cannot, the official documentation covers it. Building from source is a prerequisite, not something we will cover here.

Step 1: find a patch and assign yourself πŸ”—

Go to the current commitfest and look for patches marked “Needs review”. Pick one that interests you. Size and complexity are not disqualifiers. Read the patch. If you can understand what problem it is solving, you can review it.

Lucas’s entry is here: Fix a file descriptor leak when using io_method=io_uring. One file, 42 lines added, zero deleted. The commit message is clear, the mailing list thread has a reproduction case, and the author already responded to a design question from a previous reviewer. There is enough context to work with.

Once you have picked a patch, assign yourself as a reviewer in the commitfest application. This tells other people you are on it and avoids duplicate work.

Step 2: does the bug actually exist? πŸ”—

Before you look at the patch, verify the bug exists on current master. I call this the Dr House syndrome. Users always lie. I picked that up working in support. It is never malicious, but patches sit in the commitfest for months. The bug may have been fixed independently, the reproduction steps may be incomplete, or the code may have changed since the original report.

You need a baseline. Build master without the patch, set up the conditions, and try to reproduce the bug yourself.

Lucas’s patch was tested on Ubuntu and Fedora. I am going to reproduce it on Debian, which was not in the original report.

Before running the reproduction steps, make sure io_uring is actually active. Add this to postgresql.conf and restart:

io_method = io_uring

Then confirm it with show io_method; in psql. If it shows worker, you are not testing the right code path.

Here is what Lucas says to do:

# Check open io_uring file descriptors
ls -la "/proc/$(head -1 $PGDATA/postmaster.pid)/fd/" | grep "io_uring" | wc -l

# Kill a backend to trigger a server restart
kill -9 $(psql -XtA -U postgres -c "SELECT pid FROM pg_stat_activity WHERE backend_type = 'client backend' LIMIT 1")

# Check again
ls -la "/proc/$(head -1 $PGDATA/postmaster.pid)/fd/" | grep "io_uring" | wc -l

Expected: the count stays the same. Actual (if the bug is present): it doubles.

I reproduced it on Debian, kernel 6.17.13, liburing 2.14, which was not in Lucas’s original test matrix. The count starts at 142 file descriptors. After killing a backend and waiting for the postmaster to restart, it jumps to 284. Exactly double. The bug is real on current master.

Step 3: apply the patch and build πŸ”—

Get the latest PostgreSQL development tree and create a fresh branch.

git checkout master
git pull --rebase
git checkout -b review-lucas-io-uring

Download the latest .patch file from the commitfest entry. For Lucas’s patch, that is v3-0001-Release-io_uring-resources-on-shmem-exit.patch. The CFBot currently shows no rebase needed, which means the patch applies cleanly on top of master. That is a good sign: the author is keeping it up to date. Apply it.

patch -p1 < v3-0001-Release-io_uring-resources-on-shmem-exit.patch

Then recompile.

make clean && make -j$(nproc)

The patch applied cleanly with no warnings. The build completed without errors.

Step 4: run the regression tests πŸ”—

Before evaluating anything, run the existing test suite.

make check

If the patch touches more than one subsystem, run the full suite.

make check-world

All existing tests must pass. A patch that breaks existing behavior is not ready, no matter how good the new code looks. I have a post on regression tests if you want to understand what you are looking at.

All 245 tests passed with make check. make check-world also passed. TAP tests require --enable-tap-tests at configure time and the libipc-run-perl package on Debian. With those in place, all 39 TAP tests passed as well.

Step 5: verify the bug is fixed πŸ”—

Now reproduce the same steps from step 0, this time with the patch applied. The file descriptor count should stay stable across server restarts.

After applying the patch, run make install and restart the server before testing. Then run the same sequence as in step 2.

The file descriptor count stays at 142 before and after killing a backend. The bug is fixed.

Step 6: evaluate the patch πŸ”—

This is the actual work. Look at it from three angles.

Code review. Lucas’s fix registers an on_shmem_exit() callback in pgaio_uring_shmem_init() to release the io_uring resources when the postmaster exits. The approach is modelled on how method_worker.c handles the same problem.

The code is clean and follows PostgreSQL conventions. The null check in AioShmemInit() before registering the callback is correct: only io_uring sets shmem_cleanup, so the worker method is unaffected. The wrapper pgaio_shmem_cleanup() correctly satisfies the on_shmem_exit() signature, and its comment explains clearly why no additional null check is needed inside it.

In pgaio_uring_shmem_cleanup(), the guard against pgaio_uring_contexts == NULL before looping is good defensive coding. Setting it to NULL after cleanup prevents a double-free if the callback were somehow called twice. The DEBUG1 log line is a nice touch for operators.

on_shmem_exit() callbacks are called in LIFO order. Since this callback is registered at the end of AioShmemInit(), it fires before anything registered earlier, which is the safe direction: io_uring resources are released before the shared memory itself gets detached. This does not require a new patch version. It is worth the committer confirming that no earlier-registered callback could invalidate pgaio_uring_contexts before this one runs, but that is a question for review, not a blocker.

Design question. This is where it gets interesting. ChangAo Chen, who was the first to review the patch, raised a question: should the cleanup be a direct on_shmem_exit() call as in v1, or should it go through a new shmem_cleanup callback added to the IoMethodOps struct as in v2? Lucas answered honestly: he did not know which was better either. The v2 approach is more structured, but the new callback currently has only one implementor, which makes the added abstraction harder to justify.

v1 direct call vs v2 callback in IoMethodOps

History suggests v2 is the right call. The AIO subsystem is not the first time PostgreSQL introduced an ops struct with a single implementor. Table AM went through exactly the same situation. The pluggable storage patch was first proposed by Alvaro Herrera in 2016 and went through the commitfest for two and a half years before landing in PostgreSQL 12. Andres Freund, who is also the main driver behind the AIO subsystem, joined the Table AM effort in 2018 and rewrote significant parts of it. His stated reason was that the existing version was leaking implementation details through the interface: buffer pinning logic that lived outside the AM, heap-specific types exposed in the generic API. He wrote at the time that “generic code shouldn’t know about the locking scheme a particular AM needs” (his words, in the mailing list thread). The fix was to move that knowledge inside the interface, even though heap was still the only real implementor.

That is the same principle at stake in Lucas’s patch. The question is whether AioShmemInit() should know that io_uring needs cleanup, or whether the method declares it through the interface and the framework handles it blindly. Based on Andres’s track record, he would almost certainly prefer v2. The abstraction exists not because there are multiple implementors today, but because the interface should not need to change when a second one appears tomorrow.

That exchange is not a failure of the patch. It is the review process working as it should. If you have a view on this, say so in your review.

Performance. This fix only runs on server exit, so there is no performance concern for the normal execution path.

Documentation. This is a bug fix with no user-visible behavior change. No doc update needed.

The code is clean and follows PostgreSQL conventions. No issues found.

Step 7: check the tests πŸ”—

Lucas’s patch does not include new regression tests. For a resource cleanup bug that only triggers on server restart after a backend is killed, writing a regression test is genuinely hard. TAP tests can handle this kind of scenario, but it is worth noting in your review whether you think a test is feasible or not.

Asking for a TAP test here is a stretch. Reliably counting file descriptors from inside a test harness after a SIGKILL is fragile. The reproduction steps in the original email are clear and serve as a manual test. That is enough for a bug fix of this scope.

Step 8: write your review πŸ”—

Your findings are only useful if you post them. Send your feedback to the relevant thread on pgsql-hackers. Be precise. If something failed, include the exact commands you ran, your PostgreSQL version, your OS, and the full error output. If something is wrong with the code, quote the relevant lines and explain why.

Be constructive. The author put time into this. You are not grading them, you are helping them get it committed.

Then update the patch status in the commitfest application.

  • Set it to Waiting on Author if you found problems that need fixing.
  • Set it to Ready for Committer if the patch is solid, tested, and complete.

If you are not sure, leave a comment and leave the status as is. A partial review with honest uncertainty is more useful than silence.

Go review a patch πŸ”—

Start with Lucas’s if you want a concrete target: Fix a file descriptor leak when using io_method=io_uring.

The queue is never empty.