For the first PGSQL Phriday of the year, my friend Hettie asked us to share a useful script we often use. I’d like to talk about a fight I’m having with some developers (most developers?) almost every time I’m called for performance reasons: stop creating more indexes and only keep the useful ones!
What are indexes? π
I don’t know if you’re as passionate about cooking as I am, but my most useful cooking books have a table of content at the beginning and 2 useful indexes at the end. I use the table of contents to go easily to a section if I know the kind of dish I want to cook and have time to wander on the pages, but if I want to lose less time, I often refer to one of the indexes. The first index is an index by recipe name. The recipes are ordered alphabetically so that it’s easy to find the right one. The second one is the one I love: an index by ingredient! For each ingredient, you will find the list of recipes using it and the page number to go find the recipe.
An index in a database is no more than that. It’s something that you use to speed up some queries when you have specific filters.
What’s the problem with too many indexes? π
First, let me introduce you to one of the worst pieces of advice found on
a blog some
years ago: a developer was sharing his best advice for using the Django ORM
(or any ORM). One of his tips was indexing all columns that you might want to
filter on! The Django ORM even has (had?) an option to create all the indexes
automatically (db_index=True
).
This is why, when I’m called for a performance problem (or for an audit), my first take is to look at the size of the data compared to the size of the indexes. If you store more indexes than data for a transactional workload, that’s bad. The worst I’ve seen was a database with 12 times more indexes stored on disk than data! Of course, it was a transactional workload… Would you buy a cooking book with 10 pages of recipes and 120 pages of indexes at the end of the book?
The problem with indexes is that each time you write (insert, update, delete), you will have to write to the indexes too! That can become very costly in resources and time.
What can you do? π
First, make sure you do not create duplicate indexes. For internal purposes, PostgreSQL will create an index to enforce each unique constraint (meaning primary keys too). Postgres won’t prevent you from creating 3 or 4 indexes on the same column. To check you don’t have duplicate indexes, you can use this script from PGEXperts.
I love the scripts from PGExperts. Each time I have the idea to write a useful script for Postgres, I look there, and I have to admit that 9 times out of 10, I don’t have to write a script anymore!
Postgres keeps track of indexes usage as well as column usage (writes, reads), so that this script will give you advice on your existing indexes: are they unused? are they a little used but those columns are under a heavy write load? Are they very large and not that useful?
The last useful script about indexes is these two: needed indexes and foreign key no index. The first one will look at column usage to advise some index creation (will only work with single index columns) while the second one will list all the foreign keys without indexes. Due to their frequent usage in joins, foreign keys are very good candidates for indexing. Postgres won’t index the foreign keys automatically.
Indexes are not bad! π
Indexes are not bad, but can really harm your workload if not used the proper way. Defining a great indexing policy is an everyday job as the application queries, the data and the data model will change. Also, remember Postgres creates btree indexes by default but there are a lot more than mono column btree indexes in Postgres. Read chapter 11 of the Postgres Documentation to find out more about indexes in Postgres!
If you’re interested in understanding why this stupid planner refuses to use your index, you might find it interesting to read this book from Markus Winand: Use the index, Luke! The cherry on the top: if you want to test the effect of an index on a specific query plan, you should take a look at the hypopg extension that will create “hypothetical indexes”.
But, as the developer removing code is, to me, more valuable than the one adding code, the DBA able to remove indexes is more valuable than the one adding them.