Simply auditing your database changes

ยท 1683 words ยท 8 minute read

[2021-02-28 Edit:Thanks to tapoueh, I found out you could do case insensitive text match using the ~* operator instead of lower(attribute) ~ <value>. I changed the SQL code accordingly.]

This is something a lot of people want: being able to track changes happening on their database when and who ordered them. Well, I have good news, my friends, there is a native feature in Postgres that can help you achieve that exact goal!

DDL and DML ๐Ÿ”—

First things first, what are DDL and DML? Both DDL and DML are a subset of the SQL language. DDL means Data Definition Language (or Data Description Language). It means the SQL orders you need to create or modify database objects. An example of a DDL query is a create query. DML means Data Manipulation language. It means the SQL orders you need to modify your data such as insert, delete, update but also to query your data like select

Logging everything ๐Ÿ”—

Postgres logging is wonderful. For performance reasons, logging by default is minimal but I assure you you want to log more than the minimum information, to be able to understand what’s going on, to prevent things to go bad, to make performance better and so on.

I need to write a blog post about logging (it’s on my todo for a long time), but here are the minimal settings that will allow you to play with capturing DDL and DML on your tables:

  • log_destination = 'csvlog' (we’ll see later why)
  • logging_collector = on (or else, you won’t be able to get your logs, see What’s wrong with Postgres
  • log_statement = 'mod' (or ddl if you simply need to log only DDL or all if you’d like to all see everything)

Because we changed log_destination which can only be set at startup, Postgres will need to be restarted to take the new values into account.

Adding some modifications ๐Ÿ”—

I’ll simply use pgbench to generate some transactions. I’ll need to initialize it before.

pgbench -i bench
pgbench -c 10 -T 60 bench

Now, if I look into my logfile, I should see some ddl and/or dml played:

2021-02-25 17:24:37.539 UTC,"postgres","bench",22934,"[local]",6037dd19.5996,40879,"UPDATE",2021-02-25 17:23:37 UTC,11/5841,925701,LOG,00000,"duration: 10.922 ms  statement: UPDATE pgbench_tellers SET tbalance = tbalance + 885 WHERE tid = 8;",,,,,,,,,"pgbench"
2021-02-25 17:24:37.539 UTC,"postgres","bench",22933,"[local]",6037dd19.5995,41265,"UPDATE",2021-02-25 17:23:37 UTC,10/5896,925702,LOG,00000,"duration: 11.971 ms  statement: UPDATE pgbench_branches SET bbalance = bbalance + 1213 WHERE bid = 1;",,,,,,,,,"pgbench"
2021-02-25 17:24:37.539 UTC,"postgres","bench",22933,"[local]",6037dd19.5995,41266,"INSERT",2021-02-25 17:23:37 UTC,10/5896,925702,LOG,00000,"duration: 0.054 ms  statement: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (5, 1, 1328, 1213, CURRENT_TIMES
TAMP);",,,,,,,,,"pgbench"

Using an FDW ๐Ÿ”—

I will now use a Foreign Data Wrapper (FDW) to read this data with Postgres. I will then create nice views to be able to display DDL and DML simply as in a table.

create extension file_fdw;

create server pglog foreign data wrapper file_fdw;

create foreign table pglog (
    log_time timestamp(3) with time zone,
    user_name text,
    database_name text,
    process_id integer,
    connection_from text,
    session_id text,
    session_line_num bigint,
    command_tag text,
    session_start_time timestamp with time zone,
    virtual_transaction_id text,
    transaction_id bigint,
    error_severity text,
    sql_state_code text,
    message text,
    detail text,
    hint text,
    internal_query text,
    internal_query_pos integer,
    context text,
    query text,
    query_pos integer,
    location text,
    application_name text
) server pglog
options (filename '/usr/local/pgsql/data/log/postgresql-Thu.csv', format 'csv' );

The DDL view ๐Ÿ”—

I will create a view to query DDL changes. I won’t have an object_name column because it would mean creating an SQL syntax parser in SQL which is doable, but also not so trivial.

The only DDL orders are create, drop, alter, and truncate. The statements log messages all begin with ‘statement: ‘.

create view ddl as (
  select log_time,
    database_name,
    user_name,
    (regexp_split_to_array(message, ' '))[2] as command,
    /* Taking care of truncate which only works for a table, therefore does not state it's for a table */
    case when (regexp_split_to_array(message, ' '))[2] = 'truncate' then 'table'
      else
        /* the object must be one of those listed in the case when statement */
        case
          when message ~* ' access method ' then 'access method'
          when message ~* ' aggregate ' then 'aggregate'
          when message ~* ' cast ' then 'cast'
          when message ~* ' collation ' then 'collation'
          when message ~* ' conversion ' then 'conversion'
          when message ~* ' database ' then 'database'
          when message ~* ' domain ' then 'domain'
          when message ~* ' event trigger ' then 'event trigger'
          when message ~* ' extension ' then 'extension'
          when message ~* ' foreign data wrapper ' then 'foreign data wrapper'
          when message ~* ' foreign table ' then 'foreign table'
          when message ~* ' function ' then 'function'
          when message ~* ' group ' then 'group'
          when message ~* ' index ' then 'index'
          when message ~* ' language ' then 'language'
          when message ~* ' materialized view ' then 'materialized view'
          when message ~* ' operator class ' then 'operator class'
          when message ~* ' operator family ' then 'operator family'
          when message ~* ' operator ' then 'operator'
          when message ~* ' policy ' then 'policy'
          when message ~* ' procedure ' then 'procedure'
          when message ~* ' publication ' then 'publication'
          when message ~* ' role ' then 'role'
          when message ~* ' rule ' then 'rule'
          when message ~* ' schema ' then 'schema'
          when message ~* ' sequence ' then 'sequence'
          when message ~* ' server ' then 'server'
          when message ~* ' statistics ' then 'statistics'
          when message ~* ' subscription ' then 'subscription'
          when message ~* ' table ' then 'table'
          when message ~* ' tablespace ' then 'tablespace'
          when message ~* ' test search configuration ' then 'text search configuration'
          when message ~* ' test search dictionary ' then 'text search dictionary'
          when message ~* ' test search parser ' then 'text search parser'
          when message ~* ' test search template ' then 'text search template'
          when message ~* ' transform ' then 'transform'
          when message ~* ' trigger ' then 'trigger'
          when message ~* ' type ' then 'type'
          when message ~* ' user mapping ' then 'user mapping'
        end
    end as object_type,
    regexp_replace(message, 'statement: ','') as query
  from pglog
  where error_severity = 'LOG'
    and message like 'statement: %'
    and lower((regexp_split_to_array(message, ' '))[2]) in ('create','drop','truncate','alter')
);

Now, if I want to know what DDL was played on my table pgbench_tellers, I can simply play that query:

select *
from ddl
where object_type = 'table'
  and database_name = 'bench'
  and query ~ 'pgbench_tellers';

And the result will display nicely:

         log_time          | database_name | user_name | command  | object_type |                                                   query                                                   
----------------------------+---------------+-----------+----------+-------------+-----------------------------------------------------------------------------------------------------------
 2021-02-25 17:52:16.998+00 | bench         | postgres  | drop     | table       | drop table if exists pgbench_accounts, pgbench_branches, pgbench_history, pgbench_tellers
 2021-02-25 17:52:17.001+00 | bench         | postgres  | create   | table       | create table pgbench_tellers(tid int not null,bid int,tbalance int,filler char(84)) with (fillfactor=100)
 2021-02-25 17:52:17.003+00 | bench         | postgres  | truncate | table       | truncate table pgbench_accounts, pgbench_branches, pgbench_history, pgbench_tellers
 2021-02-25 17:52:17.138+00 | bench         | postgres  | alter    | table       | alter table pgbench_tellers add primary key (tid)

The DML view ๐Ÿ”—

We will do the exact same trick for the dml view (I excluded the select queries, but it shouldn’t be a lot of extra work to have womething for selects too):

create view dml as (
  select log_time,
    database_name,
    user_name,
    case
      when message ~* ' insert ' then 'insert'
      when message ~* ' update ' then 'update'
      when message ~* ' delete ' then 'delete'
    end as command,
    case
      /* [with ...] insert into <table_name> */
      when message ~* ' insert into '
       then (regexp_split_to_array((regexp_split_to_array(lower(message), ' insert into '))[2], '[ |\(]'))[1]
      /* [with ...] update [only] <table_name> */
      when message ~* ' update only '
       then (regexp_split_to_array((regexp_split_to_array(lower(message), ' update only '))[2], '[ |\(]'))[1]
      when message ~* ' update '
       then (regexp_split_to_array((regexp_split_to_array(lower(message), ' update '))[2], '[ |\(]'))[1]
      /* [with ...] delete from [only] <table_name> */
      when message ~* ' delete from only '
       then (regexp_split_to_array((regexp_split_to_array(lower(message), ' delete from only '))[2], '[ |\(]'))[1]
      when message ~* ' delete from '
       then (regexp_split_to_array((regexp_split_to_array(lower(message), ' delete from '))[2], '[ |\(]'))[1]
    end as table_name,
    regexp_replace(message, 'statement: ','') as query
  from pglog
  where error_severity = 'LOG'
    and message like 'statement: %'
    and lower((regexp_split_to_array(message, ' '))[2]) in ('insert', 'update', 'delete')
);

And now, I can ask for the last 10 changes on my table pgbench_tellers:

select *
from dml
where database_name = 'bench'
  and table_name = 'pgbench_tellers'
order by log_time desc
limit 10;

And it will display the result nicely:

          log_time          | database_name | user_name | command |   table_name    |                                 query                                  
----------------------------+---------------+-----------+---------+-----------------+------------------------------------------------------------------------
 2021-02-25 17:53:19.718+00 | bench         | postgres  | update  | pgbench_tellers | UPDATE pgbench_tellers SET tbalance = tbalance + -4449 WHERE tid = 8;
 2021-02-25 17:53:19.717+00 | bench         | postgres  | update  | pgbench_tellers | UPDATE pgbench_tellers SET tbalance = tbalance + 2166 WHERE tid = 6;
 2021-02-25 17:53:19.717+00 | bench         | postgres  | update  | pgbench_tellers | UPDATE pgbench_tellers SET tbalance = tbalance + -3332 WHERE tid = 7;
 2021-02-25 17:53:19.717+00 | bench         | postgres  | update  | pgbench_tellers | UPDATE pgbench_tellers SET tbalance = tbalance + -1430 WHERE tid = 4;
 2021-02-25 17:53:19.717+00 | bench         | postgres  | update  | pgbench_tellers | UPDATE pgbench_tellers SET tbalance = tbalance + 4278 WHERE tid = 2;
 2021-02-25 17:53:19.716+00 | bench         | postgres  | update  | pgbench_tellers | UPDATE pgbench_tellers SET tbalance = tbalance + -2216 WHERE tid = 4;
 2021-02-25 17:53:19.715+00 | bench         | postgres  | update  | pgbench_tellers | UPDATE pgbench_tellers SET tbalance = tbalance + -4107 WHERE tid = 5;
 2021-02-25 17:53:19.715+00 | bench         | postgres  | update  | pgbench_tellers | UPDATE pgbench_tellers SET tbalance = tbalance + -3258 WHERE tid = 10;
 2021-02-25 17:53:19.714+00 | bench         | postgres  | update  | pgbench_tellers | UPDATE pgbench_tellers SET tbalance = tbalance + -3633 WHERE tid = 9;
 2021-02-25 17:53:19.714+00 | bench         | postgres  | update  | pgbench_tellers | UPDATE pgbench_tellers SET tbalance = tbalance + 703 WHERE tid = 9;
(10 rows)

And voilร ! We did not use ugly triggers, we did not add any exotic extension, and we have a native way to track down changes on our database objects!

And as a final trick, just remember the log_statement parameter can be set on a database-level like that:

alter database bench set log_statement = 'none' ;