Replica identity for logical replication

Β· 1179 words Β· 6 minute read

I was browsing through the logical replication restrictions when I noticed that a primary key was not mandatory to be able to perform logical replication. (see Postgres documentation)

I was even surprised that it’s never been mandatory. I guess my brain was confused with other logical replication systems like slony.

So, the next logical question here is: How does it work? How does it pick a row to update in case there is no primary key? The answer is “Replica identity”. It’s described here in Postgres documentation.

Replica identity πŸ”—

As the documentation states in the case, there is a primary key, it’s taken by default as the replica identity. If there is no primary key, you need to tell Postgres what to use as a replica identity, using the alter table mytable replica identity ... syntax. (see the Postgres alter table documentation.)

In case you don’t have a primary key, the using index clause can be used. Just remember your index needs to be unique, not partial, not deferrable, and should include only not null columns.

If you don’t have an index that satisfies all these conditions, you may use the full keyword. It will use all the columns in the table as the replica key. For performance reasons, you should avoid using it. If you care for performance, your data should be modelized (third normal form at least), so you’ll never encounter such an issue.

Let’s try it!

Pre-requisites πŸ”—

You must have two Postgres instances ready and be able to connect from the secondary host to the primary without password (using either a certificate, a password file or any other passwordless secured authentication method) with the user of your choice.

You must also ensure you’ll be able to perform native logical replication from your primary instance (version higher or equal to 10, and some parameters like wal_level, max_wal_senders, max_replication_slots). Please read the documentation for more informations.

Forging an example πŸ”—

I created 3 different tables for 3 uses cases:

  • A table called badmodel without any uniqueness or primary key
  • A table called withprimarykey with a primary key
  • A table called withuniquekey with a unique key but without a primary key

Here’s the code:

create table withprimarykey
(
  c integer primary key
);

create table withuniqueindex
(
  c integer unique not null
);

create table badmodel
(
  c integer
);

I granted all permissions on my tables to my replication user.

I then created a publication on those 3 tables:

create publication mypub for all tables ;

On another host, I created the 3 tables and the logical subscription.

create subscription mysub
  connection 'host=<my primary IP> dbname=amidala user=<my user>'
  publication mypub;

For the rest of the article, you need to know that the queen database is called amidala, so that should an SQL command begin with amidala=# , it was issued against the queen database. On the other side, the worker database is called padme, so that should an SQL command begin with padme=#, t was issued on the worker database.

If you need help with the queen/princess/worker denomination, please read Dimitri’s blog post.

Inserting some data πŸ”—

Table with primary key πŸ”—

I first tried the table with a primary key:

amidala=# insert into test default values;
INSERT 0 1
amidala=# insert into test default values;
INSERT 0 1
amidala=# insert into test default values;
INSERT 0 1
amidala=# select * from test;
 od 
----
  1
  2
  3
(3 rows)

If I connect to my worker, then I should see the data there, and sure enough:

padme=# select * from withprimarykey;
 c 
---
 1
 2
 3
(3 rows)

If I try to update one, value…

amidala=# update withprimarykey set c=4 where c =3;
UPDATE 1
padme=# select * from withprimarykey;  
 c 
---
 1
 2
 4
(3 rows)

… Everything works fine!

Table with unique index (and not null column) πŸ”—

Let’s do the same thing with the table that has a unique index:

amidala=# insert into withuniqueindex default values;
INSERT 0 1
amidala=# insert into withuniqueindex default values;
INSERT 0 1
amidala=# insert into withuniqueindex default values;
INSERT 0 1
amidala=# select * from withuniqueindex;
 c 
---
 1
 2
 3
(3 rows)

And on the worker, we find:

padme=# select * from withuniqueindex;
 c 
---
 1
 2
 3
(3 rows)

Everything’s great, so far. But what if I try to update one data?

amidala=# update withuniqueindex set c=4 where c =3;
ERROR:  cannot update table "withuniqueindex" because it does not have a replica
identity and publishes updates
HINT:  To enable updating the table, set REPLICA IDENTITY using ALTER TABLE.

As often, Postgres is nice enough to help us solve the issue. I will need the index name to be able to use it as a replica identity.

amidala=# \d+ withuniqueindex
                              Table "public.withuniqueindex"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description 
--------+---------+-----------+----------+---------+---------+--------------+-------------
 c      | integer |           |          |         | plain   |              | 
Indexes:
    "withuniqueindex_c_key" UNIQUE CONSTRAINT, btree (c)
Publications:
    "mypub"
Access method: heap
amidala=# alter table withuniqueindex replica identity using index withuniqueindex_c_key;
ALTER TABLE
amidala=# update withuniqueindex set c=4 where c =3;
UPDATE 1
padme=# select * from withuniqueindex;
 c 
---
 1
 2
 3
(3 rows)

But my change is still not replicated in the worker. If I look at the Postgres log, I’ll find the following:

2020-12-14 14:49:40.592 UTC [4832] ERROR:  logical replication target relation "public.padme" has neither REPLICA IDENTITY index nor PRIMARY KEY and published relation does not have REPLICA IDENTITY FULL

If I duplicate the DDL into my worker, my change will then apply nicely:

padme=# alter table withuniqueindex replica identity using index withuniqueindex_c_key;
ALTER TABLE
padme=# select * from withuniqueindex;
 c 
---
 1
 2
 4
(3 rows)

What happens with the table without any unique index or primary key? πŸ”—

Again, I can add data into my table without any problem:

amidala=# insert into badmodel values(1);
INSERT 0 1
amidala=# insert into badmodel values(2);
INSERT 0 1
amidala=# insert into badmodel values(3);
INSERT 0 1
padme=# select * from badmodel ;
 c 
---
 1
 2
 3
(3 rows)

Problems happen when I try to update (or delete) an existing row:

amidala=# update badmodel set c=4 where c =3;
ERROR:  cannot update table "badmodel" because it does not have a replica identity and publishes updates
HINT:  To enable updating the table, set REPLICA IDENTITY using ALTER TABLE.

The solution is the same as with a unique index, except that instead of specifying an index name, I’ll use full meaning, using all the rows:

amidala=# alter table badmodel replica identity full;
ALTER TABLE
amidala=# update badmodel set c=4 where c =3;
UPDATE 1
padme=# select * from badmodel ;
 c 
---
 1
 2
 4
(3 rows)

What if I try to change only one of 2 identical rows?

amidala=# select xmin, xmax, * from badmodel ;
 xmin | xmax | c 
------+------+---
  521 |    0 | 1
  556 |    0 | 2
  558 |    0 | 2
(3 rows)

amidala=# update badmodel set c=3 where c=2 and xmin=558;
UPDATE 1
padme=# select * from badmodel;
 c 
---
 1
 2
 3
(3 rows)

It works!