Preferred types

Sep 24, 2022 · 653 words · 4 minute read

I was browsing the Postgres slack channel when I ran into this intriguing question from a member:

How come my query SELECT TO_CHAR(POW(2,54), ‘9999999999999999999’) AS V0, TO_CHAR(POW(2,54)-1, ‘9999999999999999999’) AS V1; gives that result?
           v0                 |          v1
------------------------------+----------------------
   18014398509481984          |    18014398509481984 

Of course, I did some IT support, so my first reaction was to test by myself:

laetitia=# select
laetitia-#   power(2,54) as v1,
laetitia-#   power(2,54)-1 as v2;
           v1           |           v2           
------------------------+------------------------
 1.8014398509481984e+16 | 1.8014398509481984e+16
(1 row)

The power function 🔗

So, I looked at the power function definition in the documentation:

power ( a numeric, b numeric ) → numeric
power ( a double precision, b double precision ) → double precision
  a raised to the power of b
  power(9, 3) → 729

So, I found out that if you gave a numeric value, the power function will give a numeric answer, but if you fed it with a double precision value, then the result would be of datatype double precision.

Let’s check our datatype:

laetitia=# \gdesc
 Column |       Type       
--------+------------------
 v1     | double precision
 v2     | double precision
(2 rows)

Bingo!

The double precision datatype 🔗

The double precision datatype is not precise (yes, the name can be misleading). So, what we are experiencing here is just a rounding error due to the datatype double precision being used.

But why? 🔗

Ok, so Postgres understands 2 and 54 as double precision. But then why does Postgres understands them as integers if I do:

laetitia=# select 2 as v1, 54 as v2 \gdesc
 Column |  Type   
--------+---------
 v1     | integer
 v2     | integer
(2 rows)

The explanation is on this precious documentation page.

So basically, the parser will categorize each “word” into one of those 5 categories: integers, non-integer numbers, strings, identifiers, and key words.

So, when we do select 2,54, the parser understands the static values as integers and stops there.

But when we used the function power, we forced the parser to reconsider the data type of 2 and 54. (This happens because a function can be overloaded with a different signature, so knowing the right datatype is important to use the right function).

So, the parser will then consider 7 categories of datatype categories: boolean, numeric, string, bitstring, datetime, timespan, geometric, network, and user-defined.

In our case, the parser will consider 2 and 54 as numeric datatypes.

That’s when the notion of “preferred datatype” enters the explanation.

Preferred datatypes 🔗

The documentation explains that for each category of datatypes, there is one or several preferred datatypes. You can see the preferred datatypes in the pg_type table.

laetitia=# select
  case
    when typcategory='A' then 'Array types'
    when typcategory='B' then 'Boolean types'
    when typcategory='C' then 'Composite types'
    when typcategory='D' then 'Date/time types'
    when typcategory='E' then 'Enum types'
    when typcategory='G' then 'Geometric types'
    when typcategory='I' then 'Network address types'
    when typcategory='N' then 'Numeric types'
    when typcategory='P' then 'Pseudo-types'
    when typcategory='R' then 'Range types'
    when typcategory='S' then 'String types'
    when typcategory='T' then 'Timespan types'
    when typcategory='U' then 'User-defined types'
    when typcategory='V' then 'Bit-string types'
    when typcategory='X' then 'unknown types'
  end as category,
  typname
from pg_type
where typispreferred
order by category;
       category        |   typname   
-----------------------+-------------
 Bit-string types      | varbit
 Boolean types         | bool
 Date/time types       | timestamptz
 Network address types | inet
 Numeric types         | oid
 Numeric types         | float8
 String types          | text
 Timespan types        | interval
(8 rows)

And that’s when I found out there were 2 preferred types for a numeric value: oid and float8 (which means double precision, there are several aliases like that in Postgres).

But as the power function is not defined for oids (it is for the datatypes smallint, integer, bigint, numeric, real, and double precision) it has to cast 2 and 54 as double precision, which then gives a resualt as a double precision which then creates the rounding error.

That’s exactly why I browse regularly the slack channel! From time to time, you can deep dive into things and learn something new!

Internals PostgreSQL