I was browsing the Postgres slack channel when I ran into this intriguing question from a member:
How come my query
SELECT TO_CHAR(POW(2,54), ‘9999999999999999999’) AS V0, TO_CHAR(POW(2,54)-1, ‘9999999999999999999’) AS V1;gives that result?
v0 | v1 ------------------------------+---------------------- 18014398509481984 | 18014398509481984
Of course, I did some IT support, so my first reaction was to test by myself:
laetitia=# select laetitia-# power(2,54) as v1, laetitia-# power(2,54)-1 as v2; v1 | v2 ------------------------+------------------------ 1.8014398509481984e+16 | 1.8014398509481984e+16 (1 row)
The power function
So, I looked at the power function definition in the documentation:
power ( a numeric, b numeric ) → numeric power ( a double precision, b double precision ) → double precision a raised to the power of b power(9, 3) → 729
So, I found out that if you gave a numeric value, the power function will give a numeric answer, but if you fed it with a double precision value, then the result would be of datatype double precision.
Let’s check our datatype:
laetitia=# \gdesc Column | Type --------+------------------ v1 | double precision v2 | double precision (2 rows)
The double precision datatype
The double precision datatype is not precise (yes, the name can be misleading). So, what we are experiencing here is just a rounding error due to the datatype double precision being used.
Ok, so Postgres understands 2 and 54 as double precision. But then why does Postgres understands them as integers if I do:
laetitia=# select 2 as v1, 54 as v2 \gdesc Column | Type --------+--------- v1 | integer v2 | integer (2 rows)
The explanation is on this precious documentation page.
So basically, the parser will categorize each “word” into one of those 5 categories: integers, non-integer numbers, strings, identifiers, and key words.
So, when we do
select 2,54, the parser understands the static values as integers
and stops there.
But when we used the function power, we forced the parser to reconsider the data type of 2 and 54. (This happens because a function can be overloaded with a different signature, so knowing the right datatype is important to use the right function).
So, the parser will then consider 7 categories of datatype categories: boolean, numeric, string, bitstring, datetime, timespan, geometric, network, and user-defined.
In our case, the parser will consider 2 and 54 as numeric datatypes.
That’s when the notion of “preferred datatype” enters the explanation.
The documentation explains that for each category of datatypes, there is one or several preferred datatypes. You can see the preferred datatypes in the pg_type table.
laetitia=# select case when typcategory='A' then 'Array types' when typcategory='B' then 'Boolean types' when typcategory='C' then 'Composite types' when typcategory='D' then 'Date/time types' when typcategory='E' then 'Enum types' when typcategory='G' then 'Geometric types' when typcategory='I' then 'Network address types' when typcategory='N' then 'Numeric types' when typcategory='P' then 'Pseudo-types' when typcategory='R' then 'Range types' when typcategory='S' then 'String types' when typcategory='T' then 'Timespan types' when typcategory='U' then 'User-defined types' when typcategory='V' then 'Bit-string types' when typcategory='X' then 'unknown types' end as category, typname from pg_type where typispreferred order by category; category | typname -----------------------+------------- Bit-string types | varbit Boolean types | bool Date/time types | timestamptz Network address types | inet Numeric types | oid Numeric types | float8 String types | text Timespan types | interval (8 rows)
And that’s when I found out there were 2 preferred types for a numeric value: oid and float8 (which means double precision, there are several aliases like that in Postgres).
But as the power function is not defined for oids (it is for the datatypes smallint, integer, bigint, numeric, real, and double precision) it has to cast 2 and 54 as double precision, which then gives a resualt as a double precision which then creates the rounding error.
That’s exactly why I browse regularly the slack channel! From time to time, you can deep dive into things and learn something new!