I want to estimate the expected size of a table from its column types and lengths, and I'm trying to use pg_column_size for this.
While testing the function, I noticed something that seems wrong with it.
The result value from pg_column_size(...) is sometimes even smaller than the return value from octet_length(...) on the same string.
There is nothing but numeric characters in the column.
postgres=# \d+ t5
                              Table "public.t5"
 Column |       Type        | Modifiers | Storage  | Stats target | Description
--------+-------------------+-----------+----------+--------------+-------------
 c1     | character varying |           | extended |              |
Has OIDs: no

postgres=# select pg_column_size(c1), octet_length(c1) as octet from t5;
 pg_column_size | octet
----------------+-------
              2 |     1
            704 |   700
            101 |  7000
            903 | 77000
(4 rows)
Is this a bug or something? Does anyone have a formula for calculating the expected table size from the column types and the lengths of their values?
I'd say pg_column_size is reporting the compressed size of TOASTed values, while octet_length is reporting the uncompressed sizes. I haven't verified this by checking the function source or definitions, but it'd make sense, especially as strings of numbers will compress quite well. You're using EXTENDED storage so the values are eligible for TOAST compression. See the TOAST documentation.
As for calculating expected DB size, that's a whole new question. As you can see from the following demo, it depends on things like how compressible your strings are.
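A quick way to check this against the table from the question is to compare the stored size with the size of a freshly computed copy of the same value. This is a sketch rather than a verified result: it assumes that concatenating an empty string (c1 || '') yields a new in-memory value that has not been through TOAST compression, and the column aliases are only illustrative.

```sql
-- stored_size: size of the value as it is stored in the table (possibly compressed)
-- fresh_size:  size of a newly built in-memory copy of the same text (assumed uncompressed)
SELECT pg_column_size(c1)       AS stored_size,
       pg_column_size(c1 || '') AS fresh_size,
       octet_length(c1)         AS octets
FROM t5;

-- Overall on-disk footprint of the table, including its TOAST table and any indexes.
SELECT pg_size_pretty(pg_total_relation_size('t5'));
```

If the explanation holds, stored_size should drop below octets only for rows long enough to be compressed, while fresh_size should stay slightly above octets because of the value's length header.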
Here's a demonstration showing how octet_length can be bigger than pg_column_size, demonstrating where TOAST kicks in. First, let's get the results on query output where no TOAST comes into play:
Now let's store that same query output into a table and get the size of the stored rows:
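The queries for this demo are not shown here, so what follows is a minimal sketch of what such a demonstration could look like. The repeated string '1234567890', the range passed to generate_series, and the table name toast_demo are all assumptions chosen for illustration.

```sql
-- Step 1: sizes of values computed on the fly. Nothing is stored,
-- so TOAST compression is not involved.
SELECT octet_length(repeat('1234567890', (2^n)::integer))   AS octets,
       pg_column_size(repeat('1234567890', (2^n)::integer)) AS column_size
FROM generate_series(0, 12) AS n;

-- Step 2: store the same values in a table. A text column uses
-- EXTENDED storage by default, so long values may be compressed.
CREATE TABLE toast_demo AS
SELECT repeat('1234567890', (2^n)::integer) AS data
FROM generate_series(0, 12) AS n;

-- Step 3: measure the stored values.
SELECT octet_length(data)   AS octets,
       pg_column_size(data) AS column_size
FROM toast_demo;

-- Clean up the hypothetical demo table.
DROP TABLE toast_demo;
```

In the first query column_size should track octets plus a small header at every length. In the last query the longer rows should show column_size well below octets, since a highly repetitive string like this compresses well. The exact numbers depend on the PostgreSQL version and compression settings.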