I’ve started working on a project where there is a fairly large table (about 82,000,000 rows) that I think is very bloated. One of the fields is defined as:
consistency character varying NOT NULL DEFAULT 'Y'::character varying
It’s used as a boolean; the values should always be either ‘Y’ or ‘N’.
Note: there is no check constraint, etc.
I’m trying to come up with reasons to justify changing this field. Here is what I have:
- It’s being used as a boolean, so make it that. Explicit is better than implicit.
- It will protect against coding errors, because right now anything that can be converted to text will go blindly in there.
Here are my question(s).
- What about size/storage? The db is UTF-8, so I think there really isn’t much of a savings in that regard. It should be 1 byte for a boolean, but also 1 byte for a 'Y' in UTF-8 (at least that’s what I get when I check the length in Python). Is there any other storage overhead here that would be saved?
- Query performance? Will Postgres get any performance gains for a WHERE clause of “= TRUE” vs. “= 'Y'”?
PostgreSQL (unlike Oracle) has a fully-fledged boolean type. Generally, a "yes/no flag" should be boolean. That’s the appropriate type!

What about size/storage?
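A boolean column occupies 1 byte on disk. According to the manual, the storage requirement for a short text or character varying value (up to 126 bytes) is 1 byte plus the actual string. That’s at least 2 bytes for a single character.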
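One way to check these figures on your own server is pg_column_size(), which reports the bytes used to store a value. The sketch below uses a hypothetical temporary table, flag_size_demo, that is not part of the original schema. The values are measured as stored in a row, because a bare literal is not in its compact on-disk form.

```sql
-- Hypothetical scratch table, only for measuring per-value storage.
CREATE TEMP TABLE flag_size_demo (
    as_bool    boolean           NOT NULL,
    as_varchar character varying NOT NULL
);

INSERT INTO flag_size_demo VALUES (true, 'Y');

-- pg_column_size() reports the bytes used to store each value;
-- octet_length() reports only the encoded string, without the length header.
SELECT pg_column_size(as_bool)    AS bool_bytes,
       pg_column_size(as_varchar) AS varchar_bytes,
       octet_length(as_varchar)   AS varchar_payload_bytes
FROM   flag_size_demo;
```

The per-value difference is small, and alignment padding can hide or enlarge it depending on the neighbouring columns.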
Actual storage is more complicated than that. There is some fixed overhead per table, page and row, there is special NULL storage, and some types require data alignment.

Encoding UTF8 doesn’t make any difference here. Basic ASCII characters are bit-compatible with other encodings like LATIN-1.

In your case, according to your description, you should keep the NOT NULL constraint you already have, independent of the data type.

Query performance?
Will be slightly better in any case with boolean. Besides being smaller, the logic for boolean is simpler, and varchar or text are also generally burdened with COLLATION rules. But don’t expect much for something that simple.

Instead of WHERE consistency = 'Y' you could write WHERE consistency = TRUE, but rather simplify to just WHERE consistency. No further evaluation needed.
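The predicate forms can be sketched as follows, assuming a hypothetical table named tbl. The first query applies while the column is still character varying; the others apply once it is boolean.

```sql
-- Before the change: string comparison against the varchar flag.
SELECT count(*) FROM tbl WHERE consistency = 'Y';

-- After the change to boolean: an explicit comparison works ...
SELECT count(*) FROM tbl WHERE consistency = true;

-- ... but the column is already a boolean expression on its own.
SELECT count(*) FROM tbl WHERE consistency;

-- The negated case, for rows that used to hold 'N'.
SELECT count(*) FROM tbl WHERE NOT consistency;
```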
Change type
Transforming your table is simple: ALTER TABLE ... ALTER COLUMN ... TYPE boolean with a USING clause. A CASE expression in that clause folds everything that is not TRUE (‘Y’) to FALSE. The NOT NULL constraint just stays.
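A minimal sketch of such a conversion, assuming the table is called tbl (a placeholder name). The existing default 'Y'::character varying cannot be cast to boolean automatically, so the sketch drops it first and sets a boolean default afterwards, all in one statement. On a table of this size the statement rewrites every row and holds an exclusive lock while it runs, so it is worth trying on a copy first.

```sql
BEGIN;

-- Look for unexpected values first: the CASE below silently folds
-- anything other than 'Y' to false.
SELECT consistency, count(*) FROM tbl GROUP BY consistency;

ALTER TABLE tbl
    ALTER COLUMN consistency DROP DEFAULT,
    ALTER COLUMN consistency TYPE boolean
        USING CASE WHEN consistency = 'Y' THEN true ELSE false END,
    ALTER COLUMN consistency SET DEFAULT true;

-- Compare the counts after conversion, then COMMIT or ROLLBACK.
SELECT consistency, count(*) FROM tbl GROUP BY consistency;

COMMIT;
```

Running it inside a transaction lets you roll back if the counts before and after do not match what you expect.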