I have a large hexadecimal data item (16 bytes, 32 hex digits) that always has the format:
00d980113901429fa6de7fb7e2da705a
This is coming in as an ASCII string from my source (i.e., the zero above is the character '0', 0x30, not 0x00), and I would like to know people's opinions on the best way (in terms of storage and speed) to store this in PostgreSQL.
The obvious thing to do is to just store it as a varchar, but storing it in a binary form would definitely save space. Would I see performance gains from select and insert by storing it in a binary form? Would bytea or bit be better? Is there a difference between these two in terms of internal representation?
Another idea would be to store it as two bigint/int8 or four integer/int4, split up into multiple columns.
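To make the two binary options concrete, here is a minimal Python sketch of the client-side conversion, assuming the value arrives as a 32-character hex string. The helper names (hex_to_bytes, hex_to_int8_pair, int8_pair_to_hex) are made up for illustration. Splitting into two signed big-endian 64-bit halves is one possible convention, chosen so that each half fits in a bigint.

```python
import struct


def hex_to_bytes(hex_id: str) -> bytes:
    """Decode the 32-digit ASCII hex string into 16 raw bytes (bytea-style)."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 16:
        raise ValueError("expected exactly 32 hex digits")
    return raw


def hex_to_int8_pair(hex_id: str) -> tuple[int, int]:
    """Split the value into two signed 64-bit integers (high half, low half).

    ">qq" means two big-endian signed 64-bit integers, so each result
    stays within the range of a PostgreSQL bigint.
    """
    return struct.unpack(">qq", hex_to_bytes(hex_id))


def int8_pair_to_hex(hi: int, lo: int) -> str:
    """Inverse of hex_to_int8_pair: rebuild the 32-digit hex string."""
    return struct.pack(">qq", hi, lo).hex()


if __name__ == "__main__":
    sample = "00d980113901429fa6de7fb7e2da705a"
    hi, lo = hex_to_int8_pair(sample)
    print(len(hex_to_bytes(sample)), hi, lo)
    print(int8_pair_to_hex(hi, lo) == sample)
```

Note that with this convention a half whose top bit is set (the low half of the sample value, for instance) comes out negative, so ordering by the two signed integers is not the same as byte-wise ordering of the original value.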
Space and time are an issue as I have MANY of these (upwards of a trillion).
Compare these two tables of 10M records:
bytea with index is 11% bigger than 2*int8. This isn't much, but it means that roughly 11% fewer rows will fit in cache, sequential scans will be roughly 11% slower, and so on.
If your data does not change, maybe you should consider flat-file storage of sorted values instead of a database: this would be only 152MB per 10M records, and searching would be O(log(n)).
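The flat-file idea could look roughly like the Python sketch below. It assumes a file of fixed-width 16-byte records sorted in byte-wise order; the names RECORD_SIZE, write_sorted and contains are hypothetical. The size estimate follows from 10,000,000 × 16 bytes = 160,000,000 bytes, which is about 152.6 MiB.

```python
import os

RECORD_SIZE = 16  # each value is 16 raw bytes (32 hex digits decoded)


def write_sorted(path: str, hex_ids: list[str]) -> None:
    """Decode the hex strings, sort them byte-wise, and write them back to back."""
    records = sorted(bytes.fromhex(h) for h in hex_ids)
    with open(path, "wb") as f:
        for rec in records:
            f.write(rec)


def contains(path: str, hex_id: str) -> bool:
    """Binary search over the fixed-width records: O(log n) seeks per lookup."""
    key = bytes.fromhex(hex_id)
    with open(path, "rb") as f:
        lo, hi = 0, os.path.getsize(path) // RECORD_SIZE
        while lo < hi:
            mid = (lo + hi) // 2
            f.seek(mid * RECORD_SIZE)
            rec = f.read(RECORD_SIZE)
            if rec == key:
                return True
            if rec < key:
                lo = mid + 1  # key can only be in the upper half
            else:
                hi = mid      # key can only be in the lower half
    return False
```

This only works for data that is written once and then read, as the answer assumes; inserting a value later would mean rewriting the file or keeping a separate overflow structure.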