I have a star-schema database, with fact tables that have many foreign keys to dimension tables. The number of records in each dimension table is small: often fewer than 256, and always fewer than 64K. The fact tables typically have hundreds of thousands of records, so I want to maximize join speed.
I’d like to use tinyints and smallints, but a coworker says I’m crazy to worry about this and that I should just use 4-byte ints in every case. Who is right?
Your coworker is wrong. If you use 4-byte integers for the primary keys of the dimension tables, then the foreign keys in the fact table have to be 4-byte integers as well. That makes your fact table wider than it needs to be, reducing the number of records that can fit on a single page. To the degree that this changes the width of the fact table's primary key index, it will adversely affect index performance. If your primary key could have been two tinyints and three smallints (2 × 1 + 3 × 2 = 8 bytes), and you change to five 4-byte ints (5 × 4 = 20 bytes), you have widened the index key from 8 bytes to 20 bytes. Your index will have fewer than half as many entries per I/O page, and scanning it will require more than twice as many logical and/or physical reads.
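To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It is only an estimate under stated assumptions: it takes about 8,096 usable bytes on an 8 KB SQL Server page, ignores all per-row and per-page overhead, and uses a hypothetical fact table of 500,000 rows. The function names are made up for illustration.

```python
import math

# Assumption: roughly 8,096 usable bytes on an 8 KB SQL Server page.
# Per-row overhead is ignored, so real entry counts will be lower.
PAGE_BYTES = 8096

def key_width(column_bytes):
    """Total width in bytes of a composite key."""
    return sum(column_bytes)

def entries_per_page(width, page_bytes=PAGE_BYTES):
    """How many keys of the given width fit on one page."""
    return page_bytes // width

def pages_needed(rows, width, page_bytes=PAGE_BYTES):
    """Leaf pages needed to hold one key per row."""
    return math.ceil(rows / entries_per_page(width, page_bytes))

narrow = key_width([1, 1, 2, 2, 2])  # two tinyints + three smallints
wide = key_width([4, 4, 4, 4, 4])    # five 4-byte ints

rows = 500_000  # hypothetical fact table size

for label, width in (("narrow", narrow), ("wide", wide)):
    print(label, width, "bytes:",
          entries_per_page(width), "entries/page,",
          pages_needed(rows, width), "pages")
```

Under these assumptions the wide key needs about two and a half times as many leaf pages as the narrow one. The real ratio depends on row overhead and fill factor, so treat this as an order-of-magnitude check rather than a measurement.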
NOTE: As Jim McLeod’s answer below points out, SQL Server 2008 (Enterprise or Developer edition) includes row-level compression, which means you can declare the column as a 4-byte INT, and the engine will store each row's value in the smallest number of bytes that can hold it.