This is a fact table in a data warehouse.
It has a composite clustered primary key, defined as follows:
ALTER TABLE [dbo].[Fact_Data]
ADD CONSTRAINT [PK_Fact_Data]
PRIMARY KEY CLUSTERED
(
[Column1_VarChar_10] ASC,
[Column2_VarChar_10] ASC,
[Column3_Int] ASC,
[Column4_Int] ASC,
[Column5_VarChar_10] ASC,
[Column6_VarChar_10] ASC,
[Column7_DateTime] ASC,
[Column8_DateTime] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON
) ON [PRIMARY]
GO
In this structure, all of the VARCHAR(10) columns hold numeric values only. Would it be beneficial, in terms of querying and indexing, to change this 78-million-row table to use BIGINT instead of VARCHAR?
Are there any other benefits or drawbacks that I should consider?
You should DEFINITELY introduce a surrogate INT IDENTITY() primary key! INT already gives you potentially up to 2 billion rows. Isn’t that enough?
This primary key / clustering key on SQL Server will be up to 64 bytes in size (4 × 10 bytes of VARCHAR data, 2 × 4 bytes of INT, 2 × 8 bytes of DATETIME) instead of 4 bytes for an INT, which will bloat your clustered index AND all your non-clustered indexes beyond recognition. The whole clustering key (all 8 of your columns) will be included in every single entry of every single non-clustered index on that table, wasting lots and lots of space for sure.
So on any given index page, you could fit up to 16 times more entries with a surrogate INT clustering key. That means a lot less I/O and a lot less time wasted reading index pages.
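As a rough sketch of the surrogate-key change, something along these lines should work. The column name Fact_Data_ID and the constraint name UQ_Fact_Data_BusinessKey are made up for illustration, and on 78 million rows each step rebuilds the table or an index, so try it on a copy first.

```sql
-- Drop the existing 8-column clustered primary key (the table becomes a heap).
ALTER TABLE [dbo].[Fact_Data] DROP CONSTRAINT [PK_Fact_Data];
GO

-- Add the surrogate key; Fact_Data_ID is a hypothetical name.
-- Existing rows get their identity values assigned automatically.
ALTER TABLE [dbo].[Fact_Data]
ADD [Fact_Data_ID] INT IDENTITY(1,1) NOT NULL;
GO

-- Re-create the primary key as a narrow, 4-byte clustered key.
ALTER TABLE [dbo].[Fact_Data]
ADD CONSTRAINT [PK_Fact_Data]
PRIMARY KEY CLUSTERED ([Fact_Data_ID] ASC);
GO

-- Optional: keep enforcing uniqueness of the original 8 columns.
ALTER TABLE [dbo].[Fact_Data]
ADD CONSTRAINT [UQ_Fact_Data_BusinessKey]
UNIQUE NONCLUSTERED
(
[Column1_VarChar_10], [Column2_VarChar_10],
[Column3_Int], [Column4_Int],
[Column5_VarChar_10], [Column6_VarChar_10],
[Column7_DateTime], [Column8_DateTime]
);
GO
```

The unique constraint is only needed if you still want the database to reject duplicate combinations of the original 8 columns; it is itself a wide index, so leave it out if the load process already guarantees uniqueness.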
And just imagine trying to establish a foreign-key relationship to that table: any child table would have to have all 8 columns of your primary key as foreign-key columns, and specify all 8 columns in every join. What a nightmare!
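To illustrate the join problem, here is what the two variants might look like. The child table Fact_Data_Detail and the surrogate column Fact_Data_ID are hypothetical names, used only for this comparison.

```sql
-- With the 8-column key, every join has to repeat all 8 columns.
SELECT d.*
FROM [dbo].[Fact_Data] AS f
JOIN [dbo].[Fact_Data_Detail] AS d   -- hypothetical child table
  ON  d.[Column1_VarChar_10] = f.[Column1_VarChar_10]
  AND d.[Column2_VarChar_10] = f.[Column2_VarChar_10]
  AND d.[Column3_Int]        = f.[Column3_Int]
  AND d.[Column4_Int]        = f.[Column4_Int]
  AND d.[Column5_VarChar_10] = f.[Column5_VarChar_10]
  AND d.[Column6_VarChar_10] = f.[Column6_VarChar_10]
  AND d.[Column7_DateTime]   = f.[Column7_DateTime]
  AND d.[Column8_DateTime]   = f.[Column8_DateTime];

-- With a surrogate key (hypothetical column Fact_Data_ID), one column is enough.
SELECT d.*
FROM [dbo].[Fact_Data] AS f
JOIN [dbo].[Fact_Data_Detail] AS d
  ON d.[Fact_Data_ID] = f.[Fact_Data_ID];
```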
At 78 million rows, even just changing the clustering key to INT IDENTITY will save you up to 60 bytes per row. That alone would come out to roughly 4.7 GB (78 million × 60 bytes) of disk space (and RAM usage in your server). And that’s not even beginning to calculate the savings on the non-clustered indexes.
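If you want to check the numbers yourself, the back-of-envelope estimate and the actual index sizes can be pulled up roughly like this. It is only a sketch: the 60 bytes per row is the upper bound from above, and the second query simply reports what each index on the table uses today, so you can compare before and after.

```sql
-- Upper-bound estimate: 78 million rows x 60 bytes saved per key.
SELECT CAST(78000000 AS BIGINT) * 60 AS estimated_bytes_saved;  -- 4,680,000,000

-- Current size of each index on the table (pages are 8 KB each).
SELECT
    i.name                                AS index_name,
    i.type_desc                           AS index_type,
    SUM(ps.row_count)                     AS row_count,
    SUM(ps.used_page_count) * 8 / 1024.0  AS used_mb
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
  ON  i.object_id = ps.object_id
  AND i.index_id  = ps.index_id
WHERE ps.object_id = OBJECT_ID(N'dbo.Fact_Data')
GROUP BY i.name, i.type_desc;
```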
And of course, yes, I would also change the VARCHAR(10) columns to INT or BIGINT. If it’s a number, make the column type numeric; there is no point in leaving it at VARCHAR(10), really. Use INT only if the values never exceed 2,147,483,647, since a 10-digit number can be larger than that; otherwise use BIGINT. But that alone is not going to make a huge difference in terms of speed or performance. It just makes working with the data that much easier (you don’t have to cast to numeric types all the time when, for example, comparing values).
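Before converting, it is worth verifying that the columns really hold only numbers and seeing how large they get. A minimal sketch for one column, assuming SQL Server 2012 or later (for TRY_CONVERT); repeat it for the other VARCHAR(10) columns:

```sql
-- Count values that would not convert, and find the largest value.
-- Note: an empty string converts to 0, and leading zeros ('00123') are lost.
SELECT
    SUM(CASE WHEN TRY_CONVERT(BIGINT, [Column1_VarChar_10]) IS NULL
             THEN 1 ELSE 0 END)                      AS col1_not_convertible,
    MAX(TRY_CONVERT(BIGINT, [Column1_VarChar_10]))   AS col1_max_value
FROM [dbo].[Fact_Data];

-- If col1_not_convertible is 0, the column can be converted.
-- Any index or constraint that uses the column (including the
-- primary key) has to be dropped first and re-created afterwards.
ALTER TABLE [dbo].[Fact_Data]
ALTER COLUMN [Column1_VarChar_10] BIGINT NOT NULL;
GO
```

If col1_max_value stays at or below 2,147,483,647, INT is enough for that column and saves another 4 bytes per value compared with BIGINT.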
Marc