I have a table that will potentially have a high number of inserts per second, and I’m trying to choose the type of primary key to use. For illustrative purposes, let’s say it’s a users table. I am trying to choose between GUID and BIGINT as the primary key, and ultimately as the UserID across the app. If I use a GUID, I save a trip to the database to generate a new ID, but a GUID is not “user-friendly” and it’s not possible to partition the table by this ID (which I’m planning to do). Using BIGINT is much more convenient, but generating it is a problem: I can’t use IDENTITY (there is a reason for that), so my only choice is to have a helper table that contains the last used ID, and then call this stored proc:
create proc GetNewID @ID BIGINT OUTPUT
as
begin
    -- Single atomic statement: return the current id in @ID, then advance the counter.
    update HelperIDTable set @ID = id, id = id + 1
end
to get the new id. But then this helper table is an obvious bottleneck and I’m concerned with how many updates per second it can do.
I really like the idea of using BIGINT as the PK, but the bottleneck problem concerns me. Is there a way to roughly estimate how many IDs it could produce per second? I realize it depends heavily on hardware, but are there any physical limitations, and what order of magnitude are we looking at? 100s/sec? 1000s/sec?
Any ideas on how to approach the problem are highly appreciated! This problem hasn’t let me sleep for many nights now!
Thanks!
Andrey
A GUID seems to be a natural choice, and if you really must, you could probably argue for using it as the PRIMARY KEY of the table: the single value that uniquely identifies the row in the database.
What I’d strongly recommend against is using the GUID column as the clustering key, which is what SQL Server does with the primary key by default unless you specifically tell it not to.
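As a rough sketch of what "telling it not to" can look like, the T-SQL below keeps a GUID as the primary key but declares it NONCLUSTERED, and clusters on a separate narrow column instead. The table, column, constraint, and index names are all hypothetical, and the INT IDENTITY column is just one possible choice of narrow, ever-increasing key; it may not fit if IDENTITY is ruled out for other reasons.

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TABLE dbo.Users
(
    UserSeq   INT IDENTITY(1,1) NOT NULL,          -- narrow, ever-increasing value
    UserGuid  UNIQUEIDENTIFIER  NOT NULL
        CONSTRAINT DF_Users_UserGuid DEFAULT NEWID(),
    UserName  NVARCHAR(100)     NOT NULL,

    -- The GUID remains the primary key, but is explicitly NONCLUSTERED,
    -- so SQL Server does not make it the clustering key by default.
    CONSTRAINT PK_Users PRIMARY KEY NONCLUSTERED (UserGuid)
);

-- The narrow column becomes the clustering key instead.
CREATE UNIQUE CLUSTERED INDEX CIX_Users_UserSeq
    ON dbo.Users (UserSeq);
```

With this layout the application can still look rows up by GUID, while new rows are appended at the end of the clustered index rather than scattered across it.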
As Kimberly Tripp – the Queen of Indexing – and others have stated a great many times – a GUID as the clustering key isn’t optimal, since due to its randomness, it will lead to massive page and index fragmentation and to generally bad performance.
Yes, I know – there’s newsequentialid() in SQL Server 2005 and up – but even that is not truly and fully sequential and thus also suffers from the same problems as the GUID – just a bit less prominently so.
Then there’s another issue to consider: the clustering key on a table will be added to each and every entry on each and every non-clustered index on your table as well – thus you really want to make sure it’s as small as possible. Typically, an INT with 2+ billion rows should be sufficient for the vast majority of tables – and compared to a GUID as the clustering key, you can save yourself hundreds of megabytes of storage on disk and in server memory.
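To put a rough number on that saving, here is a back-of-the-envelope calculation in T-SQL. The row count and the number of non-clustered indexes are assumptions picked purely for illustration. The only fixed inputs are the key sizes: 16 bytes for a UNIQUEIDENTIFIER and 4 bytes for an INT. It ignores row overhead, page fill, and compression, so treat the result as an order of magnitude only.

```sql
-- Assumed workload figures (hypothetical, adjust to your own table).
DECLARE @rows       BIGINT = 10000000;  -- 10 million rows
DECLARE @nc_indexes INT    = 5;         -- non-clustered indexes on the table

-- Fixed key sizes in bytes.
DECLARE @guid_bytes INT = 16;           -- UNIQUEIDENTIFIER
DECLARE @int_bytes  INT = 4;            -- INT

-- Every non-clustered index entry carries a copy of the clustering key,
-- so the extra bytes per row are paid once per non-clustered index.
SELECT (@guid_bytes - @int_bytes) * @rows * @nc_indexes / 1048576.0
       AS ExtraMegabytesInNonClusteredIndexes;
```

Under these assumptions the difference comes to roughly 570 MB in the non-clustered indexes alone, before counting the wider key in the clustered index itself.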
So to sum it up: unless you have a really good reason, I would always recommend an INT IDENTITY field as the primary / clustered key on your table.
Marc