I have a logging table that I need to run close to real time reports on. The primary key is a guid that is inserted non sequentially. When other indexes are added to aid in reporting speeds I am seeing timeouts on inserts. I plan on creating a duplicate table with the reporting indexes and leaving only the clustered index with a fill factor of 75% on the in the logging table.
I feel like using replication would be overkill for only one table. So my question is how should I accomplish this? Should I schedule a sql agent job?
Any help would be great! Thanks!
You have a lot of options available to you. To solve your immediate problem, I’d make sure that the logging table has a sequential ID on it and only copy across new data to the well indexed reporting table. You can then run a NON-LOCKING proc to get just the max value for the corresponding column in the reporting and get anything bigger from the logging table.
NB: See this for NOLOCK debate. Only use it if your log table is BIG and BUSY, but your scenario is an example of where it makes sense.
I’m not sure what generates your GUID, but if it is SQL server, then making the column default to NEWSEQUENTIALID() means it’s not entirely random, but it is sequential. If you are getting the GUID from somewhere else, then consider putting a bigint IDENTITY(1,1) column to make sure your clustered index is not badly fragmented. A fragmented clustered index is bad as your other indexes use it for lookups so no matter how nice they are, they’ll still run slower than the primary.
Then you can structure your reporting table any way you like with a clustered index on the GUID and only and use the max value in your well indexed table to only insert the new data you don’t already have. You can also run re-index on that table to ensure it’s as fast as possible.
More information..
If you expect your system to grow a lot, then you should look at some other reporting patterns to denormalise or simplify the data as you dump it in the reporting tables.
Take a look at the star pattern or you can just denormalise daily or weekly stats into a seperate table like the following:
then you can run a non-locking stored procedure to get a day’s stats at a time. I’d also recommend that days stats be loaded based on what you don’t have in the reporting database incase the agent job that runs it fails, it will take care of backlog when it runs next. Obviously these patterns are a lot more work and the wrong choice can be bad so only make this choice if you need to and know what’s expected of your reporting.