I’m writing an application that will have a SQL Server backend that will store (among other things) URLs. URLs will be mapped to users, and some URLs may be common between different users. In the absence of a true DBA, I’m trying to design a solution that can handle hundreds of thousands of URLs as efficiently as possible.
Ideas:
- Create a table that simply has ID, URL.
  Pro: simple, complete.
  Con: duplicate entries for a URL will exist, which will cause the table to be larger than it needs to be.
- Break up the users and URLs into separate tables. One table containing USER ID and URL ID. Another table with URL ID and the URL itself.
  Pro: a single copy of each URL in the system, seems more “enterprisey”.
  Con: must join two tables when trying to pull back results, and I’m not really sure what the benefit of this approach is.
- Expand on the second idea, except REALLY break it up. So have a table for domain, another for path/query string. Then, the user table would have userid, domain ID, path ID.
  Pro: URLs could share data even if they were unrelated (meaning, cnn.com/helloworld and nbc.com/helloworld would have different domain IDs, but the same path ID). Seems this could be useful when running metrics later?
  Con: seems like a nightmare from a performance perspective (again, because joins would be necessary to pull a URL).
Any thoughts?
I would do the following in my design:
Store your URLs in a separate table, and only create a new entry in the URL table if an exact match does not already exist. If you have a lot of common URLs, this will save some space. You could take it a step further and add a third table, as you mentioned, e.g.
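One possible shape for such a split, combined with the "insert only when no exact match exists" rule, is sketched below. It is only an illustration under stated assumptions: it uses Python with an in-memory SQLite database as a stand-in for SQL Server, and the table and column names (`Domain`, `Host`, `UrlPath`, `Path`) and the helpers `get_or_create` and `store_url` are hypothetical, not taken from any real schema. The `UNIQUE` constraints are there so the database itself rejects duplicates even if application code misses one.

```python
import sqlite3
from urllib.parse import urlsplit

# Hypothetical schema: all table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE Domain (
    DomainId INTEGER PRIMARY KEY,
    Host     TEXT NOT NULL UNIQUE
);
CREATE TABLE UrlPath (
    UrlPathId INTEGER PRIMARY KEY,
    DomainId  INTEGER NOT NULL REFERENCES Domain(DomainId),
    Path      TEXT NOT NULL,
    UNIQUE (DomainId, Path)
);
"""


def get_or_create(conn, select_sql, insert_sql, params):
    """Return the id of the exact-match row, inserting it only if absent."""
    row = conn.execute(select_sql, params).fetchone()
    if row is not None:
        return row[0]
    return conn.execute(insert_sql, params).lastrowid


def store_url(conn, url):
    """Split a URL into host and path (plus query string); return its UrlPathId."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    domain_id = get_or_create(
        conn,
        "SELECT DomainId FROM Domain WHERE Host = ?",
        "INSERT INTO Domain (Host) VALUES (?)",
        (parts.netloc.lower(),),
    )
    return get_or_create(
        conn,
        "SELECT UrlPathId FROM UrlPath WHERE DomainId = ? AND Path = ?",
        "INSERT INTO UrlPath (DomainId, Path) VALUES (?, ?)",
        (domain_id, path),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

first = store_url(conn, "http://cnn.com/helloworld")
again = store_url(conn, "http://cnn.com/helloworld")  # exact match: reuses the row
other = store_url(conn, "http://nbc.com/helloworld")  # new domain: new rows
```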
…and then tying the UrlPathId to the User table. And perhaps even further:
…and again, referencing this from your User table.
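To make the join cost and the possible metrics benefit a bit more concrete, here is a minimal sketch of the user side of such a design. Again, this is an assumption-laden illustration rather than a definitive schema: SQLite (via Python) stands in for SQL Server, and `UserUrl`, `Domain`, `UrlPath`, `urls_for_user` and `users_per_domain` are hypothetical names. A separate mapping table is used here so that one user can hold many URLs and a shared URL is stored only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables; names and columns are assumptions for illustration.
CREATE TABLE Domain (
    DomainId INTEGER PRIMARY KEY,
    Host     TEXT NOT NULL UNIQUE
);
CREATE TABLE UrlPath (
    UrlPathId INTEGER PRIMARY KEY,
    DomainId  INTEGER NOT NULL REFERENCES Domain(DomainId),
    Path      TEXT NOT NULL,
    UNIQUE (DomainId, Path)
);
-- One row per (user, URL) pair, so a URL shared by users is stored once.
CREATE TABLE UserUrl (
    UserId    INTEGER NOT NULL,
    UrlPathId INTEGER NOT NULL REFERENCES UrlPath(UrlPathId),
    PRIMARY KEY (UserId, UrlPathId)
);
INSERT INTO Domain  VALUES (1, 'cnn.com'), (2, 'nbc.com');
INSERT INTO UrlPath VALUES (1, 1, '/helloworld'), (2, 2, '/helloworld');
INSERT INTO UserUrl VALUES (100, 1), (100, 2), (200, 1);
""")


def urls_for_user(conn, user_id):
    """Rebuild the URLs for one user; this is the two-join read path."""
    rows = conn.execute(
        """
        SELECT d.Host || p.Path
        FROM UserUrl AS uu
        JOIN UrlPath AS p ON p.UrlPathId = uu.UrlPathId
        JOIN Domain  AS d ON d.DomainId  = p.DomainId
        WHERE uu.UserId = ?
        ORDER BY d.Host, p.Path
        """,
        (user_id,),
    )
    return [row[0] for row in rows]


def users_per_domain(conn):
    """Example metric: number of distinct users holding a URL on each domain."""
    rows = conn.execute(
        """
        SELECT d.Host, COUNT(DISTINCT uu.UserId)
        FROM UserUrl AS uu
        JOIN UrlPath AS p ON p.UrlPathId = uu.UrlPathId
        JOIN Domain  AS d ON d.DomainId  = p.DomainId
        GROUP BY d.Host
        ORDER BY d.Host
        """
    )
    return dict(rows.fetchall())
```

Both joins go through integer keys, so with indexes on those keys the lookups should stay cheap at hundreds of thousands of rows, though that is worth measuring on your own data rather than taking on faith.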