I’d appreciate some opinions on a concern I have.
I have a [User] table in my database, with the basic stuff you’d expect, like username, password, etc…
This application requires that I track a vast number of attributes for each user, so many that I will likely run out of columns (or row storage space).
I’m tempted to add a UserProperties table with UserID, PropertyKey and PropertyValue columns. This approach fits well with the requirements.
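For concreteness, here is a minimal sketch of that UserProperties table. It uses Python's built-in sqlite3 module as a stand-in for the real database (an assumption, since the post implies SQL Server); the column names come from the post, while the property keys and values are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the production server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE [User] (
    UserID   INTEGER PRIMARY KEY,
    Username TEXT NOT NULL UNIQUE
);

-- One row per (user, property) pair; the composite key stops a user
-- from having two values for the same property.
CREATE TABLE UserProperties (
    UserID        INTEGER NOT NULL REFERENCES [User](UserID),
    PropertyKey   TEXT    NOT NULL,
    PropertyValue TEXT,
    PRIMARY KEY (UserID, PropertyKey)
);
""")

conn.execute("INSERT INTO [User] (UserID, Username) VALUES (1, 'alice')")
# Hypothetical properties for illustration.
conn.executemany(
    "INSERT INTO UserProperties (UserID, PropertyKey, PropertyValue) VALUES (?, ?, ?)",
    [(1, "favorite_color", "green"), (1, "shoe_size", "38")],
)

rows = conn.execute(
    "SELECT PropertyKey, PropertyValue FROM UserProperties "
    "WHERE UserID = ? ORDER BY PropertyKey",
    (1,),
).fetchall()
print(rows)  # [('favorite_color', 'green'), ('shoe_size', '38')]
```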
My concern is that if each user has, say, 100 properties, then once the database has a million users in it, we’ll have 100,000,000 property rows.
I would think that with a clustered index on UserID, access will still be screaming fast, and I’m really storing about the same amount of data as I would with the mega-column approach.
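A rough way to check the clustered-index intuition on a small scale is sketched below. It again assumes SQLite as a stand-in: a WITHOUT ROWID table is stored as a B-tree ordered by its primary key, which is SQLite's closest analogue to a clustered index on (UserID, PropertyKey); in SQL Server the primary key is clustered by default when no other clustered index exists. The data set (1,000 users × 100 hypothetical properties) is synthetic; at a million users the same layout gives 100 × 1,000,000 = 100,000,000 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID stores the table itself ordered by the primary key, so all
# of one user's properties sit next to each other, much like a clustered index.
conn.execute("""
CREATE TABLE UserProperties (
    UserID        INTEGER NOT NULL,
    PropertyKey   TEXT    NOT NULL,
    PropertyValue TEXT,
    PRIMARY KEY (UserID, PropertyKey)
) WITHOUT ROWID
""")

# Synthetic data: 1,000 users x 100 hypothetical properties each.
conn.executemany(
    "INSERT INTO UserProperties VALUES (?, ?, ?)",
    ((u, f"prop_{p:03d}", str(u * p)) for u in range(1, 1001) for p in range(100)),
)

total_rows = conn.execute("SELECT COUNT(*) FROM UserProperties").fetchone()[0]
print(total_rows)  # 100000

# Ask the planner how it would fetch one user's properties; it should report
# a SEARCH using the primary key rather than a full SCAN of the table.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT PropertyKey, PropertyValue FROM UserProperties WHERE UserID = ?",
    (42,),
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

This only shows that a single user's lookup is an index seek; it says nothing about queries that filter across many users by attribute value, which is the concern raised in the update.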
Any ideas or thoughts on performance concerns? Ideas for a better DB design?
UPDATE:
I have been toying around with the possibilities, and one thing keeps bothering me. I need to query on some of these attributes pretty frequently, and worse yet, these queries could involve finding all users who match criteria on as many as 10 of these attributes at the same time.
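To show why multi-attribute criteria are awkward in the key/value layout, here is a sketch of one common way to express "users matching all of these attributes": filter the candidate rows, then group by user and keep users whose match count equals the number of criteria (a form of relational division). SQLite stands in for the real database, and the helper name `users_matching` and all keys/values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE UserProperties (
    UserID        INTEGER NOT NULL,
    PropertyKey   TEXT    NOT NULL,
    PropertyValue TEXT,
    PRIMARY KEY (UserID, PropertyKey)
)""")
conn.executemany(
    "INSERT INTO UserProperties VALUES (?, ?, ?)",
    [
        (1, "country", "NZ"), (1, "language", "en"), (1, "tier", "pro"),
        (2, "country", "NZ"), (2, "language", "fr"), (2, "tier", "pro"),
        (3, "country", "NZ"), (3, "language", "en"), (3, "tier", "free"),
    ],
)

def users_matching(conn, criteria):
    """Return IDs of users whose properties match every (key, value) pair.

    The primary key allows at most one row per (user, key), so a user
    qualifies when the number of matching rows equals len(criteria).
    """
    if not criteria:
        raise ValueError("criteria must not be empty")
    where = " OR ".join("(PropertyKey = ? AND PropertyValue = ?)" for _ in criteria)
    params = [item for pair in criteria.items() for item in pair]
    sql = (
        "SELECT UserID FROM UserProperties "
        f"WHERE {where} "
        "GROUP BY UserID HAVING COUNT(*) = ? ORDER BY UserID"
    )
    return [row[0] for row in conn.execute(sql, (*params, len(criteria)))]

print(users_matching(conn, {"country": "NZ", "language": "en", "tier": "pro"}))  # [1]
print(users_matching(conn, {"country": "NZ", "language": "en"}))  # [1, 3]
```

With 10 criteria, the database has to collect matching rows for all 10 keys across every user before aggregating, whereas a one-column-per-attribute table can apply the same filter as ordinary typed predicates on a single row.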
As a result, I am now leaning towards the mega-column approach, but possibly splitting the data off into one (or more) separate tables, forming a one-to-one relationship keyed on the UserID.
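A sketch of the one-to-one split is below, again using SQLite as a stand-in. The extension table's primary key doubles as the foreign key to [User], which is what makes the relationship one-to-one. The table name UserProfile and its columns are hypothetical placeholders for the real attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE [User] (
    UserID   INTEGER PRIMARY KEY,
    Username TEXT NOT NULL
);

-- Hypothetical extension table: its primary key is also a foreign key to
-- [User], so each user has at most one row here (one-to-one).
CREATE TABLE UserProfile (
    UserID   INTEGER PRIMARY KEY REFERENCES [User](UserID),
    Country  TEXT,
    Language TEXT,
    HeightCm INTEGER
);

-- Ordinary composite index to support a frequent multi-attribute filter.
CREATE INDEX IX_UserProfile_Country_Language ON UserProfile (Country, Language);
""")

conn.executemany("INSERT INTO [User] VALUES (?, ?)", [(1, "alice"), (2, "bob"), (3, "carol")])
conn.executemany(
    "INSERT INTO UserProfile VALUES (?, ?, ?, ?)",
    [(1, "NZ", "en", 165), (2, "NZ", "en", 180), (3, "NZ", "fr", 175)],
)

# Criteria on several attributes become plain, typed column predicates.
names = [
    row[0]
    for row in conn.execute(
        """
        SELECT u.Username
        FROM [User] AS u
        JOIN UserProfile AS p ON p.UserID = u.UserID
        WHERE p.Country = ? AND p.Language = ? AND p.HeightCm >= ?
        ORDER BY u.Username
        """,
        ("NZ", "en", 170),
    )
]
print(names)  # ['bob']

# The shared primary key rejects a second profile row for the same user.
try:
    conn.execute("INSERT INTO UserProfile VALUES (1, 'AU', 'en', 170)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```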
I’m using LinqToSql, and while I find tables with this many columns inelegant, considering all the challenges and trade-offs I think this approach is probably the right one. I am still eager to hear other opinions, though.
What you’re describing is an Entity-Attribute-Value (E-A-V) database, which is often used for exactly the situation you describe: sparse data tied to a single entity.
An E-A-V table is easy to search. The problem isn’t finding rows; it’s finding related rows.
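One place the "related rows" cost shows up is when you want an ordinary row back: every attribute has to be pivoted out of the key/value rows. The sketch below does that with conditional aggregation (one CASE expression per attribute), using SQLite as a stand-in and hypothetical keys; attributes a user lacks come back as NULL (None).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE UserProperties (
    UserID        INTEGER NOT NULL,
    PropertyKey   TEXT    NOT NULL,
    PropertyValue TEXT,
    PRIMARY KEY (UserID, PropertyKey)
)""")
conn.executemany(
    "INSERT INTO UserProperties VALUES (?, ?, ?)",
    [
        (1, "country", "NZ"), (1, "language", "en"), (1, "tier", "pro"),
        (2, "country", "AU"), (2, "language", "en"),  # user 2 has no tier
    ],
)

# Each wanted attribute needs its own CASE expression; MAX() picks the single
# non-NULL value in each user's group (or NULL if the key is absent).
pivoted = conn.execute("""
SELECT UserID,
       MAX(CASE WHEN PropertyKey = 'country'  THEN PropertyValue END) AS country,
       MAX(CASE WHEN PropertyKey = 'language' THEN PropertyValue END) AS language,
       MAX(CASE WHEN PropertyKey = 'tier'     THEN PropertyValue END) AS tier
FROM UserProperties
GROUP BY UserID
ORDER BY UserID
""").fetchall()
print(pivoted)  # [(1, 'NZ', 'en', 'pro'), (2, 'AU', 'en', None)]
```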
Separate tables for different entities provide domain modeling, but they also provide a weak form of metadata. In E-A-V there are no such abstractions. (The Java analogy to E-A-V would be declaring that all functions’ formal arguments were of type Object, so you’d get no type-checking.)
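The missing type-checking is easy to demonstrate. Assuming PropertyValue is a single text column (some E-A-V designs instead use one value column per data type), nothing stops a non-numeric "age" from being stored, and comparisons are textual unless every query remembers to cast. SQLite is used as a stand-in; the key and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE UserProperties (
    UserID        INTEGER NOT NULL,
    PropertyKey   TEXT    NOT NULL,
    PropertyValue TEXT,
    PRIMARY KEY (UserID, PropertyKey)
)""")
# Nothing in the schema says "age" must be a number, so 'abc' is accepted.
conn.executemany(
    "INSERT INTO UserProperties VALUES (?, ?, ?)",
    [(1, "age", "9"), (2, "age", "10"), (3, "age", "abc")],
)

# Text comparison: '10' < '5' because '1' sorts before '5', and 'abc' > '5'.
lexical = [r[0] for r in conn.execute(
    "SELECT UserID FROM UserProperties "
    "WHERE PropertyKey = 'age' AND PropertyValue > ? ORDER BY UserID", ("5",))]

# Casting fixes the ordering, but SQLite silently turns 'abc' into 0.
numeric = [r[0] for r in conn.execute(
    "SELECT UserID FROM UserProperties "
    "WHERE PropertyKey = 'age' AND CAST(PropertyValue AS INTEGER) > ? ORDER BY UserID", (5,))]

print(lexical)  # [1, 3]
print(numeric)  # [1, 2]
```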
We can easily look up the property keys, but nothing groups these property keys.
Wikipedia has a very good article on E-A-V, but read it now: it’s mostly the work of one author, and is slated for ‘improvement’.