I am working on a website where users have core data about their profile (user_name, email, preferences, etc) as well as arbitrary data that could change on the fly. Not all users would necessarily have a need for all data fields. Thus on my tblUsers MySQL table, I don’t want to add a bunch of columns that might only be practical to a small percentage of users.
The way I imagined would be to create a second table, with the following columns:
UID INT, dataType TINYINT, dataValue INT
Basically UID would point to the user’s ID in the users table (tblUsers), and dataType would point to an ID in a list (another table) of data types, such as “Age”, “Favorite Color”, “Points”, etc.
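For reference, the structure described above might be declared roughly as follows. This is only a sketch: the lookup table name tblDataTypes, its typeName column, the user column sizes, and the choice of (UID, dataType) as primary key (which assumes at most one value per user per data type) are assumptions for illustration, not part of the original schema.

```sql
-- Core profile data (column sizes are assumed).
CREATE TABLE tblUsers (
    ID        INT          NOT NULL AUTO_INCREMENT,
    user_name VARCHAR(64)  NOT NULL,
    email     VARCHAR(255) NOT NULL,
    PRIMARY KEY (ID)
);

-- Hypothetical lookup table naming each data type ("Age", "Points", ...).
CREATE TABLE tblDataTypes (
    dataType TINYINT     NOT NULL,
    typeName VARCHAR(64) NOT NULL,
    PRIMARY KEY (dataType)
);

-- One row per user per data type.
CREATE TABLE tblData (
    UID       INT     NOT NULL,
    dataType  TINYINT NOT NULL,
    dataValue INT     NOT NULL,
    PRIMARY KEY (UID, dataType),  -- assumption: one value per user and type
    FOREIGN KEY (UID)      REFERENCES tblUsers (ID),
    FOREIGN KEY (dataType) REFERENCES tblDataTypes (dataType)
);
```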
The problem is, when I say:
“SELECT * FROM tblUsers, tblData WHERE UID=ID”
I get several rows stacked (which works well enough…)
But I can’t figure out how to write a query which takes the tblData information into consideration.
For example, let’s say I want to select all users who are 21 and have a score between 400 and 500.
If they were actual columns, I would say:
SELECT * FROM tblUsers, tblData WHERE UID=ID AND dataAge = 21 AND dataScore >= 400 AND dataScore <= 500
However, I can’t do this because dataAge and dataScore aren’t columns; they are rows in tblData, like this:
UID dataType dataValue
35 1 21 //user #35's age (dataType 1)
35 2 467 //user #35's score (dataType 2)
49 1 21
49 2 491
I cannot predict what dataTypes will be required in the future.
Users could arbitrarily add dataTypes about themselves, and not all users will have all possible data types at once.
I imagine also using another table, for text data, with the same format, UID, dataType, dataString.
Let’s say I write:
SELECT * FROM tblUsers, tblData WHERE UID=ID AND dataType=1 AND dataValue=21 AND dataType=2 AND dataValue >= 400 AND dataValue <= 500
Because I want to compare both Age and Score, dataType and dataValue are each used ambiguously in the same query, and no single row can satisfy both sets of conditions…
My question: What’s the best table structure for my needs OR how can I properly query my current set up?
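One way to query the current setup is to join tblData once per attribute, giving each join its own alias so that dataType and dataValue are no longer ambiguous. A minimal sketch, assuming dataType 1 is the age and dataType 2 is the score, as in the sample rows; the aliases u, age and score are arbitrary names chosen for illustration:

```sql
SELECT u.*,
       age.dataValue   AS age,
       score.dataValue AS score
FROM   tblUsers AS u
-- First copy of tblData: only the "age" rows (dataType 1).
JOIN   tblData  AS age   ON age.UID   = u.ID AND age.dataType   = 1
-- Second copy of tblData: only the "score" rows (dataType 2).
JOIN   tblData  AS score ON score.UID = u.ID AND score.dataType = 2
WHERE  age.dataValue = 21
AND    score.dataValue BETWEEN 400 AND 500;
```

Because these are inner joins, a user who lacks either an age row or a score row is dropped from the result. With the sample rows shown in the question, users 35 and 49 would both qualify, provided they exist in tblUsers.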
A user must fulfill all criteria in your query, so you have to join tblData multiple times, once per criterion, just as if it were multiple tables.
Answer to additional question in comments
For this to be performant, indexes are crucial. In this particular case you will probably need multi-column indexes on tblData.
I assume that id is the primary key of tblUsers and is indexed automatically as such. Read about multi-column indexes in the manual.
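As a rough illustration of what a multi-column index for this kind of lookup could look like (the index name and the column order are assumptions, not necessarily the indexes originally recommended here):

```sql
-- Hypothetical index: filter on dataType first, then on dataValue
-- (equality or range), with UID included so the join back to
-- tblUsers can be resolved from the index.
CREATE INDEX idx_tblData_type_value_uid
    ON tblData (dataType, dataValue, UID);
```

The idea behind this ordering is that each joined copy of tblData is restricted by one fixed dataType plus a condition on dataValue, so those columns lead the index. Whether it actually helps depends on your data distribution, so check the plan with EXPLAIN.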
JOIN performance has improved in recent MySQL versions, but it still lags behind other database systems like Oracle, SQL Server or PostgreSQL, which handle JOINs very efficiently. MySQL is not the best choice for lots of JOINs and subqueries.
For your particular case (multiple joins that can be combined), bitmap index scans would provide top performance, a feature that is not present in MySQL. MySQL has an "index_merge" feature to substitute for that.
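To see whether MySQL actually uses index_merge for a given lookup, you can inspect the plan with EXPLAIN. A small sketch, assuming two hypothetical single-column indexes on tblData (the index names are made up for illustration):

```sql
-- Hypothetical single-column indexes, for illustration only.
CREATE INDEX idx_tblData_type  ON tblData (dataType);
CREATE INDEX idx_tblData_value ON tblData (dataValue);

-- Ask the optimizer how it would find all "age = 21" rows.
EXPLAIN
SELECT UID
FROM   tblData
WHERE  dataType = 1
AND    dataValue = 21;
```

If the optimizer chooses an index merge, the type column of the EXPLAIN output reads index_merge and the Extra column names the merge algorithm, for example an intersection of the two indexes. The optimizer may just as well pick a single index or a table scan, depending on table size and statistics, so treat this as a diagnostic rather than a guarantee.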