My users can update their information, which is saved in a defined number of columns in a table, such as: user ( id INT, email VARCHAR, phone VARCHAR, address VARCHAR ), for example.
I have seen other implementations, such as WordPress, which stores this information for its users in a table called usermeta with the layout ( umeta_id INT, user_id INT, meta_key VARCHAR, meta_value VARCHAR ).
For the change log that I want to implement, I am deciding between a solution like that and (what I think will be better) a layout like: userLog ( id INT, date TIMESTAMP, email VARCHAR, phone VARCHAR, address VARCHAR ).
So, I can have a history of all the information any user had at a given date. Rows would only record the changes, having NULL on unaltered columns.
For the first question: Is there any advantage to the meta_key kind of layout other than being able to create a new information type by just inserting an appropriate meta_key?
I sometimes think that this layout may not be appropriate if performance matters in my environment, since I would be using a VARCHAR for every single kind of data that I want to store.
For the second question: Can storage and select/insert efficiency really make a difference between the two solutions I am considering?
Which solution would consume less (or more) space, which would be less (or more) efficient for selects and inserts, and why?
Some thoughts, if not necessarily an answer:
Clearly a change log is a must-have for you, so the original structure with a single row per user is not a solution for you. So we’re talking about the choice between:
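Solution 1 corresponds to your userLog layout: one row per change, with one column per attribute (email, phone, address) plus the user ID and the date of the change.
Solution 2 corresponds to the WordPress one: one row per attribute, identified by user_id and meta_key, with the value held in meta_value (plus a change date, since this is a log).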
Your question 1: I can’t see any advantage to Solution 2 except that, if you subsequently decide you want to capture users’ (for example) website URL or (for example) favourite colour as well, you can do that by adding a meta_key. But you could equally easily do this under Solution 1, by simply doing an ALTER TABLE to add the new column.
That’s not hard to do. Unless the DBAs in your shop are unusually Dobermann-like ( 😉 ). Because you’re holding a change-log, all existing users (at the time of the change) will now have a blank WebsiteURL column; but that’s exactly what you want: you don’t know their WebsiteURL, because the system didn’t capture it before. Sure, the new column will have to be NULLABLE – but that may be unavoidable anyway, even with the “initial” data, unless the method you’re using to capture user info insists on email, phone and address as required columns.
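As a rough illustration of the point above, here is a minimal sketch using Python's built-in sqlite3 module so that it runs as-is. The table and column names (userLog, userID, DateChanged, WebsiteURL) and the sample values are assumptions for the example; your real schema and database engine will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical Solution 1 change-log table: one wide row per change.
conn.execute("""
    CREATE TABLE userLog (
        userID      INTEGER NOT NULL,
        DateChanged TEXT    NOT NULL,
        email       TEXT,
        phone       TEXT,
        address     TEXT
    )
""")
conn.execute(
    "INSERT INTO userLog VALUES (1, '2020-01-01', 'a@example.com', '555-0100', '1 Main St')"
)

# Add the new attribute as a nullable column; existing rows simply get NULL.
conn.execute("ALTER TABLE userLog ADD COLUMN WebsiteURL TEXT")

# Rows logged after the change can carry the new attribute.
conn.execute(
    "INSERT INTO userLog VALUES "
    "(1, '2021-06-01', 'a@example.com', '555-0100', '1 Main St', 'https://example.com')"
)

rows = conn.execute(
    "SELECT DateChanged, WebsiteURL FROM userLog ORDER BY DateChanged"
).fetchall()
print(rows)
```

The older row reports NULL for WebsiteURL, which matches what was actually known about the user at that date.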
To me, the disadvantages of the meta_key solution outweigh the advantages. The disadvantages are:
You have to develop a piece of pivot code to bring the info for one user onto one row, and you must call this code in every place where you want user info on one row. In contrast, Solution 1 only requires

SELECT userLog.userID, [all user info]
FROM userLog
INNER JOIN (SELECT userID, MAX(DateChanged) AS LatestDateChanged FROM userLog GROUP BY userID) a
ON userLog.userID = a.userID AND userLog.DateChanged = a.LatestDateChanged

which is far more efficient than a pivot. With an index on (userID, DateChanged), this’ll run like the wind.
Unless you really want to hold meta_key values multiple times in the userinfo table (Email, Email, Email, Email, Email), you’d need an extra Meta_Key_Lookup table.
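To make the comparison concrete, here is a small runnable sketch (Python's sqlite3 again, purely so it can be executed as-is) that fetches the latest info for a user under both layouts. The table name userMetaLog, the column names and the sample rows are assumptions, and the CASE-based pivot is just one common way to write one, not necessarily what you would end up with.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Solution 1 (assumed layout): one wide row per change, full row each time.
    CREATE TABLE userLog (
        userID INTEGER NOT NULL, DateChanged TEXT NOT NULL,
        email TEXT, phone TEXT, address TEXT
    );
    CREATE INDEX ix_userLog ON userLog (userID, DateChanged);

    -- Solution 2 (assumed layout): one narrow row per changed attribute.
    CREATE TABLE userMetaLog (
        userID INTEGER NOT NULL, DateChanged TEXT NOT NULL,
        meta_key TEXT NOT NULL, meta_value TEXT
    );

    INSERT INTO userLog VALUES
        (1, '2020-01-01', 'old@example.com', '555-0100', '1 Main St'),
        (1, '2020-03-01', 'new@example.com', '555-0100', '1 Main St');

    INSERT INTO userMetaLog VALUES
        (1, '2020-01-01', 'email',   'old@example.com'),
        (1, '2020-01-01', 'phone',   '555-0100'),
        (1, '2020-01-01', 'address', '1 Main St'),
        (1, '2020-03-01', 'email',   'new@example.com');
""")

# Solution 1: join each user to their most recent row.
latest_wide = conn.execute("""
    SELECT l.userID, l.email, l.phone, l.address
    FROM userLog l
    INNER JOIN (SELECT userID, MAX(DateChanged) AS LatestDateChanged
                FROM userLog GROUP BY userID) a
        ON l.userID = a.userID AND l.DateChanged = a.LatestDateChanged
""").fetchall()

# Solution 2: find the latest row per (user, key), then pivot keys into columns.
latest_pivot = conn.execute("""
    SELECT m.userID,
           MAX(CASE WHEN m.meta_key = 'email'   THEN m.meta_value END) AS email,
           MAX(CASE WHEN m.meta_key = 'phone'   THEN m.meta_value END) AS phone,
           MAX(CASE WHEN m.meta_key = 'address' THEN m.meta_value END) AS address
    FROM userMetaLog m
    INNER JOIN (SELECT userID, meta_key, MAX(DateChanged) AS LatestDateChanged
                FROM userMetaLog GROUP BY userID, meta_key) a
        ON m.userID = a.userID AND m.meta_key = a.meta_key
       AND m.DateChanged = a.LatestDateChanged
    GROUP BY m.userID
""").fetchall()

print(latest_wide)
print(latest_pivot)
```

Both queries return the same single row, but the second one has to name every meta_key explicitly and must be edited whenever a new key is introduced.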
Second question:
For ultimate space-efficiency, yes, the meta_key Solution 2 is the best. Especially if you don’t use VARCHAR meta_keys, but meta_key ID values, and have a separate meta_key lookup table (e.g. 1=Email, 2=Phone etc). But I don’t think this is a conclusive argument for the meta_key Solution 2, given the virtually-zero price of storage, and the difficulties involved in this solution.
(A note/thought: IMHO your idea of holding NULL values in your Solution 1, where the value has not changed, is a wrong road. The coding to try to get the most recent email, then phone, then address (separately) for each user, will be a nightmare: almost as hard to code/test, and for the server to run, as the pivot required by the other solution. And the reduction in storage is marginal. Just hold the entire row every time one thing changes. Unless you’re just giving examples, and the real user info-set is 50 columns wide…)
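To show what that coding looks like, here is a hedged sketch of rebuilding a user's current info from a log that stores NULL for unchanged columns. It uses Python's sqlite3 only so it can be run directly; the table name sparseLog, the columns and the data are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sparse log: NULL means "not changed in this row".
    CREATE TABLE sparseLog (
        userID INTEGER NOT NULL, DateChanged TEXT NOT NULL,
        email TEXT, phone TEXT, address TEXT
    );
    INSERT INTO sparseLog VALUES
        (1, '2020-01-01', 'old@example.com', '555-0100', '1 Main St'),
        (1, '2020-02-01', NULL,              '555-0199', NULL),
        (1, '2020-03-01', 'new@example.com', NULL,       NULL);
""")

# Each column needs its own "latest non-NULL value" lookup per user.
current = conn.execute("""
    SELECT u.userID,
           (SELECT s.email FROM sparseLog s
             WHERE s.userID = u.userID AND s.email IS NOT NULL
             ORDER BY s.DateChanged DESC LIMIT 1) AS email,
           (SELECT s.phone FROM sparseLog s
             WHERE s.userID = u.userID AND s.phone IS NOT NULL
             ORDER BY s.DateChanged DESC LIMIT 1) AS phone,
           (SELECT s.address FROM sparseLog s
             WHERE s.userID = u.userID AND s.address IS NOT NULL
             ORDER BY s.DateChanged DESC LIMIT 1) AS address
    FROM (SELECT DISTINCT userID FROM sparseLog) u
""").fetchall()

print(current)
```

One subquery per column is needed, so the query grows with every attribute. A sparse log also cannot tell "unchanged" apart from "the user cleared this value", since both would be stored as NULL.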
IMHO the storage issue is not decisive. So let’s turn to SELECT/INSERT efficiency:
On this issue, I think Solution 1 still wins. On INSERTs, Solution 1 wins: only one row is inserted, even if the user changes every field in their info, whereas Solution 2 inserts one row per changed field. On SELECTs, Solution 1 wins again: you only need a view of the most recent info per user (code above), which is the kind of thing SQL is optimised for. In contrast, Solution 2 would require a pivot: something SQL is not good at.