I have a private messaging system for my users that I’ve created in PHP with a MySQL backend. The system deletes old messages but generally holds over 500,000 messages. Currently all of the data is included in one table:
message_table
message_id (int 11)
message_from_id (int 11)
message_to_id (int 11)
message_timestamp (int 11)
message_subject (varchar 50)
message_text (text)
The majority of messages are very short so I’m considering changing the system to:
message_table
message_id (int 11)
message_from_id (int 11)
message_to_id (int 11)
message_timestamp (int 11)
message_subject (varchar 50)
message_short_body (varchar 50)
message_text_id (int 11)
text_table
text_id (int 11)
text_body (text)
Then if a short message is entered it will be stored in ‘message_short_body’, and if it is longer it will be added to ‘text_table’ and its ‘text_id’ stored as ‘message_text_id’. When messages are accessed I would then have something like:
SELECT * FROM message_table LEFT JOIN text_table ON text_table.text_id = message_table.message_text_id IF message_table.message_text_id != 0 WHERE message_table.message_to_id = $user_id
I added “IF message_table.message_text_id != 0” and don’t know if something like that is possible.
As a general rule, is it possible to tell whether this would reduce the size of the database or speed up queries?
Unless there actually is a row with text_id = 0 in your text_table, there is no need to do this. Simply omit the IF and use the following query:

In terms of performance, it might be that the engine can optimize things more efficiently if you add your condition to the join conditions:
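The two join variants described above might look roughly like this. It is only a sketch that reuses the table and column names from the question; $user_id is the question's PHP placeholder and would be better bound as a query parameter in real code.

```sql
-- Variant 1: plain LEFT JOIN. Messages whose message_text_id has no match
-- in text_table (e.g. 0) simply get NULL for the text_table columns.
SELECT *
FROM message_table
LEFT JOIN text_table
       ON text_table.text_id = message_table.message_text_id
WHERE message_table.message_to_id = $user_id;

-- Variant 2: the "is there a long body?" check moved into the join condition.
-- Returns the same rows as variant 1 as long as no row has text_id = 0.
SELECT *
FROM message_table
LEFT JOIN text_table
       ON message_table.message_text_id != 0
      AND text_table.text_id = message_table.message_text_id
WHERE message_table.message_to_id = $user_id;
```

Comparing the EXPLAIN output of both variants on real data should show whether the extra condition makes any difference to the plan.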
You could also try an approach using a subquery:
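One possible form of such a query is sketched below. It assumes text_id is unique in text_table (for example its primary key), so the subquery returns at most one row; the alias message_body is made up for this illustration.

```sql
-- The correlated subquery is only evaluated for rows that point at text_table;
-- short messages take their body directly from message_short_body.
SELECT message_table.*,
       CASE
         WHEN message_table.message_text_id = 0
           THEN message_table.message_short_body
         ELSE (SELECT text_table.text_body
               FROM text_table
               WHERE text_table.text_id = message_table.message_text_id)
       END AS message_body
FROM message_table
WHERE message_table.message_to_id = $user_id;
```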
This has the benefit of not executing the search in text_table if none is required, but the drawback of performing a separate query for each case with a long message. I would expect the join-based queries to be superior, but I’m not sure.

You’ll have to benchmark, as it depends on the use case. If most of your queries retrieve data from the fields other than the text, then the smaller table will make those queries faster, yielding a performance gain. If, on the other hand, you usually want the body along with the rest of the message, then you’ll likely end up with worse performance.
You should also use benchmarks to distinguish between the different alternatives described above.
In terms of size of the database, you’ll likely see an increase: the storage requirements for the text data are about the same, but the indices for the extra table will cost you.
I guess if this were my schema, I’d drop the message_text_id and instead have the primary key of the text_table match that of the message_table. I.e. each key occurs either only in the message table or in both tables, and rows with the same key belong together. Whether or not the message is in the other table could be encoded by setting message_table.message_short_body to NULL in these cases.
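A rough sketch of that shared-key layout follows. The column names are taken from the question; the NOT NULL and AUTO_INCREMENT choices and the message_body alias are assumptions made for illustration, not something the question specifies.

```sql
CREATE TABLE message_table (
  message_id         INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  message_from_id    INT NOT NULL,
  message_to_id      INT NOT NULL,
  message_timestamp  INT NOT NULL,
  message_subject    VARCHAR(50),
  message_short_body VARCHAR(50) NULL  -- NULL means the body is in text_table
);

CREATE TABLE text_table (
  text_id   INT NOT NULL PRIMARY KEY,  -- same value as message_table.message_id
  text_body TEXT NOT NULL
);

-- Fetch a user's messages; COALESCE picks the short body when present
-- and falls back to the long body from text_table otherwise.
SELECT m.*,
       COALESCE(m.message_short_body, t.text_body) AS message_body
FROM message_table AS m
LEFT JOIN text_table AS t
       ON t.text_id = m.message_id
WHERE m.message_to_id = $user_id;
```

With this layout the join always runs on the two primary keys, and no separate message_text_id column has to be stored or kept in sync.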