All,
I am trying to create a table to receive user inputs (UGC). This content could vary in size from a single character up to a few hundred words. The input will be stored as UTF-8 (utf8_unicode_ci collation) and could contain Latin or multi-byte characters.
The input will have to be searchable.
(Longer term I might want to store non-text objects – pictures and the like, but for now let’s focus on UTF8 text.)
At this point, I am only envisioning 2 fields to this table: an ID (autoincrement INT(10) ) and the UGC itself. (I might need a few more fields like dateAdded, etc.)
How should I structure my DB to allow for a good compromise between flexibility and performance? I could…
- Set up a high limit on the size of the string and take the performance & usability hits.
- Create several tables for various size ranges (and eventually types), and identify each item by a combination of table name and ID (so I’d need a central table with unique ID, table name, table-specific ID).
- Store each object separately and simply have the DB store a URL. I suspect that ends up being a less efficient version of #2, but I’m out of my depth.
Thank you,
JDelage
There is a good rule of thumb – and as all rules of thumb it is far from perfect – that has been working quite well for me:
With this and my experience so far in mind, I discourage use of a BLOB field for images etc.
Now, when thinking of content that can be text, an image or whatever, I am quite sure your business logic will need some field that tells it how to use the content of the big field anyway; it’s hard to think of an app that would treat an image as an image just from looking at the data. So I recommend you create such a field (mimetype comes to mind) and, say, a mediumtext field. Your app's business logic could easily deduce that mimetype='text/plain' means the data in the text field is the payload, while mimetype='image/png' means the data in the text field is the (relative) path to a file resource.
This gives you searching and indexing on the content, with quite a low probability of false matches if you create your file paths in a way that is not expected to be a word in any language. MD5(basename).suffix comes to mind.
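To make the idea concrete, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for MySQL so that it runs anywhere; the table name ugc, the column name content, the uploads directory and all helper functions are my own assumptions for illustration, not anything prescribed above. I have read MD5(basename).suffix as "MD5 hash of the file name without its extension, followed by the original extension", which is also an assumption. The sketch only records the path; actually writing the file to disk is left out.

```python
import hashlib
import sqlite3
from pathlib import PurePosixPath

# Hypothetical schema. SQLite stands in for MySQL here; in MySQL the payload
# column would be a MEDIUMTEXT with a UTF-8 character set and collation.
SCHEMA = """
CREATE TABLE ugc (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    mimetype TEXT NOT NULL,
    content  MEDIUMTEXT NOT NULL
)
"""


def stored_name(filename: str) -> str:
    """Build an MD5(basename).suffix style name (assumed interpretation)."""
    path = PurePosixPath(filename)
    digest = hashlib.md5(path.stem.encode("utf-8")).hexdigest()
    return digest + path.suffix


def add_text(db: sqlite3.Connection, text: str) -> int:
    """Store text directly: the content column is the payload itself."""
    cur = db.execute(
        "INSERT INTO ugc (mimetype, content) VALUES (?, ?)",
        ("text/plain", text),
    )
    return cur.lastrowid


def add_file(db: sqlite3.Connection, filename: str, mimetype: str,
             upload_dir: str = "uploads") -> int:
    """Store only a relative path; the file itself would live on disk."""
    rel_path = f"{upload_dir}/{stored_name(filename)}"
    cur = db.execute(
        "INSERT INTO ugc (mimetype, content) VALUES (?, ?)",
        (mimetype, rel_path),
    )
    return cur.lastrowid


def search(db: sqlite3.Connection, term: str) -> list:
    """Naive substring search over the content column."""
    return db.execute(
        "SELECT id, mimetype, content FROM ugc WHERE content LIKE ? ORDER BY id",
        (f"%{term}%",),
    ).fetchall()
```

To try it, open a connection with sqlite3.connect(":memory:"), run db.execute(SCHEMA), then call add_text and add_file and query with search. On MySQL you would more likely put a FULLTEXT index on the text column instead of using LIKE, but the mimetype dispatch stays the same.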