There are three popular database schemas in use in tagging systems today:
- MySQLicious
- Scuttle
- Toxi
In general, the most often recommended schema is Toxi, a three-table many-to-many design.
However, I have come across a particular need where the number of tags will be limited to no more than 100. This number will stay constant, with no tags ever being added to the database. Each individual item would likely have no more than about 20 tags.
In this situation what schema would you recommend and why? I am for the first time ever considering going with the totally denormalized MySQLicious schema and using FULLTEXT / LIKE for searching/filtering.
The dataset itself would likely never exceed 100,000.
Before you choose any denormalized design, you must have a solid idea of the queries you’re going to run against the data. Denormalized designs optimize for a certain subset of queries, at the expense of other queries.
For example, do you need to search for specific tags? If so, you need some way to find tags via an index. Do not use `LIKE '%word%'` queries to search for substrings; it will run hundreds or thousands of times slower than a query that uses an index.
Do you need to count occurrences of tags? If so, you need a solution that restricts duplicates. The only design you listed that prevents duplicates is Toxi.
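To make the two points above concrete, here is a minimal sketch of a Toxi-style layout. It is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` module as a stand-in for MySQL, and the table names (`item`, `tag`, `item_tag`), the index name, and the helper functions are all hypothetical, not taken from the Toxi write-up. The composite primary key on the mapping table is what rejects duplicate taggings, and the lookup by tag name is an equality match that can use an index instead of a substring scan.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical three-table layout: items, tags, and a mapping table.
conn.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE tag  (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE item_tag (
    item_id INTEGER NOT NULL REFERENCES item(id),
    tag_id  INTEGER NOT NULL REFERENCES tag(id),
    PRIMARY KEY (item_id, tag_id)  -- one row per (item, tag) pair
);
-- Second index so lookups that start from a tag are indexed as well.
CREATE INDEX item_tag_by_tag ON item_tag (tag_id, item_id);
""")

conn.executemany("INSERT INTO item (id, title) VALUES (?, ?)",
                 [(1, "first post"), (2, "second post")])
conn.executemany("INSERT INTO tag (id, name) VALUES (?, ?)",
                 [(1, "mysql"), (2, "schema")])
conn.executemany("INSERT INTO item_tag (item_id, tag_id) VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1)])
conn.commit()


def items_with_tag(name):
    """Return item titles carrying the tag, matched by equality (no LIKE)."""
    rows = conn.execute(
        """SELECT i.title
           FROM tag t
           JOIN item_tag m ON m.tag_id = t.id
           JOIN item i ON i.id = m.item_id
           WHERE t.name = ?
           ORDER BY i.id""",
        (name,),
    )
    return [row[0] for row in rows]


def tag_counts():
    """Return {tag name: number of items tagged with it}."""
    rows = conn.execute(
        """SELECT t.name, COUNT(m.item_id)
           FROM tag t
           LEFT JOIN item_tag m ON m.tag_id = t.id
           GROUP BY t.id, t.name"""
    )
    return dict(rows.fetchall())


def add_tag(item_id, tag_id):
    """Tag an item; return False if that pair already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO item_tag (item_id, tag_id) VALUES (?, ?)",
                (item_id, tag_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Because a repeated `(item_id, tag_id)` pair is rejected by the primary key, the counts returned by `tag_counts()` cannot be inflated by duplicate taggings. In MySQL the same idea would be expressed with a composite primary key or unique key on the mapping table.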
I wouldn’t use any solution that requires the use of MyISAM FULLTEXT indexes. MyISAM is too fragile and susceptible to data corruption. Don’t store any data in MyISAM that you’re not prepared to regenerate or restore from backup regularly.