I was wondering what the best way is to implement a tag system like the one used on SO. I have been thinking about this, but I can’t come up with a good, scalable solution.
I was thinking of a basic three-table solution: a tags table, an articles table, and a tag_to_articles table.
Is this the best solution to the problem, or are there alternatives? With this method the table would grow extremely large over time, and I assume searching it would not be very efficient. On the other hand, it is not that important that the query executes fast.
I believe you’ll find this blog post interesting: Tags: Database schemas
“MySQLicious” solution
In this solution the schema consists of a single, denormalized table. It is called the “MySQLicious solution” because MySQLicious imports del.icio.us data into a table with this structure.
Intersection (AND)
Query for “search+webservice+semweb”:
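The original query is not reproduced here, so the following is only a minimal sketch of what an intersection could look like on a single denormalized table. It runs against an in-memory SQLite database from Python; the table name `delicious`, its columns, and the sample rows are assumptions made for illustration.

```python
import sqlite3

# Hypothetical single-table schema: all tags of a bookmark live in one
# space-separated text column (names and rows are illustrative assumptions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE delicious (id INTEGER PRIMARY KEY, url TEXT, tags TEXT)")
conn.executemany(
    "INSERT INTO delicious (url, tags) VALUES (?, ?)",
    [
        ("http://a.example", "search webservice semweb"),
        ("http://b.example", "search webservice"),
        ("http://c.example", "semweb"),
    ],
)

# Intersection (AND): every tag must occur somewhere in the tags column.
and_rows = conn.execute(
    """
    SELECT url FROM delicious
    WHERE tags LIKE '%search%'
      AND tags LIKE '%webservice%'
      AND tags LIKE '%semweb%'
    ORDER BY id
    """
).fetchall()
print(and_rows)
```

Note that `LIKE '%search%'` is a substring match, so a row tagged only “research” would match as well, and a leading wildcard generally prevents an ordinary index from being used.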
Union (OR)
Query for “search|webservice|semweb”:
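A union on the same kind of table could be sketched as below, again with SQLite from Python. The table name `delicious`, its columns, and the sample rows are assumptions, not the blog post’s actual data.

```python
import sqlite3

# Hypothetical single-table schema with a space-separated tags column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE delicious (id INTEGER PRIMARY KEY, url TEXT, tags TEXT)")
conn.executemany(
    "INSERT INTO delicious (url, tags) VALUES (?, ?)",
    [
        ("http://a.example", "search engine"),
        ("http://b.example", "webservice api"),
        ("http://c.example", "semweb rdf"),
        ("http://d.example", "python"),
    ],
)

# Union (OR): a row qualifies as soon as any one of the tags occurs.
or_rows = conn.execute(
    """
    SELECT url FROM delicious
    WHERE tags LIKE '%search%'
       OR tags LIKE '%webservice%'
       OR tags LIKE '%semweb%'
    ORDER BY id
    """
).fetchall()
print(or_rows)
```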
Minus (Exclusion)
Query for “search+webservice-semweb”:
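For the exclusion case, one plausible form on a single table is to combine `LIKE` with `NOT LIKE`. This is a sketch under assumptions: the table name `delicious`, its columns, and the rows are made up for the example.

```python
import sqlite3

# Hypothetical single-table schema with a space-separated tags column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE delicious (id INTEGER PRIMARY KEY, url TEXT, tags TEXT)")
conn.executemany(
    "INSERT INTO delicious (url, tags) VALUES (?, ?)",
    [
        ("http://a.example", "search webservice semweb"),
        ("http://b.example", "search webservice"),
        ("http://c.example", "search"),
    ],
)

# Minus: both wanted tags must occur, and the excluded tag must not.
minus_rows = conn.execute(
    """
    SELECT url FROM delicious
    WHERE tags LIKE '%search%'
      AND tags LIKE '%webservice%'
      AND tags NOT LIKE '%semweb%'
    ORDER BY id
    """
).fetchall()
print(minus_rows)
```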
“Scuttle” solution
Scuttle organizes its data in two tables. The table “scCategories” is the “tag” table and has a foreign key to the “bookmark” table.
Intersection (AND)
Query for “bookmark+webservice+semweb”:
First, all bookmark-tag combinations are selected where the tag is “bookmark”, “webservice” or “semweb” (c.category IN ('bookmark', 'webservice', 'semweb')); then only the bookmarks that carry all three of the searched tags are kept (HAVING COUNT(b.bId)=3).
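The filter-then-count approach described above can be sketched as follows with SQLite from Python. The names `scCategories`, `category` and `bId` come from the text; the table name `scBookmarks`, the other columns, and the sample rows are assumptions. The count check only works if a bookmark cannot carry the same tag twice, which the sketch enforces with a `UNIQUE` constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical two-table layout: one row per (bookmark, tag) pair.
    CREATE TABLE scBookmarks (bId INTEGER PRIMARY KEY, url TEXT);
    CREATE TABLE scCategories (
        cId INTEGER PRIMARY KEY,
        bId INTEGER REFERENCES scBookmarks(bId),
        category TEXT,
        UNIQUE (bId, category)  -- no duplicate tags per bookmark
    );
    INSERT INTO scBookmarks VALUES (1, 'http://a.example'), (2, 'http://b.example');
    INSERT INTO scCategories (bId, category) VALUES
        (1, 'bookmark'), (1, 'webservice'), (1, 'semweb'),
        (2, 'bookmark'), (2, 'webservice');
    """
)

# Intersection (AND): keep bookmarks that matched all three tags.
and_rows = conn.execute(
    """
    SELECT b.bId, b.url
    FROM scBookmarks b
    JOIN scCategories c ON c.bId = b.bId
    WHERE c.category IN ('bookmark', 'webservice', 'semweb')
    GROUP BY b.bId
    HAVING COUNT(b.bId) = 3
    """
).fetchall()
print(and_rows)
```

Bookmark 2 matches only two of the three tags, so the `HAVING` clause filters it out.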
Union (OR)
Query for “bookmark|webservice|semweb”:
Just leave out the HAVING clause and you have union:
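A sketch of that union, using SQLite from Python. The table name `scBookmarks`, the `url` column, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical two-table layout: one row per (bookmark, tag) pair.
    CREATE TABLE scBookmarks (bId INTEGER PRIMARY KEY, url TEXT);
    CREATE TABLE scCategories (
        cId INTEGER PRIMARY KEY,
        bId INTEGER REFERENCES scBookmarks(bId),
        category TEXT
    );
    INSERT INTO scBookmarks VALUES
        (1, 'http://a.example'), (2, 'http://b.example'), (3, 'http://c.example');
    INSERT INTO scCategories (bId, category) VALUES
        (1, 'bookmark'), (1, 'webservice'), (2, 'semweb'), (3, 'python');
    """
)

# Union (OR): no HAVING clause; GROUP BY still collapses bookmarks that
# match more than one of the tags into a single result row.
or_rows = conn.execute(
    """
    SELECT b.bId
    FROM scBookmarks b
    JOIN scCategories c ON c.bId = b.bId
    WHERE c.category IN ('bookmark', 'webservice', 'semweb')
    GROUP BY b.bId
    ORDER BY b.bId
    """
).fetchall()
print(or_rows)
```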
Minus (Exclusion)
Query for “bookmark+webservice-semweb”, that is: bookmark AND webservice AND NOT semweb.
Leaving out the HAVING COUNT clause yields the query for “bookmark|webservice-semweb”.
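One way to express the exclusion in the two-table layout is a `NOT IN` subquery that removes every bookmark carrying the unwanted tag. The sketch below uses SQLite from Python; the table name `scBookmarks`, the `url` column, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical two-table layout: one row per (bookmark, tag) pair.
    CREATE TABLE scBookmarks (bId INTEGER PRIMARY KEY, url TEXT);
    CREATE TABLE scCategories (
        cId INTEGER PRIMARY KEY,
        bId INTEGER NOT NULL REFERENCES scBookmarks(bId),
        category TEXT,
        UNIQUE (bId, category)
    );
    INSERT INTO scBookmarks VALUES
        (1, 'http://a.example'), (2, 'http://b.example'), (3, 'http://c.example');
    INSERT INTO scCategories (bId, category) VALUES
        (1, 'bookmark'), (1, 'webservice'), (1, 'semweb'),
        (2, 'bookmark'), (2, 'webservice'),
        (3, 'bookmark');
    """
)

# Minus: both wanted tags (count = 2), minus bookmarks tagged 'semweb'.
minus_rows = conn.execute(
    """
    SELECT b.bId
    FROM scBookmarks b
    JOIN scCategories c ON c.bId = b.bId
    WHERE c.category IN ('bookmark', 'webservice')
      AND b.bId NOT IN (SELECT bId FROM scCategories WHERE category = 'semweb')
    GROUP BY b.bId
    HAVING COUNT(b.bId) = 2
    """
).fetchall()
print(minus_rows)
```

Bookmark 1 is dropped by the subquery and bookmark 3 by the count, which leaves only bookmark 2.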
“Toxi” solution
Toxi came up with a three-table structure. Via the table “tagmap”, bookmarks and tags are n-to-m related: each tag can be used with different bookmarks and vice versa. This DB schema is also used by WordPress.
The queries are essentially the same as in the “Scuttle” solution.
Intersection (AND)
Query for “bookmark+webservice+semweb”:
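In the three-table layout the intersection needs one more join. The sketch below wraps it in a small helper so that the expected count follows the number of tags; it uses SQLite from Python. Only the table name `tagmap` comes from the text; `bookmark`, `tag`, all column names, the helper `bookmarks_with_all`, and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical three-table layout; tagmap links bookmarks and tags n-to-m.
    CREATE TABLE bookmark (id INTEGER PRIMARY KEY, url TEXT);
    CREATE TABLE tag (tag_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE tagmap (
        id INTEGER PRIMARY KEY,
        bookmark_id INTEGER REFERENCES bookmark(id),
        tag_id INTEGER REFERENCES tag(tag_id),
        UNIQUE (bookmark_id, tag_id)
    );
    INSERT INTO bookmark VALUES (1, 'http://a.example'), (2, 'http://b.example');
    INSERT INTO tag VALUES (1, 'bookmark'), (2, 'webservice'), (3, 'semweb');
    INSERT INTO tagmap (bookmark_id, tag_id) VALUES
        (1, 1), (1, 2), (1, 3), (2, 1), (2, 2);
    """
)


def bookmarks_with_all(conn, tags):
    """Return ids of bookmarks carrying every tag in `tags` (assumed distinct)."""
    placeholders = ", ".join("?" for _ in tags)
    sql = f"""
        SELECT b.id
        FROM tagmap bt
        JOIN bookmark b ON b.id = bt.bookmark_id
        JOIN tag t ON t.tag_id = bt.tag_id
        WHERE t.name IN ({placeholders})
        GROUP BY b.id
        HAVING COUNT(b.id) = ?
        ORDER BY b.id
    """
    return [row[0] for row in conn.execute(sql, (*tags, len(tags)))]


all_three = bookmarks_with_all(conn, ["bookmark", "webservice", "semweb"])
first_two = bookmarks_with_all(conn, ["bookmark", "webservice"])
print(all_three, first_two)
```

Only the placeholders are interpolated into the SQL string; the tag names themselves are passed as bound parameters.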
Union (OR)
Query for “bookmark|webservice|semweb”:
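A union over the three tables might look like the sketch below, run with SQLite from Python. Apart from `tagmap`, the table and column names and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical three-table layout; tagmap links bookmarks and tags n-to-m.
    CREATE TABLE bookmark (id INTEGER PRIMARY KEY, url TEXT);
    CREATE TABLE tag (tag_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE tagmap (
        id INTEGER PRIMARY KEY,
        bookmark_id INTEGER REFERENCES bookmark(id),
        tag_id INTEGER REFERENCES tag(tag_id)
    );
    INSERT INTO bookmark VALUES
        (1, 'http://a.example'), (2, 'http://b.example'), (3, 'http://c.example');
    INSERT INTO tag VALUES
        (1, 'bookmark'), (2, 'webservice'), (3, 'semweb'), (4, 'python');
    INSERT INTO tagmap (bookmark_id, tag_id) VALUES (1, 1), (1, 2), (2, 3), (3, 4);
    """
)

# Union (OR): any matching tag is enough; GROUP BY removes duplicates.
or_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT b.id
        FROM tagmap bt
        JOIN bookmark b ON b.id = bt.bookmark_id
        JOIN tag t ON t.tag_id = bt.tag_id
        WHERE t.name IN ('bookmark', 'webservice', 'semweb')
        GROUP BY b.id
        ORDER BY b.id
        """
    )
]
print(or_ids)
```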
Minus (Exclusion)
Query for “bookmark+webservice-semweb”, that is: bookmark AND webservice AND NOT semweb.
Leaving out the HAVING COUNT clause yields the query for “bookmark|webservice-semweb”.
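The exclusion for the three-table layout can be sketched with the same `NOT IN` idea, with the subquery going through `tagmap` and `tag`. This uses SQLite from Python; apart from `tagmap`, the table and column names and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical three-table layout; tagmap links bookmarks and tags n-to-m.
    CREATE TABLE bookmark (id INTEGER PRIMARY KEY, url TEXT);
    CREATE TABLE tag (tag_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE tagmap (
        id INTEGER PRIMARY KEY,
        bookmark_id INTEGER NOT NULL REFERENCES bookmark(id),
        tag_id INTEGER NOT NULL REFERENCES tag(tag_id),
        UNIQUE (bookmark_id, tag_id)
    );
    INSERT INTO bookmark VALUES
        (1, 'http://a.example'), (2, 'http://b.example'), (3, 'http://c.example');
    INSERT INTO tag VALUES (1, 'bookmark'), (2, 'webservice'), (3, 'semweb');
    INSERT INTO tagmap (bookmark_id, tag_id) VALUES
        (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1);
    """
)

# Minus: both wanted tags (count = 2), minus bookmarks tagged 'semweb'.
minus_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT b.id
        FROM tagmap bt
        JOIN bookmark b ON b.id = bt.bookmark_id
        JOIN tag t ON t.tag_id = bt.tag_id
        WHERE t.name IN ('bookmark', 'webservice')
          AND b.id NOT IN (
              SELECT m.bookmark_id
              FROM tagmap m
              JOIN tag x ON x.tag_id = m.tag_id
              WHERE x.name = 'semweb'
          )
        GROUP BY b.id
        HAVING COUNT(b.id) = 2
        """
    )
]
print(minus_ids)
```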