I’ve got two tables…
Table “tags”
+-------+-----+
| tag   | id  |
+-------+-----+
| nancy | 902 |
| fred  | 903 |
| suzan | 904 |
| joe   | 905 |
+-------+-----+
and table tag to tag
+-------+-------+
| tag_a | tag_b |
+-------+-------+
| 903   | 902   |
| 905   | 903   |
| 902   | 904   |
| 904   | 905   |
+-------+-------+
I often scan the tag-to-tag relationships using an INNER JOIN with the “tags” table so that I can query who is related to “nancy”. I’m wondering how much better off I would have been just storing the name of the tag in the tag-to-tag table, rather than joining the tags table, so that I can look for relationships based on the tag name. Is joining the table a huge performance hit? My tag-to-tag table is in the 900k row range. The tags table is around 30k rows.
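For reference, the join described above might look roughly like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere. The table name `tag_to_tag`, the indexes, and the helper `related_tags` are assumptions made for illustration, as is the choice to treat a relationship as counting in either direction.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; "tag_to_tag" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tags (id INTEGER PRIMARY KEY, tag TEXT NOT NULL UNIQUE);
CREATE TABLE tag_to_tag (tag_a INTEGER NOT NULL, tag_b INTEGER NOT NULL);
-- Indexes on both columns so either side of a relationship can be looked up.
CREATE INDEX idx_tag_a ON tag_to_tag (tag_a);
CREATE INDEX idx_tag_b ON tag_to_tag (tag_b);
""")
conn.executemany("INSERT INTO tags (tag, id) VALUES (?, ?)",
                 [("nancy", 902), ("fred", 903), ("suzan", 904), ("joe", 905)])
conn.executemany("INSERT INTO tag_to_tag (tag_a, tag_b) VALUES (?, ?)",
                 [(903, 902), (905, 903), (902, 904), (904, 905)])


def related_tags(conn, name):
    """Return the names of tags linked to `name`, in either direction."""
    rows = conn.execute(
        """
        SELECT other.tag
        FROM tags AS me
        INNER JOIN tag_to_tag AS t ON me.id IN (t.tag_a, t.tag_b)
        INNER JOIN tags AS other
            ON other.id = CASE WHEN t.tag_a = me.id THEN t.tag_b ELSE t.tag_a END
        WHERE me.tag = ?
        ORDER BY other.tag
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]


print(related_tags(conn, "nancy"))
```

The name lookup happens once against the small tags table, and the relationship rows are then found by integer id. Whether that is cheap in practice depends on the indexes available on the relationship table, which the question does not state.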
If you wish to store any meta-information about the tag, then you’ll need a Tag table anyway. Adding a join can potentially increase the expense of the query a good deal. In your case, I recommend you consider the following:
- Change the key stored in the tag-to-tag table from id to TagString.
- Make TagString a foreign key to the Tag table, with cascading update/delete.
In this way, you can group, filter, etc. on a single column, but if you need more info, you can join over to the Tag table (or whatever tables you need).
I ran into some serious performance problems with MySQL when we hit 80,000,000 tag records and were doing live joins to generate tag clouds on http://tagcloud.com … Some caching really helped, but it still seemed like we were pushing the design limits of a relational database (in normal form). We would have been better off using a different storage format that may be more expensive to write to, but faster to read from.
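As a rough illustration of the earlier suggestion to store the tag string directly in the relationship table with cascading update/delete, here is a minimal sketch. It again uses Python's sqlite3 module in place of MySQL. The table name `tag_to_tag`, the helper `related_by_name`, and the renamed value `nancy_w` are hypothetical, and relationships are assumed to count in either direction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys (and cascades) when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE tags (
    tag TEXT PRIMARY KEY,   -- the tag string is now the key
    id  INTEGER             -- kept only as ordinary metadata
);
CREATE TABLE tag_to_tag (
    tag_a TEXT NOT NULL REFERENCES tags (tag)
        ON UPDATE CASCADE ON DELETE CASCADE,
    tag_b TEXT NOT NULL REFERENCES tags (tag)
        ON UPDATE CASCADE ON DELETE CASCADE
);
""")
conn.executemany("INSERT INTO tags (tag, id) VALUES (?, ?)",
                 [("nancy", 902), ("fred", 903), ("suzan", 904), ("joe", 905)])
conn.executemany("INSERT INTO tag_to_tag (tag_a, tag_b) VALUES (?, ?)",
                 [("fred", "nancy"), ("joe", "fred"),
                  ("nancy", "suzan"), ("suzan", "joe")])


def related_by_name(conn, name):
    """Find related tags from the relationship table alone, with no join."""
    rows = conn.execute(
        """
        SELECT CASE WHEN tag_a = ? THEN tag_b ELSE tag_a END AS other
        FROM tag_to_tag
        WHERE tag_a = ? OR tag_b = ?
        ORDER BY other
        """,
        (name, name, name),
    ).fetchall()
    return [row[0] for row in rows]


before = related_by_name(conn, "nancy")

# Renaming a tag in the parent table cascades into both relationship columns.
conn.execute("UPDATE tags SET tag = 'nancy_w' WHERE tag = 'nancy'")
after = related_by_name(conn, "nancy_w")
print(before, after)
```

The trade-off is that every relationship row now carries two strings instead of two integers, and a rename has to touch every row that mentions the tag. Reads by name, in exchange, no longer need the tags table at all.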