I have the option of writing two different formats for a database structure:
Article ------- ArticleID int FK Article_Tags ------------ ArticleTagID int FK ArticleID int FK TagText varchar(50)
or
Article ------- ArticleID int PK Article_Tags ------------ ArticleTagID int PK ArticleID int FK TagText varchar(50) FK Tag --- TagText varchar(50) PK
If I want a list of all tags in the database, I could use:
select distinct tagtext from article_tags
or:
select tagtext from tag
The first situation is pretty easy. Maybe it would speed up if i indexed it properly. The second is a little harder because I have to constantly delete tags with no joins. Imagine a pretty large system, which would be better.
I would go with
There really is no reason to denormalize this from get go. (your first and second versions are not normalized)
Having tags in a separate table is fine, and getting the unique tags is cleaner without that distinct statement.
Performance wise if any of the 3 options are going to perform pretty much the same provided the correct indexing is applied.