I have a database where several tables have a column email for storing email address. Because this is for surveys, a lot of values will be identical, more likely identical over names, addresses, etc.
Should I just have a master Emails table, and then an email_id column? That way I’m only storing email strings once, instead of multiple times in tables. But if I wanted to be sure that I was only storing unique emails, isn’t there some limitation to the length that the index will check a string for uniqueness, therefore I could be storing multiple copies of long email addresses?
In the survey database, we store the email address they submit. If they choose to join a mailing list, we store those uniquely (one email per membership) in a mailing list membership table, so there could be multiple of the same addresses in that table, depending on how many clubs they join. Now I’m adding a table to track bounceback emails, because that’s a property of the email address, not the survey or mailing list membership. And I’m thinking, “This is a lot of string joins!”
Is this a form of the “One True Lookup Table”?
It doesn’t actually matter very much.
No. There’s no limitation. Unique means unique, not “unique to some random limit”.
So? String joins aren’t horribly slow. If you can prove that these string joins are the worst bottleneck in your application, then replacing string joins with integer FK’s might speed things up.
Until you can prove that these string joins are your worst problem, don’t worry about them.
Worry about getting the business rules right for how an email address is used. Don’t optimize until you can prove you have a problem.