I’m designing a schema for an event registration system that will involve students from schools across many different regions. My main problem is how to store school names in the database.
Because students will register separately, spelling variations of the same school name are very likely to accumulate over time. I’d like an easy way to clean these up, especially since one of the statistics we want to gather is the number of schools and institutions that register for the event.
I’m debating between storing school_name as an extra column in a Participant table and storing a school_id as a foreign key referencing a School table (I can’t think of any other way). Which approach is better in terms of storage use, ease of purging duplicate data, and other factors?
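To make the school_id option concrete, here is a minimal sketch. It assumes SQLite through Python's built-in sqlite3 module, and the table, column, and function names (school, participant, merge_schools, count_registered_schools) are hypothetical. Each school name is stored once. Merging a misspelled duplicate means repointing its participants and deleting one row, and counting schools is a COUNT(DISTINCT school_id). With a plain school_name column, every participant row carries its own copy of the text, so cleanup means rewriting strings across participant rows, and the count depends on exact string equality.

```python
import sqlite3

# Sketch of the school_id option: each school name is stored once in `school`,
# and participants reference it by id. All names here are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE school (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE participant (
    id        INTEGER PRIMARY KEY,
    full_name TEXT NOT NULL,
    school_id INTEGER NOT NULL REFERENCES school(id)
);
""")

def merge_schools(conn, keep_id, duplicate_id):
    """Repoint participants from a duplicate school row to the canonical one,
    then delete the duplicate so it no longer counts as a separate school."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute("UPDATE participant SET school_id = ? WHERE school_id = ?",
                     (keep_id, duplicate_id))
        conn.execute("DELETE FROM school WHERE id = ?", (duplicate_id,))

def count_registered_schools(conn):
    """Number of distinct schools with at least one registered participant."""
    return conn.execute(
        "SELECT COUNT(DISTINCT school_id) FROM participant").fetchone()[0]

# Example data: two spellings of the same school slipped into the list.
with conn:
    conn.executemany("INSERT INTO school (id, name) VALUES (?, ?)", [
        (1, "Springfield High School"),
        (2, "Springfeild High School"),
        (3, "Shelbyville Academy"),
    ])
    conn.executemany("INSERT INTO participant (full_name, school_id) VALUES (?, ?)", [
        ("Alice", 1), ("Bob", 2), ("Carol", 3),
    ])

before = count_registered_schools(conn)  # the misspelling counts as its own school
merge_schools(conn, keep_id=1, duplicate_id=2)
after = count_registered_schools(conn)   # duplicate folded into the canonical row
```

The foreign key also stops a participant from pointing at a school row that has been deleted, which a free-text column cannot enforce.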
If you want to avoid the possibility of mistakes, you could provide a list (yes, it might be large) of existing school names. If a student chooses one, you store the ID of that school. Because you might not be able to anticipate ALL schools, you could also provide a free-form text field for “other” schools and store that as text on the participant record. You may then have to do periodic reconciliation: add new schools to the school list, or link a participant’s “other” school to an existing school (maybe it was in the list but they just didn’t see it, or maybe they misspelt it).
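A minimal sketch of this pick-list-plus-“other” design and its reconciliation pass, again assuming SQLite via Python's sqlite3 with hypothetical names throughout. A CHECK constraint ensures each participant has either a school_id or other_school text, never both. Using difflib to flag likely misspellings is my own choice for illustration, not part of the approach above. The automatic linking in the loop is also only for demonstration; in practice someone would review the suggestions before linking.

```python
import difflib
import sqlite3

# Participants either pick a listed school (school_id) or type "other" text.
# Table, column, and function names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE school (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE participant (
    id           INTEGER PRIMARY KEY,
    full_name    TEXT NOT NULL,
    school_id    INTEGER REFERENCES school(id),
    other_school TEXT,
    -- exactly one of the two must be set
    CHECK ((school_id IS NULL) <> (other_school IS NULL))
);
""")

def suggest_matches(conn, other_name, cutoff=0.8):
    """Return listed school names that look like misspellings of other_name."""
    names = [row[0] for row in conn.execute("SELECT name FROM school")]
    by_lower = {n.lower(): n for n in names}  # compare case-insensitively
    hits = difflib.get_close_matches(other_name.lower(), list(by_lower),
                                     n=3, cutoff=cutoff)
    return [by_lower[h] for h in hits]

def link_to_school(conn, participant_id, school_id):
    """Attach a participant's "other" entry to an existing school."""
    with conn:
        conn.execute("UPDATE participant SET school_id = ?, other_school = NULL "
                     "WHERE id = ?", (school_id, participant_id))

def promote_other(conn, participant_id):
    """Add a participant's "other" school to the list, then link to it."""
    with conn:
        (name,) = conn.execute("SELECT other_school FROM participant WHERE id = ?",
                               (participant_id,)).fetchone()
        cur = conn.execute("INSERT INTO school (name) VALUES (?)", (name.strip(),))
        conn.execute("UPDATE participant SET school_id = ?, other_school = NULL "
                     "WHERE id = ?", (cur.lastrowid, participant_id))

# Example data: one listed school, one misspelling, one genuinely new school.
with conn:
    conn.execute("INSERT INTO school (id, name) VALUES (1, 'Springfield High School')")
    conn.executemany(
        "INSERT INTO participant (full_name, school_id, other_school) VALUES (?, ?, ?)",
        [("Alice", 1, None),
         ("Bob", None, "Springfeild High School"),
         ("Carol", None, "Ogdenville Academy")])

# Reconciliation pass (auto-accepting the best match only for illustration).
pending = conn.execute("SELECT id, other_school FROM participant "
                       "WHERE other_school IS NOT NULL ORDER BY id").fetchall()
for pid, other in pending:
    matches = suggest_matches(conn, other)
    if matches:
        (sid,) = conn.execute("SELECT id FROM school WHERE name = ?",
                              (matches[0],)).fetchone()
        link_to_school(conn, pid, sid)
    else:
        promote_other(conn, pid)
```

A higher cutoff produces fewer false matches but lets more misspellings through as new schools, so the threshold is a tuning choice rather than a fixed rule.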