I have recently been tasked with building a single account system for all of our products, much like how one Windows account works across Microsoft’s products. One of the requirements is that we can easily find accounts that share the same information, so that we can detect traded accounts, sudden and possibly fraudulent changes of information, etc.
While working out how to solve this, I thought we could reduce the redundancy of our data at the same time. That should save some storage space and processing time, since we would otherwise end up processing the data set into the structure I describe below anyway.
A bit of background on how this is set up right now:
- An account table just contains an id and a username
- A profile table contains a reference to an account and references to separate pieces of profile data: names, mailing addresses, email addresses
- A name table contains an id and the first, last, and middle name of an individual
- An address table contains data about an address
- An email address table contains an id and the mailbox and domain of an email address
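To make the layout above concrete, here is a minimal sketch using Python's built-in sqlite3 module. All table and column names are my own guesses for illustration (the real schema is not shown here), the address table is reduced to a single placeholder column, and the `deleted_at` column is just one possible way to mark a soft delete. The query at the end shows the kind of "who shares this name" lookup the design is meant to make cheap.

```python
import sqlite3

# Hypothetical schema: names and columns are assumptions, not the real tables.
SCHEMA = """
CREATE TABLE account (
    id       INTEGER PRIMARY KEY,
    username TEXT NOT NULL
);
CREATE TABLE name (
    id          INTEGER PRIMARY KEY,
    first_name  TEXT NOT NULL,
    middle_name TEXT NOT NULL DEFAULT '',
    last_name   TEXT NOT NULL
);
CREATE TABLE address (
    id   INTEGER PRIMARY KEY,
    line TEXT NOT NULL            -- placeholder for the real address columns
);
CREATE TABLE email_address (
    id      INTEGER PRIMARY KEY,
    mailbox TEXT NOT NULL,
    domain  TEXT NOT NULL
);
CREATE TABLE profile (
    id         INTEGER PRIMARY KEY,
    account_id INTEGER NOT NULL REFERENCES account(id),
    name_id    INTEGER NOT NULL REFERENCES name(id),
    address_id INTEGER REFERENCES address(id),
    email_id   INTEGER REFERENCES email_address(id),
    deleted_at TEXT               -- NULL while the profile is current
);
"""


def create_schema(conn):
    """Create the sketch tables on an open sqlite3 connection."""
    conn.executescript(SCHEMA)


def accounts_sharing_name(conn, name_id):
    """Return usernames whose current profile points at the given name row."""
    rows = conn.execute(
        "SELECT a.username FROM profile p "
        "JOIN account a ON a.id = p.account_id "
        "WHERE p.name_id = ? AND p.deleted_at IS NULL "
        "ORDER BY a.username",
        (name_id,),
    ).fetchall()
    return [r[0] for r in rows]
```

Because every shared piece of data lives in exactly one row, finding accounts with the same information becomes an equality check on a foreign key rather than a comparison of text columns.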
A profile record relates the unique pieces of profile data (shared across many accounts) to a specific account. If there are fifty people named “John Smith”, then there is only one “John Smith” record in the names table. If a user changes any piece of their information, the profile record is soft-deleted and a new one is created. This is to facilitate change tracking.
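The soft-delete-and-recreate step can be sketched roughly as follows, again with Python's sqlite3. The table layout, the `deleted_at` timestamp column, and the function names are assumptions made for the example, not the actual implementation. The old row is kept and only marked as deleted, so the history of changes stays queryable.

```python
import sqlite3
from datetime import datetime, timezone


def open_profile_db():
    """In-memory database with a simplified, hypothetical profile table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE profile ("
        " id INTEGER PRIMARY KEY,"
        " account_id INTEGER NOT NULL,"
        " name_id INTEGER NOT NULL,"
        " address_id INTEGER,"
        " email_id INTEGER,"
        " deleted_at TEXT)"  # NULL while the profile is current
    )
    return conn


def change_profile_name(conn, account_id, new_name_id):
    """Soft-delete the current profile and insert a replacement row."""
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # one transaction: commit on success, roll back on error
        row = conn.execute(
            "SELECT id, address_id, email_id FROM profile "
            "WHERE account_id = ? AND deleted_at IS NULL",
            (account_id,),
        ).fetchone()
        if row is None:
            raise LookupError(f"no current profile for account {account_id}")
        old_id, address_id, email_id = row
        # Keep the old row for change tracking; only mark it as deleted.
        conn.execute(
            "UPDATE profile SET deleted_at = ? WHERE id = ?", (now, old_id)
        )
        # The new row reuses the unchanged references and swaps in the new name.
        cur = conn.execute(
            "INSERT INTO profile (account_id, name_id, address_id, email_id) "
            "VALUES (?, ?, ?, ?)",
            (account_id, new_name_id, address_id, email_id),
        )
        return cur.lastrowid
```

Comparing the `name_id` of consecutive profile rows for one account is then enough to spot a sudden change of information.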
After profiling, I have noticed that constraints like UNIQUE(FirstName, MiddleName, LastName) make record insertion noticeably slower. Is that simply the price we’re going to have to pay, or is there a better approach?
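For reference, the insert path that pays this cost looks roughly like the sketch below, written with Python's sqlite3. The snake_case column names and the `get_or_create_name` helper are assumptions for illustration, and `INSERT OR IGNORE` is SQLite-specific syntax (other databases have their own equivalents). Every insert has to consult the index backing the UNIQUE constraint, which is where the extra time goes.

```python
import sqlite3


def open_name_db():
    """In-memory database with a hypothetical deduplicated name table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE name ("
        " id INTEGER PRIMARY KEY,"
        " first_name TEXT NOT NULL,"
        " middle_name TEXT NOT NULL DEFAULT '',"
        " last_name TEXT NOT NULL,"
        " UNIQUE (first_name, middle_name, last_name))"
    )
    return conn


def get_or_create_name(conn, first, middle, last):
    """Return the id of the single row for this name, inserting it if needed."""
    # A missing middle name is stored as '' rather than NULL, because NULLs
    # compare as distinct inside a UNIQUE constraint and would allow duplicates.
    key = (first, middle or "", last)
    with conn:
        # The UNIQUE index is checked here; a duplicate is silently skipped.
        conn.execute(
            "INSERT OR IGNORE INTO name (first_name, middle_name, last_name) "
            "VALUES (?, ?, ?)",
            key,
        )
        row = conn.execute(
            "SELECT id FROM name "
            "WHERE first_name = ? AND middle_name = ? AND last_name = ?",
            key,
        ).fetchone()
    return row[0]
```

Calling `get_or_create_name(conn, "John", None, "Smith")` fifty times leaves one row in the table and returns the same id each time, which is the behavior the profile records rely on.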
I have concluded my research and decided that this approach is fine if insert performance is not critical. In cases where it is critical, increasing data redundancy within reason is an acceptable trade-off.
The solution described in my question is adequate for my performance needs. In our model, storage is considered more expensive than insertion time.