I’m trying to think of a regular expression for this, but I’m not having any luck.
Let’s say you have a security question on your website so that a person can recover a password. People often forget exactly how they entered information. For example, given the question “What company do you work for?”, a user might answer “Microsoft Corp.”. But a month later, when they are prompted to answer this question, they might type in “Microsoft”, which wouldn’t match their original answer even though they clearly answered correctly.
I’d like the comparison to ignore case and a trailing company suffix: “Microsoft Corp.” or “Microsoft Inc.” or “Microsoft Co.” would match “Microsoft”, and “questar gas” would match “Questar Gas Co.”. “Bank Corp. of America” would NOT match “Bank of America” because the word “Corp.” is not at the end of the string.
What is the best way to accomplish this?
I wouldn’t worry too much about people changing their answers. People are remarkably consistent in how they answer these kinds of questions. If I know your first job was at Microsoft, the fact that I type it slightly differently may suggest that I’m an attacker.
Avoid placing plaintext answers in your database. This is similar to storing plaintext passwords, which is definitely a bad idea. If your database, or a backup of the database, gets out of your control, then you have leaked your clients’ private information. Maybe it won’t fall into the wrong hands, but think of the email you’ll have to send to your users: “Change all the sites where you answered this question the same way.”
Instead, take the salted hash of the answer, and store that in the database. When the user answers the question later, use the same algorithm to hash their answer, and compare to the stored value.
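As a rough sketch of the salted-hash idea, here is one way it could look in Python using only the standard library. The function names `hash_answer` and `verify_answer`, the 16-byte salt, and the choice of PBKDF2 with 100,000 iterations are my own assumptions for illustration, not something the question specifies; you would store both the salt and the digest alongside the user record.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # assumed work factor; tune for your hardware


def hash_answer(answer: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) for a security answer.

    A fresh random salt is generated when none is supplied.
    """
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", answer.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_answer(answer: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the submitted answer with the stored salt and compare."""
    _, digest = hash_answer(answer, salt)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(digest, expected)
```

At signup you would call `hash_answer(answer)` and save the returned salt and digest; at recovery time you would call `verify_answer(submitted, salt, digest)`. Note that an exact hash comparison means “Microsoft” will not match “Microsoft Corp.” unless both are normalized the same way first.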
You can use some normalization to minimize the effect of typos. You could convert to lowercase, and remove spaces and punctuation. For example, “Microsoft Corp.” would become “microsoftcorp”. That way, if a user decides to leave off the period or add another space, it would still match.
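A minimal sketch of that normalization in Python might look like the following. The name `normalize_answer` is hypothetical, and treating every non-alphanumeric character as removable is an assumption; adjust it if your answers can contain meaningful symbols. The same function would need to run on the answer both when it is first stored and when it is checked.

```python
import re


def normalize_answer(answer: str) -> str:
    """Lowercase the answer and drop spaces, punctuation, and underscores."""
    # casefold() is a more aggressive lowercase that also handles
    # non-English letters; [\W_] matches anything that is not a letter
    # or digit.
    return re.sub(r"[\W_]+", "", answer.casefold())


print(normalize_answer("Microsoft Corp."))   # microsoftcorp
print(normalize_answer("microsoft  corp"))   # microsoftcorp
```

Because the hash is computed over the normalized string, small differences in spacing, capitalization, or punctuation no longer change the stored value.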