I am validating a form that I will be using on my website to obtain certain details about a person or business that is registering an online account with us.
I am writing this question to get a bit of advice on how to validate the following types of information correctly.
To explain this, I will list a series of data types along with the HTML validation pattern I had in mind for each. These could then be reused in PHP validation, among other things, to ensure that the form is always validated correctly. However, the standard HTML validation in my opinion looks better than anything I have been able to achieve by applying my own CSS.
First Names – ^[a-zA-Z -]{1,120} (a-z, from 1 to 120 characters long, big or small letters)
Last Names – ^[a-zA-Z -]{1,120}
Email Addresses – ^([a-zA-Z0-9_\\-\\.]+)@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.)|(([a-zA-Z0-9\\-]+\\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\\]?)$ (validation including .com and .co.za domains which is what is mainly used)
If anyone has suggestions for improving these validation patterns, or knows of others that are more standard, that info would be greatly appreciated.
Any information on why they should or should not be used would be great too.
Thanks!!
Your “validation” of names excludes every language that doesn’t use the Latin alphabet, and even Latin-script names with accents or apostrophes such as José or O’Brien. Why? I guess you could check that there aren’t any numbers in there and leave it at that. If you want people with non-Latin names to be able to use your site then your storage (database?) should use a character set such as UTF-8 and you’ll have to allow pretty much everything. Even trying to remove rude words can result in the Scunthorpe problem.
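To make the "no numbers and leave it at that" idea concrete, here is a minimal sketch. It is in Python rather than PHP purely for illustration, and the function name `is_plausible_name` is made up for this example. The 120-character limit is taken from the pattern in the question; rejecting control characters as well as digits is my own assumption, not a rule from any standard.

```python
import unicodedata


def is_plausible_name(name: str, max_len: int = 120) -> bool:
    """Accept names in any script; reject only empty, over-long,
    or digit/control-character-containing input."""
    # Normalise so that composed and decomposed forms are measured alike.
    name = unicodedata.normalize("NFC", name).strip()
    if not 1 <= len(name) <= max_len:
        return False
    for ch in name:
        category = unicodedata.category(ch)
        # Categories starting with "N" are Unicode numbers (Nd, Nl, No);
        # "Cc" is control characters. Everything else is allowed.
        if category.startswith("N") or category == "Cc":
            return False
    return True
```

With this, `is_plausible_name("José")`, `is_plausible_name("O’Brien")` and `is_plausible_name("李小龍")` all pass, while `is_plausible_name("R2D2")` does not. It deliberately does very little, since anything stricter starts rejecting real people.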
Don’t validate e-mails using regular expressions. Send a message to the address and get the person to click on a link. It is technically impossible to fully validate an e-mail address using regexes, and the better ones that have been developed can be ridiculous. Non-Latin domain names exist and, as with names, you can’t use the Latin alphabet to ensure that they contain what you want.
Also, ICANN are currently selling off some new gTLDs, which will substantially increase the available name-space, and the {2,4} length limit in your pattern already rejects existing TLDs such as .museum. You’re never going to be able to guarantee that something actually exists without checking.
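The "send a link and get them to click it" flow could look roughly like the sketch below. Again this is Python for illustration only. The in-memory dictionary stands in for a database table, and the names `start_confirmation`, `confirm` and the 24-hour lifetime are all assumptions of mine. Actually sending the mail is left out.

```python
import secrets
import time
from typing import Dict, Optional, Tuple

# In-memory stand-in for a table of pending confirmations (hypothetical).
_pending: Dict[str, Tuple[str, float]] = {}

TOKEN_LIFETIME_SECONDS = 24 * 60 * 60  # assumed expiry window


def looks_like_email(address: str) -> bool:
    """Cheap sanity check only: something before and after the last '@'."""
    local, sep, domain = address.rpartition("@")
    return bool(sep and local and domain)


def start_confirmation(address: str, now: Optional[float] = None) -> str:
    """Create a one-time token; the caller would mail a link containing it."""
    if not looks_like_email(address):
        raise ValueError("address needs a local part, '@' and a domain")
    token = secrets.token_urlsafe(32)  # unguessable random token
    issued = time.time() if now is None else now
    _pending[token] = (address, issued + TOKEN_LIFETIME_SECONDS)
    return token


def confirm(token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the confirmed address, or None if the token is unknown or expired."""
    entry = _pending.pop(token, None)  # pop makes the token single-use
    if entry is None:
        return None
    address, expires_at = entry
    current = time.time() if now is None else now
    return address if current <= expires_at else None
```

The only format check is "is there an @ with something on both sides". Whether the address really exists is decided by the person clicking the link, not by a pattern.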
Obviously, if you’re using a database, use prepared statements to stop SQL Injection.
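As a rough illustration of what a prepared (parameterised) statement buys you, here is a sketch using Python's built-in `sqlite3` module. The `accounts` table and the `save_registration` function are invented for the example. In PHP the equivalent would be PDO's `prepare()` followed by `execute()`.

```python
import sqlite3


def save_registration(conn: sqlite3.Connection,
                      first_name: str, last_name: str, email: str) -> None:
    # The '?' placeholders keep the values out of the SQL text, so quotes
    # or semicolons in user input are stored as data, never run as SQL.
    conn.execute(
        "INSERT INTO accounts (first_name, last_name, email) VALUES (?, ?, ?)",
        (first_name, last_name, email),
    )
    conn.commit()


# Throwaway in-memory database for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (first_name TEXT, last_name TEXT, email TEXT)")

# A hostile-looking name is stored verbatim and the table survives.
save_registration(conn, "Robert'); DROP TABLE accounts;--", "Tables", "bobby@example.com")
rows = conn.execute("SELECT first_name FROM accounts").fetchall()
```

This is also why you don't need the name pattern to keep "dangerous" characters out: the database layer handles that safely if you bind values instead of concatenating strings.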