Sometimes when a user is copying and pasting data into an input form we get characters like the following:
didn’t,“ for beginning quotes and †for end quote, etc …
I use this routine to sanitize most input on web forms (I wrote it a while ago but am also looking for improvements):
function fnSanitizePost($data) // escapes, strips and trims all members of the post array
{
    if (is_array($data))
    {
        $areturn = array();
        foreach ($data as $skey => $svalue)
        {
            $areturn[$skey] = fnSanitizePost($svalue);
        }
        return $areturn;
    }
    else
    {
        if (!is_numeric($data))
        {
            // With magic quotes on, the input gets escaped twice, which means
            // that we have to strip those slashes. Leaving data in your
            // database with slashes in it is a bad idea.
            if (get_magic_quotes_gpc()) // gets current configuration setting of magic quotes
            {
                $data = stripslashes($data);
            }
            $data = pg_escape_string($data); // escapes a string for insertion into the database
            $data = strip_tags($data);       // strips HTML and PHP tags from a string
        }
        $data = trim($data); // trims whitespace from beginning and end of a string
        return $data;
    }
}
I really want to keep characters like the ones mentioned above from ever being stored in the database. Do I need to add some regex replacements to my sanitizing routine?
Thanks,
- Nicholas
I finally came up with a routine for replacing these characters. It took parsing some of the problematic strings one character at a time and returning the octal value of each character. In doing so I learned that smart quote characters come back as sets of three octal values (one per byte). Here is the routine I used to parse the string:
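A minimal sketch of this kind of byte-by-byte dump, assuming the pasted text arrives as UTF-8. The function name fnDumpOctal is hypothetical and this is not necessarily the routine referred to above; it only illustrates the technique of printing each byte's octal value.

```php
<?php
// Hypothetical helper (not from the original post): returns the octal
// value of every byte in $s as a zero-padded three-digit string.
function fnDumpOctal($s)
{
    $out = array();
    $len = strlen($s); // strlen() counts bytes, not characters
    for ($i = 0; $i < $len; $i++)
    {
        $out[] = sprintf('%03o', ord($s[$i]));
    }
    return $out;
}

// "didn't" typed with a UTF-8 right single quote (U+2019),
// written here with octal escapes so the file itself stays ASCII.
print_r(fnDumpOctal("didn\342\200\231t"));
```

In this sketch the apostrophe shows up as the three values 342, 200 and 231, which matches the "sets of three" observation: each smart quote is a three-byte UTF-8 sequence.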
Here are the str_replace() calls to 'fix' the string:
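As a hedged illustration of what such calls can look like, the sketch below maps a few common UTF-8 "smart" characters to plain ASCII. It assumes UTF-8 input; the function name fnReplaceSmartChars and the particular choice of replacements are assumptions made for the example, not the original list.

```php
<?php
// Hypothetical helper (not from the original post): replaces a few
// UTF-8 smart-punctuation byte sequences with ASCII equivalents.
function fnReplaceSmartChars($data)
{
    $search = array(
        "\342\200\230", // U+2018 left single quote
        "\342\200\231", // U+2019 right single quote / apostrophe
        "\342\200\234", // U+201C left double quote
        "\342\200\235", // U+201D right double quote
        "\342\200\246", // U+2026 horizontal ellipsis
    );
    $replace = array("'", "'", '"', '"', '...');

    // str_replace() accepts parallel arrays and applies each pair in order.
    return str_replace($search, $replace, $data);
}

echo fnReplaceSmartChars("didn\342\200\231t"), "\n";
```

Keeping the pairs in two parallel arrays means new characters can be handled by appending one entry to each array, and the function could be called on a value before the escaping step in a routine like fnSanitizePost().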
I am going to continue building up an array of these search/replacements which I am sure will continue to grow with the increasing use of these types of characters.
I know that there are some canned routines for replacing these but I had no luck with any of them on the Solaris 10 platform that my scripts are running on.
— Nicholas