I’m still learning PHP and SQL. I’m trying to create a simple content management system for a website’s list of events. All of the input form fields are either Text areas or Text boxes (yes, I want them that way), and I want to leave the user the ability to add HTML links in addition to text in these fields. The following functions seem a good place to start with sanitizing the input I get from the user, but since I’m new to this I wanted to get the opinions of more knowledgeable developers. What more should I be doing to try to protect the database?
P.S. Thanks to CSS-Tricks for these functions.
function cleanInput($input) {
    $search = array(
        '@<script[^>]*?>.*?</script>@si', // Strip out javascript
        '@<style[^>]*?>.*?</style>@siU', // Strip style tags properly
        '@<![\s\S]*?--[ \t\n\r]*>@' // Strip multi-line comments
    );
    $output = preg_replace($search, '', $input);
    return $output;
}
function sanitize($input) {
    if (is_array($input)) {
        foreach($input as $var=>$val) {
            $output[$var] = sanitize($val);
        }
    }
    else {
        if (get_magic_quotes_gpc()) {
            $input = stripslashes($input);
        }
        $input = cleanInput($input);
        $output = htmlentities($output);
        $output = mysql_real_escape_string($input);
    }
    return $output;
}
Quite easily:
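As a rough illustration of how a blacklist like this can be sidestepped, here is a minimal sketch. cleanInput() is copied from the question so the snippet runs on its own; the three payloads are illustrative assumptions and are not taken from the original post.

```php
<?php
// cleanInput() copied from the question so this sketch is self-contained.
function cleanInput($input) {
    $search = array(
        '@<script[^>]*?>.*?</script>@si', // Strip out javascript
        '@<style[^>]*?>.*?</style>@siU',  // Strip style tags properly
        '@<![\s\S]*?--[ \t\n\r]*>@'       // Strip multi-line comments
    );
    return preg_replace($search, '', $input);
}

// Illustrative payloads (assumptions, not from the original post).
$nested  = '<scr<script></script>ipt>alert(1)</scr<script></script>ipt>';
$handler = '<img src="x" onerror="alert(1)">';
$jsLink  = '<a href="javascript:alert(1)">click</a>';

// Removing the inner <script></script> pairs glues the outer pieces together.
echo cleanInput($nested), PHP_EOL;  // prints: <script>alert(1)</script>

// Event-handler attributes and javascript: URLs are not matched by any pattern.
echo cleanInput($handler), PHP_EOL;
echo cleanInput($jsLink), PHP_EOL;
```

The first payload comes out as a working script tag because preg_replace makes a single pass per pattern and never rescans its own output. The other two pass through unchanged because nothing in the blacklist looks at attributes.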
Also, htmlentities is almost always the wrong thing to use: it will mangle UTF-8 input. You also should not be storing HTML-escaped data in your DB. I’m not even sure why you use it here at all. Won’t you have to unescape the HTML to display it?
However, you are going about this the wrong way. Parse the HTML with a real parser such as DOMDocument or html5lib or even tidylib. Unfortunately PHP doesn’t seem to have anything as wonderful as Bleach on Python, so you will have to roll your own. An XSLT stylesheet with a whitelist seems like it might be a good way to handle this particular sanitization condition. Update: another user pointed out HTML Purifier, which is also a whitelist-based HTML sanitizer. I’ve never used it but it looks like “Bleach in PHP”. You should definitely investigate.
A general outline of processing is like so:
Input
Make sure magic quotes are off: if (get_magic_quotes_gpc()) die ('TURN OFF MAGIC QUOTES!!!!');
Store the data using the PDO library with prepared statements. This way you do not need to remember to escape data by hand.
Output
Escape your data inside your template. Individual fields of your data will need to be escaped differently. You almost always need to run a value through htmlspecialchars before output; the only case where you would not is when the data you need to display is already HTML (i.e. your whitelist-sanitized HTML fields). Define a small helper function that wraps htmlspecialchars and use it in your templates.
Even better, try to use a template library that automatically escapes strings for you and that requires you to turn off escaping explicitly. (The common case should be simple to avoid errors, and having to escape is the common case!)
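The outline above could be sketched end to end roughly as follows. This is only a sketch under stated assumptions: sanitizeLinksOnly(), renderAllowed(), h() and the events table are hypothetical names, the whitelist allows nothing but text and http/https links, and an in-memory SQLite database stands in for the real one (so the dom and pdo_sqlite extensions need to be enabled). A maintained library such as HTML Purifier would normally replace the hand-rolled whitelist.

```php
<?php
// Hypothetical helper: rebuild a node's children from a whitelist.
// Text is re-escaped, <a href="http(s)://..."> is kept, everything else is unwrapped.
function renderAllowed(DOMNode $node): string
{
    $out = '';
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMText) {
            $out .= htmlspecialchars($child->nodeValue, ENT_QUOTES, 'UTF-8');
        } elseif ($child instanceof DOMElement) {
            $tag = strtolower($child->tagName);
            if ($tag === 'script' || $tag === 'style') {
                continue; // drop the element together with its contents
            }
            $inner = renderAllowed($child);
            $href  = $child->getAttribute('href');
            if ($tag === 'a' && preg_match('#^https?://#i', $href)) {
                // Only href survives; onclick and friends are never copied over.
                $out .= '<a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">' . $inner . '</a>';
            } else {
                $out .= $inner; // unknown tag: keep its text, lose the markup
            }
        }
        // Comments and any other node types are dropped.
    }
    return $out;
}

// Input step for HTML fields: parse with DOMDocument, then apply the whitelist.
function sanitizeLinksOnly(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parse warnings quietly
    // The XML prolog is a common trick to make libxml read the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    return $body === null ? '' : renderAllowed($body);
}

// Output step for plain-text fields: escape inside the template.
function h(string $s): string
{
    return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
}

// Storage: prepared statements, no manual escaping. SQLite is a stand-in here.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE events (title TEXT, description TEXT)');

$title = "O'Reilly <b>Party</b>"; // plain-text field, stored exactly as typed
$raw   = 'See <a href="http://example.com/" onclick="evil()">the site</a>'
       . '<script>alert(1)</script> and <a href="javascript:alert(2)">this</a>';

$stmt = $pdo->prepare('INSERT INTO events (title, description) VALUES (:title, :description)');
$stmt->execute([':title' => $title, ':description' => sanitizeLinksOnly($raw)]);

$row = $pdo->query('SELECT title, description FROM events')->fetch(PDO::FETCH_ASSOC);

// Template: h() for text fields, no escaping for the already-sanitized HTML field.
$page = '<h2>' . h($row['title']) . '</h2><p>' . $row['description'] . '</p>';
echo $page, PHP_EOL;
```

The title goes into the database unescaped and is only escaped when it is printed, while the description is sanitized once on the way in and printed as-is. Keeping those two kinds of field apart is the point of the outline.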