I am a PHP newbie and am working on a basic form validation script. I understand that input filtering and output escaping are both vital for security reasons. My question is whether or not the code I have written below is adequately secure? A few clarifying notes first.
- I understand there is a difference between sanitizing and validating. In the example field below, the field is plain text, so all I need to do is sanitize it.
- $clean['myfield'] is the value I would send to a MySQL database. I am using prepared statements for my database interaction.
- $html['myfield'] is the value I am sending back to the client so that when s/he submits the form with invalid/incomplete data, the sanitized fields that have data in them will be repopulated so they don’t have to type everything in from scratch.
Here is the (slightly cleaned up) code:
$clean = array();
$html = array();
$_POST['fname'] = filter_var($_POST['fname'], FILTER_SANITIZE_STRING);
$clean['fname'] = $_POST['fname'];
$html['fname'] = htmlentities($clean['fname'], ENT_QUOTES, 'UTF-8');
if ($_POST['fname'] == "") {
    $formerrors .= 'Please enter a valid first name.<br/><br/>';
}
else {
    $formerrors .= 'Name is valid!<br/><br/>';
}
Thanks for your help!
~Jared
I’d say rather that output escaping is vital for security and correctness reasons, and input filtering is a potentially useful measure for defence-in-depth and for enforcing specific application rules.
The input filtering step and the output escaping step are necessarily separate concerns, and cannot be combined into one step, not least because there are many different types of output escaping, and the right one has to be chosen for each output context (eg HTML-escaping in a page, URL-escaping to make a link, SQL-escaping, and so on).
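To make the "one escaper per context" point concrete, here is a minimal sketch. The value of `$name`, the `profile.php` URL, and the `people` table are made up for illustration and are not from the question.

```php
<?php
// Hypothetical user-supplied value, used for illustration only.
$name = 'Tom & "Jerry" <b>';

// HTML context: escape the characters that are significant in markup.
$forHtml = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');

// URL context: percent-encode the value before putting it in a query string.
$forUrl = 'profile.php?name=' . rawurlencode($name);

// SQL context: no manual escaping; bind the raw value in a prepared statement.
// (Assumes an existing PDO connection in $pdo and a hypothetical table.)
// $stmt = $pdo->prepare('INSERT INTO people (fname) VALUES (?)');
// $stmt->execute([$name]);

echo $forHtml, "\n"; // Tom &amp; &quot;Jerry&quot; &lt;b&gt;
echo $forUrl, "\n";  // profile.php?name=Tom%20%26%20%22Jerry%22%20%3Cb%3E
```

The same raw string produces three different results, which is why the escaping has to happen at output time rather than once at input time.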
Unfortunately PHP is traditionally very hazy on these issues and so offers a bunch of mixed-message functions that are likely to mislead you.
Yes. Alas, FILTER_SANITIZE_STRING is not in any way a sane sanitiser. It completely removes some content (strip_tags, which is itself highly non-sensible) whilst HTML-escaping other content, eg quotes turn into &#34;. This is a nonsense.
Instead, for input sanitisation, look at:
- checking it’s a valid string for the encoding you’re using (hopefully UTF-8; see eg this regex for that);
- removing control characters, U+0000–U+001F and U+007F–U+009F. Allow the newline through only on deliberate multi-line text fields;
- removing the characters that are not suitable for use in markup;
- validating the input conforms to application requirements on a field-by-field basis, for data whose content model is more specific than arbitrary text strings. Although your escaping should handle a < character correctly, it’s probably a good idea to get rid of it early in fields where it makes no sense to have one.
For the output escaping step I’d generally prefer htmlspecialchars() to htmlentities(), though your correct use of the UTF-8 argument stops the latter function breaking in the way it usually does.
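Putting the first two sanitisation steps and the output escaping together might look roughly like this. It is only a sketch: `sanitise_text()` is a hypothetical helper name, it assumes the mbstring extension is available, and `$input` stands in for `$_POST['fname']`. Field-specific validation would still come on top of it.

```php
<?php
// Hypothetical helper (not a built-in): basic sanitisation for a plain-text field.
// Returns null when the input is not valid UTF-8.
function sanitise_text(string $raw, bool $multiline = false): ?string
{
    // Step 1: reject byte sequences that are not valid UTF-8.
    if (!mb_check_encoding($raw, 'UTF-8')) {
        return null;
    }
    // Step 2: strip control characters U+0000-U+001F and U+007F-U+009F,
    // letting the newline (U+000A) through only for multi-line fields.
    $pattern = $multiline
        ? '/[\x{00}-\x{09}\x{0B}-\x{1F}\x{7F}-\x{9F}]/u'
        : '/[\x{00}-\x{1F}\x{7F}-\x{9F}]/u';
    return preg_replace($pattern, '', $raw);
}

$clean = array();
$html = array();
$formerrors = '';

// Stand-in for $_POST['fname']; contains a stray control character (BEL).
$input = "Jared\x07 <O'Neil>";

// $clean holds the raw-but-sanitised value for the prepared statement.
$clean['fname'] = sanitise_text($input);
if ($clean['fname'] === null || $clean['fname'] === '') {
    $formerrors .= 'Please enter a valid first name.<br/><br/>';
}

// $html holds the HTML-escaped value for repopulating the form.
$html['fname'] = htmlspecialchars($clean['fname'] ?? '', ENT_QUOTES, 'UTF-8');
```

Note that `$_POST` itself is never overwritten, so the raw input stays available, and the HTML escaping is applied only to the copy that goes back into the page.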